This lesson is in the early stages of development (Alpha version)

Remote Repositories

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How do I work with a remote repository?

Objectives
  • Add an SSH key to a GitHub account

  • Understand git push and git pull

We’ve learned how to use a local repository to store our code and view changes:

Local Repository Commands

Now, however, we’d like to share the changes we’ve made to our code with others, and make sure we have an off-site backup in case things go wrong. To do both, we need to upload the changes from our local repository to a remote repository.

Why Have an Off-site Backup?

You might wonder why having an off-site backup (i.e. a copy not stored at your University) is so important. In 2005, a fire destroyed a building at the University of Southampton. Some people’s entire PhD projects were wiped out in the blaze. To ensure your PhD only involves a normal level of suffering, please make sure you have off-site backups of as much of your work as possible!

To do that, we’ll use the remote repository we set up on GitHub at the start of the workshop. It’s another repository, just like the local repository on the DIRAC server, and Git makes it easy to send data to it and receive data from it. Multiple local repositories can connect to the same remote repository, allowing you to collaborate with colleagues easily.

Remote Repositories

So we’re finally going to address all those “Your branch is ahead of ‘origin/main’ by 3 commits” messages we got from git status! However, GitHub doesn’t let just anyone push to your repository - you need to prove you’re the owner (or have been given access). Fortunately, we already set up an SSH key earlier.
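The “ahead of ‘origin/main’” count can also be checked directly. The sketch below is a self-contained illustration rather than part of the workshop repository: it builds a throwaway local repository plus a local bare repository standing in for GitHub (the paths, file contents and commit messages are made up for the example), pushes one commit, makes three more, and then counts the commits that have not been pushed yet.

```shell
set -e
work=$(mktemp -d)                       # throwaway sandbox, not your real repository
git init --quiet --bare "$work/remote.git"   # local stand-in for the GitHub repository
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main   # call the branch 'main' whatever Git's default is
git config user.name "Example User"     # placeholder identity for the sandbox only
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

# One commit that gets pushed, so that origin/main exists...
echo "# Climate Analysis Toolkit" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin main

# ...then three commits that stay local
for i in 1 2 3; do
    echo "change $i" >> README.md
    git commit --quiet -am "Local change $i"
done

# Count the commits on our branch that origin/main does not have
ahead=$(git rev-list --count origin/main..HEAD)
echo "ahead of origin/main by $ahead commits"

# The same information in the compact form of git status
git status --short --branch
```

The range origin/main..HEAD means ‘commits reachable from our branch but not from the last known state of the remote branch’, which is exactly what git status is summarising.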

Now we can synchronise our code to the remote repository, with git push:

$ git push
warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

Counting objects: 11, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 1.11 KiB | 0 bytes/s, done.
Total 9 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 1 local object.
To git@github.com:smangham/climate-analysis
   70bf8f3..501e88f  main -> main

And we’re done! This bit was easy because, when we used git clone earlier, it set up our local repository to track the remote repository. The main -> main line shows we’re sending our local branch called main to the remote repository as a branch called main.
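To see the tracking relationship that git clone records, you can ask Git which remote branch the current branch follows. This is a minimal sketch in a throwaway directory: the ‘seed’ repository and the bare ‘remote.git’ copy are assumptions standing in for the workshop repository on GitHub.

```shell
set -e
work=$(mktemp -d)                 # throwaway sandbox
cd "$work"

# Build a tiny repository with one commit on 'main'
git init --quiet seed
cd seed
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"      # placeholder identity for the sandbox only
git config user.email "user@example.com"
echo "# Climate Analysis Toolkit" > README.md
git add README.md
git commit --quiet -m "Add README"
cd "$work"

# A bare clone stands in for the repository on GitHub
git clone --quiet --bare seed remote.git

# Cloning the remote records where 'main' should push to and pull from
git clone --quiet remote.git local
cd local
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')
echo "main tracks $upstream"

# List the remotes this repository knows about, with their addresses
git remote -v
```

Because the upstream is recorded, a bare git push or git pull knows which remote and branch to use without being told.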

You’ll notice that, as this is an old version of Git, we’ve been given a warning. By default, old versions of Git push every local branch that has a branch of the same name on the remote when you do git push, whilst newer versions only push your current branch. Whilst we don’t use branches in this material, let’s adopt the modern standard anyway, just to remove the notification:

$ git config --global push.default simple
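If you want to confirm which value is in effect, git config can read the setting back. The sketch below deliberately differs from the command above in one respect: it sets push.default at repository level (no --global) inside a throwaway repository, so that it does not touch your personal configuration.

```shell
set -e
work=$(mktemp -d)                  # throwaway sandbox
git init --quiet "$work/sandbox"
cd "$work/sandbox"

# Repository-level setting (no --global), so personal configuration is left alone
git config push.default simple

# Read the value back; a repository-level value takes precedence over a global one
mode=$(git config --get push.default)
echo "push.default is $mode"
```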

What is a Branch, Though?

We’re not covering them in this material, but they’re very useful. Branches allow you to have alternate versions of the code ‘branching off’ from another branch (e.g. main). You can try out new features in these branches without disrupting your main version of the code, then merge them in once you’ve finished. We have a Stretch Episode with a brief description of them.

If we go back to the repository on GitHub, we can refresh the page and see our updates to the code:

Updated remote repository

Conveniently, the contents of README.md are shown on the main page, with formatting. You can also add links, tables and more. Your code should always have a descriptive README.md file, so anyone visiting the repo can easily get started with it.

How often should I push?

Every day. You can never predict when your hard disk will fail or your building will be destroyed!

In case of fire, git commit, git push, leave building
Credit: Mitch Altman, CC BY-SA 2.0

Collaborating on a Remote Repository

Now we know how to push our work from our local repository to a remote one, we need to know the reverse - how to pull updates to the code that someone else has made.

We want to invite other people to collaborate on our code, so we’ll update the README.md with a request for potential collaborators to email us at our University email address.

$ nano README.md
$ cat README.md
# Climate Analysis Toolkit

This is a set of python scripts designed to analyse climate datafiles.

If you're interested in collaborating, email me at s.w.mangham@soton.ac.uk.
$ git commit -am "Added collaboration info"
[main 39a2c8f] Added collaboration info
 1 file changed, 2 insertions(+)

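In this case, we use git commit -am, where the -a means ‘commit all changes to files Git is already tracking’ (that is, files we’ve previously used git add on), and the -m means ‘and here’s the commit message’ as usual. It’s a handy shortcut, but note that -a won’t pick up brand-new files; those still need git add.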
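The effect of -a is easiest to see with one tracked and one untracked file. This is a small sketch in a throwaway repository; the file notes.txt and the file contents are invented for the example.

```shell
set -e
work=$(mktemp -d)                  # throwaway sandbox
git init --quiet "$work/sandbox"
cd "$work/sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"      # placeholder identity for the sandbox only
git config user.email "user@example.com"

# README.md becomes a tracked file once it has been added and committed
echo "# Climate Analysis Toolkit" > README.md
git add README.md
git commit --quiet -m "Add README"

echo "If you're interested in collaborating, get in touch." >> README.md  # tracked, modified
echo "scratch" > notes.txt                                                # new, never added

# -a stages the modified tracked file; the untracked file is ignored
git commit --quiet -am "Added collaboration info"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)   # files in the new commit
untracked=$(git ls-files --others --exclude-standard)           # files Git is not tracking
echo "committed: $committed"
echo "still untracked: $untracked"
```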

But don’t push to GitHub just yet! We’re going to set up a small conflict, of the kind you might see when working with a remote repository.

Now, pretending to be an existing collaborator, we’ll go and add some installation instructions by editing our README.md file directly on GitHub. Editing on GitHub isn’t common, but it can be useful if you want to quickly make some small changes to a single file. We open the file for editing:

GitHub edit button

And just expand it a little, making more use of GitHub’s markdown formatting:

GitHub editing Readme

Then commit the changes directly to our main branch with a descriptive commit message:

GitHub committing edit

Updated remote repository

Push Conflicts

Great. Now let’s go back to the terminal and try pushing our local changes to the remote repository. This is going to cause problems, however:

$ git push
To git@github.com:smangham/climate-analysis
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'git@github.com:smangham/climate-analysis'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first merge the remote changes (e.g.,
hint: 'git pull') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Git helpfully tells us that there are commits present in the remote repository that we don’t have in our local repository, so it has refused our push rather than overwrite them.
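If you would like to see what the remote has before merging anything, git fetch downloads the remote commits without changing your files, and git log can then compare the two histories. The sketch below reproduces the rejected push in a throwaway directory; the ‘collaborator’ clone, the set_identity helper and the commit contents are all assumptions made up for the illustration.

```shell
set -e
work=$(mktemp -d)            # throwaway sandbox
cd "$work"

# Hypothetical helper: give a sandbox repository a commit identity
set_identity() {
    git config user.name "Example User"
    git config user.email "user@example.com"
}

# A starting repository with one commit, and a bare copy standing in for GitHub
git init --quiet seed
( cd seed
  git symbolic-ref HEAD refs/heads/main
  set_identity
  echo "# Climate Analysis Toolkit" > README.md
  git add README.md
  git commit --quiet -m "Add README" )
git clone --quiet --bare seed remote.git

# Two clones: ours, and a collaborator's
git clone --quiet remote.git local
git clone --quiet remote.git collaborator

# The collaborator commits and pushes first
( cd collaborator
  set_identity
  echo "To install a copy of the toolkit, clone this repository." >> README.md
  git commit --quiet -am "Add installation instructions"
  git push --quiet origin main )

# We commit locally, then try to push
cd local
set_identity
echo "If you're interested in collaborating, get in touch." >> README.md
git commit --quiet -am "Added collaboration info"
if git push --quiet origin main 2>/dev/null; then
    push_result=accepted
else
    push_result=rejected
fi
echo "push was $push_result"

# Download the remote commits without merging, then compare the two histories
git fetch --quiet origin
missing=$(git log --format=%s HEAD..origin/main)    # on the remote, not in our branch
unpushed=$(git log --format=%s origin/main..HEAD)   # in our branch, not on the remote
echo "remote has: $missing"
echo "we have:    $unpushed"
```

Each side has one commit the other lacks, which is why a plain push cannot succeed until the histories are combined.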

Merge Conflicts

We’ll need to pull those commits into our local repository before we can push our own updates back!

$ git pull
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:smangham/climate-analysis
   501e88f..023f8f6  main       -> origin/main
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.

We have created a conflict! Both we and our remote collaborator edited README.md. Let’s take a look at the file:

$ cat README.md
# Climate Analysis Toolkit

This is a set of python scripts designed to analyse climate datafiles.

<<<<<<< HEAD
If you're interested in collaborating, email me at s.w.mangham@soton.ac.uk.
=======
To install a copy of the toolkit, open a terminal and run:

    git clone git@github.com:smangham/climate-analysis.git


**This code is currently in development and not all features will work**
>>>>>>> 493dd81b5d5b34211ccff4b5d0daf8efb3147755

Git has tried to auto-merge the files, but unfortunately failed. It can handle most conflicts by itself, but if two commits edit the exact same part of a file it will need you to help it.

We can see the two different edits we made to the end of the README.md file, in a block delimited by lines beginning <<<<<<<, ======= and >>>>>>>. The top part is labelled HEAD (the changes in our latest local commit), whilst the bottom part is labelled with the commit ID of the commit we made on GitHub.
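In a larger project it helps to ask Git which files are in conflict, and to check for leftover marker lines, rather than reading every file. The following sketch recreates a conflict like the one above in a throwaway directory; the ‘collaborator’ clone, the set_identity helper and the file contents are assumptions for the illustration. It passes --no-rebase to git pull to request a merge explicitly, since recent versions of Git ask you to choose how to reconcile branches that have diverged.

```shell
set -e
work=$(mktemp -d)            # throwaway sandbox
cd "$work"

# Hypothetical helper: give a sandbox repository a commit identity
set_identity() {
    git config user.name "Example User"
    git config user.email "user@example.com"
}

# Starting repository, a bare stand-in for GitHub, and two clones
git init --quiet seed
( cd seed
  git symbolic-ref HEAD refs/heads/main
  set_identity
  echo "# Climate Analysis Toolkit" > README.md
  git add README.md
  git commit --quiet -m "Add README" )
git clone --quiet --bare seed remote.git
git clone --quiet remote.git local
git clone --quiet remote.git collaborator

# Both sides append a different line to the end of README.md
( cd collaborator
  set_identity
  echo "To install a copy of the toolkit, clone this repository." >> README.md
  git commit --quiet -am "Add installation instructions"
  git push --quiet origin main )
cd local
set_identity
echo "If you're interested in collaborating, get in touch." >> README.md
git commit --quiet -am "Added collaboration info"

# The pull stops on the conflict and returns an error code, so tolerate the failure
git pull --no-rebase origin main >/dev/null 2>&1 || true

# Files Git could not merge automatically
conflicted=$(git ls-files --unmerged | cut -f2 | sort -u)
# Marker lines still present in the file (start, separator, end)
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' README.md)
echo "conflicted files: $conflicted"
echo "marker lines in README.md: $markers"

# git status flags files modified on both sides with 'UU'
git status --short
```

Running the grep again after editing is a quick way to confirm that no markers were left behind before committing.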

We can easily fix this using nano, by deleting all the markers and keeping the text we want:

$ nano README.md
$ cat README.md
# Climate Analysis Toolkit

This is a set of python scripts designed to analyse climate datafiles.

If you're interested in collaborating, email me at s.w.mangham@soton.ac.uk.

To install a copy of the toolkit, open a terminal and run:

    git clone git@github.com:smangham/climate-analysis.git


**This code is currently in development and not all features will work**

Now we’ve got a fixed and finished README.md file, we can commit our changes, and push them up to our remote repository:

$ git commit -am "Fixed merge conflict"
[main 6f4df16] Fixed merge conflict
$ git push
Counting objects: 10, done.
Delta compression using up to 32 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 774 bytes | 0 bytes/s, done.
Total 6 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 1 local object.
To git@github.com:smangham/climate-analysis
   023f8f6..09f5151  main -> main

Now back on GitHub we can see that our README.md shows the text from both commits, and our conflict is resolved:

Resolved conflict on GitHub

Now we can successfully develop our research code in collaboration with others.
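You can also confirm from the terminal that the local and remote repositories agree once a conflict has been resolved and pushed. The sketch below runs the whole cycle (conflict, manual fix, commit, push) in a throwaway directory and then compares commit IDs; the ‘collaborator’ clone, the set_identity helper and the README text are assumptions for the illustration, and the file is rewritten with printf in place of editing it in nano.

```shell
set -e
work=$(mktemp -d)            # throwaway sandbox
cd "$work"

# Hypothetical helper: give a sandbox repository a commit identity
set_identity() {
    git config user.name "Example User"
    git config user.email "user@example.com"
}

# Starting repository, a bare stand-in for GitHub, and two clones
git init --quiet seed
( cd seed
  git symbolic-ref HEAD refs/heads/main
  set_identity
  echo "# Climate Analysis Toolkit" > README.md
  git add README.md
  git commit --quiet -m "Add README" )
git clone --quiet --bare seed remote.git
git clone --quiet remote.git local
git clone --quiet remote.git collaborator

# Create the conflict: both sides append a different line
( cd collaborator
  set_identity
  echo "To install a copy of the toolkit, clone this repository." >> README.md
  git commit --quiet -am "Add installation instructions"
  git push --quiet origin main )
cd local
set_identity
echo "If you're interested in collaborating, get in touch." >> README.md
git commit --quiet -am "Added collaboration info"
git pull --no-rebase origin main >/dev/null 2>&1 || true   # stops on the conflict

# Resolve by writing the file we want, keeping both additions and no markers
printf '%s\n' \
    "# Climate Analysis Toolkit" \
    "If you're interested in collaborating, get in touch." \
    "To install a copy of the toolkit, clone this repository." > README.md
git commit --quiet -am "Fixed merge conflict"
git push --quiet origin main

# Compare our latest commit with what the remote now reports for main
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin refs/heads/main | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
git status --short --branch
```

Matching commit IDs mean the remote holds exactly the history you have locally.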

Conflict Mitigation

If you’ve got multiple people working on the same code at once, then the branches we mentioned earlier can really help reduce conflicts. Each collaborator can work on their own branch, and only merge it back in once everything is finished, which dramatically reduces the number of conflicts!
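As a rough sketch of that workflow (branches are not covered in this material, so treat this as a taster only), the commands below create a branch, commit on it, and merge it back into main. It runs in a throwaway repository, and the branch name install-docs and the file contents are made up for the example.

```shell
set -e
work=$(mktemp -d)                  # throwaway sandbox
git init --quiet "$work/sandbox"
cd "$work/sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"      # placeholder identity for the sandbox only
git config user.email "user@example.com"
echo "# Climate Analysis Toolkit" > README.md
git add README.md
git commit --quiet -m "Add README"

# Do the work on a separate branch
git checkout --quiet -b install-docs
echo "To install a copy of the toolkit, clone this repository." >> README.md
git commit --quiet -am "Add installation instructions"

# main is untouched until the finished branch is merged back in
git checkout --quiet main
before=$(wc -l < README.md)
git merge --quiet install-docs
after=$(wc -l < README.md)
echo "README.md lines on main: $before before the merge, $after after"
```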

Remote Repository Commands

Key Points

  • Git can easily synchronise your local repository with a remote one

  • GitHub needs an SSH key to allow access