Exploring History
Overview
Teaching: 10 min
Exercises: 0 minQuestions
How can I review my changes?
How can I recover old versions of files?
Objectives
Identify and use Git revision numbers.
Compare files with previous versions of themselves.
Restore old versions of files.
Exploring History
We’ve seen that git log
gives us some information on what commits were made when, but let’s look a bit deeper at the specifics:
$ git log
commit f15ad111042cee7492f40ad6ff0ec18588fce753 (HEAD -> main)
Author: Sam Mangham <mangham@gmail.com>
Date: Wed Mar 30 17:15:47 2022 +0100
Add rainfall processing placeholder
commit 6aeaf44173344939e9994d7ccb5512fc5b26c211
Author: Sam Mangham <mangham@gmail.com>
Date: Wed Mar 30 17:14:14 2022 +0100
Add Docstring
commit 503f02b5f51d5622121e204494dfabc9b2ae7410
Author: Sam Mangham <mangham@gmail.com>
Date: Wed Mar 30 17:12:02 2022 +0100
Added a basic readme file
commit 499b6d18b36a25d3f5ab9be1b708ea48fef1dd65 (origin/main, origin/HEAD)
Author: Sam Mangham <mangham@gmail.com>
Date: Wed Mar 16 14:19:13 2022 +0000
Initial commit
We can see commits identified by long IDs, but also HEAD at the top of the log. HEAD is the name used to refer to the most recent end of the chain of commits to our local repository.
Relative History
What if somehow we’ve introduced a bug, and we want to see what’s changed between our latest version of the code and the copy that was working last commit, or a few commits ago? Which lines did we edit, and what did we add?
We can use git diff
again to look for differences between files, but refer to the versions of the files as saved in older commits using the notation HEAD~1
, HEAD~2
, and so on to refer to the commits.
We can refer to previous commits using the ~
notation,
so HEAD~1
(pronounced “head minus one”)
means “the previous commit”,
while HEAD~123
goes back 123 commits from the latest one.
$ git diff HEAD~1 climate_analysis.py
diff --git a/climate_analysis.py b/climate_analysis.py
index d5b442d..c463f71 100644
--- a/climate_analysis.py
+++ b/climate_analysis.py
@@ -26,3 +26,5 @@ for line in climate_data:
kelvin = temp_conversion.fahr_to_kelvin(fahr)
print(str(celsius)+", "+str(kelvin))
+
+# TODO: Add rainfall processing code
So we see the difference between the file as it is now, and as it was the commit before before the latest one.
$ git diff HEAD~2 climate_analysis.py
diff --git a/climate_analysis.py b/climate_analysis.py
index 277d6c7..c463f71 100644
--- a/climate_analysis.py
+++ b/climate_analysis.py
@@ -1,3 +1,4 @@
+""" Climate Analysis Tools """
import sys
import temp_conversion
import signal
@@ -25,3 +26,5 @@ for line in climate_data:
kelvin = temp_conversion.fahr_to_kelvin(fahr)
print(str(celsius)+", "+str(kelvin))
+
+# TODO: Add rainfall processing code
And here we see the state before the last two commits, HEAD minus 2.
Absolute History
What about if we want to compare our version of the code to the version from last month, or from the version we used to make a paper last year?
Calculating the number of commits just isn’t realistic - it’ll change all the time as we keep working on the code.
Instead, we can refer to specific revisions using those long strings of digits and letters that git log
displays.
These are unique IDs for the changes, and “unique” really does mean unique: every change to any set of files on any machine has a unique 40-character identifier. (A SHA-1 hash of the new, post-commit state of the repository).
If we scroll down to the bottom of the git log
output, we can see the ID for our first commit.
In the example above, it’s 499b6d18b36a25d3f5ab9be1b708ea48fef1dd65
(but yours will be different!).
However, 40 characters just isn’t very practical to type out,
so you can use however much of an ID you need to pick out a single, unique commit.
By default, Git suggests you use the first seven characters.
Exploring Absolute History
Using unique commit IDs, get a summary of all the changes you’ve made to the
climate_analysis.py
file since the initial commit.Solution
We can use
git log
to get a list of all of our commits we’ve made:$ git log
[snipped for space] commit 499b6d18b36a25d3f5ab9be1b708ea48fef1dd65 (origin/main, origin/HEAD) Author: Sam Mangham <mangham@gmail.com> Date: Wed Mar 16 14:19:13 2022 +0000 Initial commit
Then, we can take the first 7 characters of the commit ID of the initial commit, and use them with
git diff
:$ git diff 499b6d1 climate_analysis.py
diff --git a/climate_analysis.py b/climate_analysis.py index 277d6c7..6f8ed8a 100644 --- a/climate_analysis.py +++ b/climate_analysis.py @@ -1,3 +1,4 @@ +""" Climate Analysis Tools """ import sys import temp_conversion import signal @@ -25,3 +26,5 @@ for line in climate_data: kelvin = temp_conversion.fahr_to_kelvin(fahr) print(str(celsius)+", "+str(kelvin)) + +# TODO: Add rainfall processing code
Being able to reference specific commits absolutely is particularly handy, as it lets you exactly identify specific versions of the code. For example, you can identify the version of the code you used to write your first paper, and the different, newer version you used to write your second paper.
Other Ways To Reference Commits
Newer versions of Git have some more advanced ways of referencing past commits. In place of
HEAD~1
you can useHEAD~
orHEAD@{1}
, or you can even use text to ask more advanced questions, likegit diff HEAD@{"yesterday"}
orgit diff HEAD@{"3 months ago"}
!You can also ‘tag’ a commit with a name, using
git tag
to create tags with IDs and descriptions. If you’ve got a version of the code you’ve used in a paper, for example, tagging it is a good idea:$ git tag -a v1.0 -m "Version 1.0, used in paper Mangham2024."
Restoring Files
All right: we can save changes to files and see what we’ve changed — suppose we need to restore older versions of things?
Let’s suppose we accidentally overwrite or delete our file:
$ rm climate_analysis.py
$ ls
README.md
temp_conversion.py
Whoops!
git status
now tells us that the file has been changed,
but those changes haven’t been staged:
$ git status
# On branch main
# Your branch is ahead of 'origin/main' by 3 commits.
# (use "git push" to publish your local commits)
#
# Changes not staged for commit:
# (use "git add/rm <file>..." to update what will be committed)
# (use "git restore <file>..." to discard changes in working directory)
#
# deleted: climate_analysis.py
#
no changes added to commit (use "git add" and/or "git commit -a")
Following the helpful hint in that output, we can put things back the way they were
by using git restore
:
$ git restore climate_analysis.py
$ cat climate_analysis.py
[SNIPPED - but changes rolled back]
By default, restore
replaces the file with the version of it in the staging area. If you haven’t used git add
, that should be the same as the version in the last commit. But what if we already used git add
on our incorrect version of a file, or we broke the file more than one commit ago?
We can use git checkout
, e.g.:
$ git checkout <HEAD or commit ID> climate_analysis.py
Compatibility Notice
Older versions of Git don’t include the
git restore
command - fortunately, it’s just a shortcut forgit checkout --
. Ifgit restore
doesn’t work, trygit checkout -- temp_conversion.py
.checkout
has a lot of functions, and newer versions of Git simplify things by giving them new names.
Double Whoops
What if you accidentally did
git rm climate_analysis.py
? That command tells Git to delete the file and remove it from the repository - so it will record that the file has been deleted, then stop tracking further changes. Even if you re-make the file, it won’t be tracked until you usegit add
on it again.The file still exists in the history, though so if you want to undo this you can do
git checkout HEAD climate_analysis.py
, to get the file back and start tracking it again. Since you can retrieve any file that existed in a previous commit, even if you removed it from future ones, this makes it important not to commit files containing passwords or sensitive information! To avoid this, you can use a .gitignore file to prevent you adding sensitive files in the first place. You can fully delete files from a repository’s history with tools like the BFG but it can be high risk.
The fact that files can be reverted one by one tends to change the way people organize their work.
Consider a situation where all your code is in one file, and you fixed a bug in one section but accidentally introduced one elsewhere.
You can’t just roll back to fix one bug without un-fixing the other. However, if each section is in its own file, you can just roll back the section you broke!
Key Points
git diff
displays differences between commits.
git checkout
recovers old versions of files.