Understanding Git — Branching
This is the second post in my Understanding Git series so be sure to check out the first post that deals with git’s data model before you start with this one.
Let’s start where we left off last time — at git’s data model. Only this time we will simplify it a bit by only displaying the commit objects and giving them some symbolic names instead of checksums (just to make it easier to follow), so we get a graph like this:
Those familiar with graph theory will notice that this is a Directed Acyclic Graph (DAG). That means the edges between graph nodes (in git’s case, commits) are directed, and if you start from one node and travel through the graph following the direction of the edges, you can never return to the node you started from (there are no “round-trips”).
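To make the “acyclic” part concrete, here is a minimal Python sketch (an illustration only, not something git itself runs). It assumes a hypothetical `parents` dictionary that maps each commit name to the list of commits its edges point to, and checks that following those edges never leads back to a commit we are still walking from.

```python
def is_acyclic(parents):
    """Return True if no commit can reach itself by following its edges.

    `parents` is a hypothetical mapping: commit name -> list of commits
    its edges point to. Every commit must appear as a key.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {commit: UNSEEN for commit in parents}

    def visit(commit):
        if state[commit] == IN_PROGRESS:
            return False  # we came back to a commit we are still exploring
        if state[commit] == DONE:
            return True
        state[commit] = IN_PROGRESS
        ok = all(visit(parent) for parent in parents[commit])
        state[commit] = DONE
        return ok

    return all(visit(commit) for commit in parents)


history = {"A": [], "B": ["A"], "C": ["B"]}    # C -> B -> A
loop = {"A": ["C"], "B": ["A"], "C": ["B"]}    # A -> C -> B -> A

print(is_acyclic(history))  # True
print(is_acyclic(loop))     # False
```

A commit graph always looks like the first case: a commit can only point to commits that already existed when it was created.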
It is fairly intuitive that we can distinguish three branches in our example graph. We’ll mark them as red (containing commits A, B, C, D, E), blue (containing commits A, B, F, G) and green (containing commits A, B, H, I, J).
So that’s one way of defining a branch: associating it with the list of commits it contains. However, this is not the way git does it. Git uses a simpler and cheaper solution. Instead of keeping a list of all the commits belonging to a branch and keeping it updated, git only keeps track of the last commit on a branch. Knowing the last commit of a branch, it is trivial to reconstruct the whole list of commits on that branch just by following the directed edges of the commit graph. For example, to define our blue branch, we only need to know that its last commit is G; from there, if we need a list of all the commits the blue branch contains, we can just follow the directed graph edges starting from G.
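Here is a small Python sketch of that idea. The `PARENTS` mapping is my assumption about how the example graph is wired (each commit’s edge points to the commit before it, as the three commit lists above suggest), and the function name `branch_commits` is made up for illustration.

```python
# Assumed wiring of the example graph: commit -> list of commits it points to.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"], "D": ["C"], "E": ["D"],   # red
    "F": ["B"], "G": ["F"],               # blue
    "H": ["B"], "I": ["H"], "J": ["I"],   # green
}


def branch_commits(tip, parents):
    """Collect every commit reachable from `tip` by following the edges."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# Knowing only that the blue branch ends at G is enough to list its commits.
print(sorted(branch_commits("G", PARENTS)))  # ['A', 'B', 'F', 'G']
```

Notice that nothing in `PARENTS` says which branch a commit belongs to. The single name “G” is all the blue branch needs.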
And this is how git manages branches: by keeping a pointer to a commit. So let’s see it “in action”.
First, we will initialize an empty repository
git init
and take a look at the .git directory
$ tree .git/
.git/
├── HEAD
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
This time we will focus on the refs sub-directory. It stands for references, and this is where git keeps the branch pointers.
Since we haven’t committed any changes yet, refs contains only the empty heads and tags sub-directories, so we will create and commit a few files.
echo "Hello World" > helloEarth.txt
git add .
git commit -m "Hello World Commit"

echo "Hello Mars" > helloMars.txt
git add .
git commit -m "Hello Mars Commit"

echo "Hello Saturn" > helloSaturn.txt
git add .
git commit -m "Hello Saturn Commit"
If we run git branch now, we see this output
* master
meaning we are now on the master branch (that git created automatically upon our first commit).
If we take another look at .git/refs
└── refs
    ├── heads
    │   └── master
    └── tags
we see there is a file in the refs/heads sub-directory, and it is named master, just like our branch. This is a text file, so we can use cat to take a look at it
cat .git/refs/heads/master
and we see it contains a checksum
c641e4f0d19df0570667977edff860fed8f6c05a
and if we do
git log
we see it is the checksum of our last commit:
commit c641e4f0d19df0570667977edff860fed8f6c05a (HEAD -> master)
Author: zspajich <zspajich@gmail.com>
Date:   Mon Feb 12 16:28:44 2018 +0100

    Hello Saturn Commit
(Note: checksums will have different values on your computer)
So there we have it — a branch in git is just a text file containing a checksum of the last commit on that branch. In other words — a pointer to a commit.
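Since a branch is just a small text file, reading it from a script takes a couple of lines. The sketch below is only an illustration: the function name `read_branch_tip` is made up, and it assumes the branch is stored as a “loose” file under refs/heads, as in our fresh repository. Git can also pack refs into a single .git/packed-refs file, which this sketch does not handle.

```python
from pathlib import Path


def read_branch_tip(git_dir, branch):
    """Return the commit checksum stored in a loose branch file.

    Assumes the ref lives at <git_dir>/refs/heads/<branch>; raises
    FileNotFoundError if it does not (for example, if the ref was packed).
    """
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    # The file holds the checksum followed by a newline.
    return ref_file.read_text().strip()
```

Run from the repository root, `read_branch_tip(".git", "master")` should give the same checksum that cat printed above.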
If we now create and check out a new feature branch
git checkout -b feature
and take another look at .git/refs
tree .git/refs
sure enough, we see another file called feature
└── refs
    ├── heads
    │   ├── feature
    │   └── master
and if we take a look at its checksum (pointer)
cat .git/refs/heads/feature
we see it’s the same as in the master file (branch)
c641e4f0d19df0570667977edff860fed8f6c05a
since we haven’t made any new commits on that branch.
So that’s how fast and cheap creating a new branch in git is. Git just creates a text file and fills it with the checksum of the current commit.
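To show how little work that is, here is a toy Python model of branch creation. It is a sketch under assumptions, not what git does internally: `create_branch` is a made-up helper that works on loose ref files only, and in a real repository you should let git itself create branches instead of writing into .git by hand.

```python
from pathlib import Path


def create_branch(git_dir, new_branch, start_branch):
    """Toy model: create a branch by copying one checksum into a new file."""
    heads = Path(git_dir) / "refs" / "heads"
    checksum = (heads / start_branch).read_text().strip()
    # The whole "branch" is this one small file; no commits are copied.
    (heads / new_branch).write_text(checksum + "\n")
    return checksum
```

The cost is the same whether the history behind that checksum has three commits or three million, because only the pointer is written.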
But now that we have two branches, one question remains: how does git know which of these two branches we currently have checked out? Well, there is one more special pointer (whose name will probably sound familiar to you) called HEAD. It is special because it (usually) doesn’t point to a commit object but to a ref (branch), and git uses it to track which branch is currently checked out.
If we look inside HEAD
cat .git/HEAD
we see it currently points to the feature ref file (branch).
ref: refs/heads/feature
If we ran
git checkout master
and take a look at HEAD
cat .git/HEAD
we would see
ref: refs/heads/master
it would point to the master branch.
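Putting the two pointers together, the lookup from HEAD to a commit can be sketched in a few lines of Python. The helper name `resolve_head` is made up, and it assumes loose ref files like the ones we inspected. It also covers the “usually” from earlier: when HEAD holds a checksum directly (a detached HEAD), there is no branch to follow.

```python
from pathlib import Path


def resolve_head(git_dir):
    """Return (ref_name, checksum) for the commit HEAD currently leads to.

    ref_name is None when HEAD contains a checksum directly (detached HEAD).
    Raises FileNotFoundError if the ref file is missing, e.g. before the
    first commit or when the ref has been packed.
    """
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    prefix = "ref: "
    if head.startswith(prefix):
        ref_name = head[len(prefix):]          # e.g. refs/heads/feature
        checksum = (git_dir / ref_name).read_text().strip()
        return ref_name, checksum
    return None, head
```

So checking out another branch only has to change one line in HEAD; the branch files themselves stay as they are.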
So that’s git’s branch model. It is very simple, but it is important to know in order to understand the many git operations that work on that graph (merge, rebase, checkout, revert …).