This is the first post of the Gardening with Git series. The goal of the series is to build a model for Git and to provide insight about conventions and practices which allow to use Git not only for source version control, but also for knowledge sharing and communication. While the conventions and practices should make Git a wonderful ally in the operation of a software craftspeople team, we want the model to be good enough to predict the outcome of the most common git commands.
In this post, we’ll first build a tree representation of Git repositories, then break it to learn from its limitations, and finally fix it to make it the first piece of our model. As an outcome, this journey will introduce the first of the most important Git concepts: context.
Towards a tree representation of Git repositories
A commit is something that holds some data (let’s call it D), and has a reference to a parent commit. Let’s place the parent commit below the current one.
Let’s now draw that parent commit and call its data C, then it’s own parent (with B inside maybe?), and keep iterating for a while.
We dont care really about the commits’ content for now, so, before going further, let’s simplify the schema by removing the D, C and B names.
Two commit can have the same parent. And have children commits on their own. Let’s represent such a case.
Since we have a convention about writing children commits above their parents, let’s simplify the schema again by removing the arrows so we can draw larger respositories (that’s just a fancy name for a collection of commits). The relation between children and their parents is now implicit but the overall schema is more readable.
With that modification the respository looks a lot like a tree.
If there is a tree, there must be some branches! There are, indeed. But before we continue, if we’ll be talking about branches, we should give them names so we know what we’re talking about. Let’s do that by sticking a label on top of each of them.
I said branch labels were stuck to the top of the branch. That’s to say while new commits are added, the label always remains on the newest one.
Now that we have a tree representation of Git repositories, let’s make sure it fits our requirements!
Validating the metaphor and finding its limits
The whole point of creating a tree representation of our Git repositories was being able to predict the outcome of the Git commands we use (if we can’t predict their outcome, we can’t decide which commands to use). In this section we’ll validate the metaphor against a few examples and check to which extent it can be used.
git rebase outcome
Let’s start with the
git rebase <new base> <branch> command. (It may sound scary , but for our example no prior knowledge is necessary.) Here is our starting point: we had a
master branch with 3 commits, then we created a new branch with three more commits. After the new branch was created, a couple of new commits were added to the
According to our model for a branch, the third commit from the
master branch appears to be the base of the new branch.
Re-basing a branch is about moving it to a new base. Consequently, the expected outcome of rebasing the new branch on top of
master is the sequence of commits: C, B, A, 5, 4, 3, 2, 1.
Let’s perform the
git rebase command and compare its outcome to our expectations.
$ git log --oneline --deco master d7af2c4 (master) 5 cf2c618 4 c76f446 3 b8ea8d3 2 3e57ebf 1 $ git log --oneline --deco add-user-profile a2d5ba1 (add-user-profile) C e3412d7 B 7535517 A c76f446 3 b8ea8d3 2 3e57ebf 1 $ git rebase master add-user-profile $ git log --oneline --deco add-user-profile 4f59d31 (add-user-profile) C 59e701c B 4be1d30 A d7af2c4 (master) 5 cf2c618 4 c76f446 3 b8ea8d3 2 3e57ebf 1
The command outcome does match our expectations, yay!
Success! The tree representation we built allowed us to predict the outcome of a
git rebase operation. Let’s now take a closer look at the
git log command…
Breaking the model: the
git log outcome
Let’s go back a little bit and start from the same context than we did before. Remember: we had a
master branch with 3 commits, then we created a new branch with three more commits called
add-users-profile. After the new branch was created, a couple of new commits were added to the
git log command displays all the commits that compose a given branch . According to our model, the output that we can reasonably expect from a
git log command for the
add-user-profile branch is the sequence of commits C, B, A.
Let’s perform the
git log command and compare its outcome to our expectations.
$ git log --oneline --deco add-user-profile a2d5ba1 (add-user-profile) C e3412d7 B 7535517 A c76f446 3 b8ea8d3 2 3e57ebf 1
Oh, our expectations were not fulfilled and the sequence of commits we got was unexpected. In fact the C, B, A commits were displayed, but so did the 3, 2 and 1 commits.
This time, our tree representation didn’t reflect the way Git defines branches. While our tree model suggests that a branch starts at the branch base,
git log displays all the commits from the top of the branch down to the initial commit.
Let’s acknowledge that fact and draw our entire repository using that branch definition.
That new repository representation doesn’t look much like a tree anymore, but it matches closely the
git log behaviour, and would allow us to predict the command outcome accurately.
So, what’s the balance? On the first hand, our tree representation was really useful to predict the
git rebase outcome, but it was quite missleading when dealing with
git log. On the other hand, the repository representation which does match the
git log behaviour is really simple, but it doesn’t seem to provide any clue about how the
git rebase command may behave and doesn’t tell much about several commits having the same parent.
Interpreting the metaphor limitations
Fixing the tree representation: domains of validity
Let’s create an alternative tree representation that we can use it to predict the
git log command outcome.
Pretty neat, isn’t it? It is a tree, no doubt about that, and also a good complement to tree representation which we already use to predict the
git rebase behaviour.
Now, we have two trees: a skinny tree and a fat tree, which both model properly the Git behaviour of different kinds of commands. Understanding the difference between both kinds of commands is the key to determine which representation we should use to predict the output of any given Git command.
What makes the
git rebase and
git log commands different? The former deals with branches in a collective manner, while the latter does treat them individually. You only need one branch to use
git log, but you need at least two of them to perform a
git rebase operation.
|Operations dealing with branches collectively||Operations on individual branches|
It happens the skinny tree representation is a valid model for any Git command that deals with branches collectivellly. And the fat tree representation is a valid one for any Git command that operates on individual branches .
Notion of context
So what’s the big difference between considering the Git branches individually or collectively? A branch, as the
git log command shows, is composed by all the commits from its very top, down to the repository’s initial commit. It is a whole single sequence of commits. That’s why, when considered individually, the fat tree representation is the one that works best.
When you add a second branch to the scenario (let’s call it the reference branch), something amazing happens. Instead of getting two distinct sequences of commits, you get three of them:
- all the commits which are unique to the first branch
- all the commits which are unique to the second branch (the reference branch)
- all the commits that both branches have in common
Often when you consider a branch among other branches you are interested in what is unique about that branch. However, what is unique about a branch is not enough to define it entirely. A person is not only what makes her unique, but also everything she shares with others: her culture. A branch is not only what makes it unique, but also what it shares with other branches: its context.
And exactly as with people, you won’t get the best from what makes a branch unique unless you understand in which context it is evolving.
Because of the importance of being able to identify the context when you’re working with branches, let’s take a few example to stress the following point: the context depends on both (all) the branches which are involved in the scenario. Therefore, there is no such thing as a branch context per se and the context will be different if you compare your branch to one branch or another. As we can observe, the skinny tree representation is particulary fitted to reason about the context-dependent scenarios.
Let’s hold that thought: depending on the reference branch we choose, what we call the context and what we call our feature will change slightly.
That idea of a moving context, which changes depending on our goals as for the branch opens the way to a generalization of the context concept to a single branch: once that we identified the commits we’re focusing on (which we define as being the feature), each and every single commit down to the initial commit are that feature’s context.
- the context depends on both (all) the branches which are involved in the scenario (so there is no such thing as a branch context per-se, and the context will be different if you compare your branch to one branch or another)
- the concept of context can be generalized to a single branch: if you’re interested in a given set of commits from a branch (let’s call that sequence the feature), all the commits from that branch down the parent relationship are the context in which your feature is defined
Identifying the current context when working with branches is something that you’ll do often and which becomes easier with practice, so don’t worry too much about that! Also, we’ll see more examples in the next posts of this series.
Let’s wrap up. We’ve built a tree representation of a Git repository, which allowed us to determine the outcome of some Git commands. Then we streched it to find its limitations and we worked around them.
These limitations were caused by a fundamental difference between operations which involve several branches, for which we were able to define a concept of context, and those which involve a single branch, and therefore are free of any context.
With that in mind, the alternative tree representation we built to work around the so-called skinny tree representation limitations made sense in a more general way, beyond its initial workaround status. That’s why we’ll talk of a tree representation in a general way from now on, understanding that one of the two alternative representations does apply, depending on the Git commands we’re working with.
Finally, we were able to put together a generalization of the context idea for isolated branches, which we’ll find useful when talking about context adaptations.
In the next post of this series, the concept of context will allow us to understand different merging scenarios, and drive us toward our first branch management conventions. Stay tuned!
-  Because the
git rebasecommand is potentially destructive it SHOULD sound scary.
-  The
--onelineoption will ensure each commit is displayed in a single line, and is useful to save some space. The
--decorateoption will print any branch sticky label that could be associated with the displayed commits. We'll use both of them.
-  That's to say you can predict the behaviour of any command that deals with branches collectively using the skinny tree representation. And you can predict the behaviour of any command that operates on individual branches using the fat tree representation.