Context from Scratch

by  • Nov 5, 2015

Published in Gardening with Git

This is the first post of the Gardening with Git series. The goal of the series is to build a model for Git and to provide insight into conventions and practices that allow us to use Git not only for source version control, but also for knowledge sharing and communication. While the conventions and practices should make Git a wonderful ally for a team of software craftspeople, we want the model to be good enough to predict the outcome of the most common Git commands.

In this post, we’ll first build a tree representation of Git repositories, then break it to learn from its limitations, and finally fix it to make it the first piece of our model. Along the way, this journey will introduce the first of the most important Git concepts: context.

Towards a tree representation of Git repositories

Commits

A commit is something that holds some data (let’s call it D) and has a reference to a parent commit. Let’s place the parent commit below the current one.

[Figure: Graphical representation of a commit.]

Let’s now draw that parent commit and call its data C, then its own parent (with B inside, maybe?), and keep iterating for a while.

[Figure: Graphical representation of a commit and its parent commit.]

We don’t really care about the commits’ content for now, so, before going further, let’s simplify the schema by removing the D, C and B names.

[Figure: Simplified graphical representation of a few commits.]

Two commits can have the same parent, and each of them can have child commits of its own. Let’s represent such a case.

[Figure: Graphical representation of a commit with two child commits.]

Since we have a convention of drawing child commits above their parents, let’s simplify the schema again by removing the arrows so we can draw larger repositories (that’s just a fancy name for a collection of commits). The relation between children and their parents is now implicit, but the overall schema is more readable.

[Figure: Graphical representation of a few implicitly ordered commits.]

With that modification, the repository looks a lot like a tree.

[Figure: Comparison of a collection of commits and a natural tree.]

Branches

If there is a tree, there must be some branches! There are, indeed. But if we’re going to talk about branches, we should give them names so we know which one we’re talking about. Let’s do that by sticking a label on top of each of them.

[Figure: A branched collection of commits with sticky labels on top.]

I said branch labels were stuck to the top of the branch. That is to say, as new commits are added, the label always remains on the newest one.

[Figure: The branch label sticks to its top when commits are added.]
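To make the sticky-label behaviour concrete, here is a minimal Python sketch of the model so far. It is a toy, not how Git actually stores commits: it assumes every commit is named by a short string (1, 2, 3, ...) and has a single parent, and `parents`, `branches` and `commit` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Hypothetical model, not Git's storage format: each commit name maps to its
# parent's name (None for the initial commit), and each branch label maps to
# the commit it is stuck to.
parents: dict[str, Optional[str]] = {}
branches: dict[str, str] = {}


def commit(branch: str, name: str) -> None:
    """Add commit `name` on top of `branch` and move the label onto it."""
    # The current top becomes the parent; a branch without commits yet
    # gets an initial commit (parent None).
    parents[name] = branches.get(branch)
    # The label sticks to the top: it always ends up on the newest commit.
    branches[branch] = name


for name in ["1", "2", "3"]:
    commit("master", name)

print(branches["master"])  # 3
print(parents["3"])        # 2
```

Calling `commit("master", "4")` next would give commit 4 the parent 3 and move the master label onto 4.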

Now that we have a tree representation of Git repositories, let’s make sure it fits our requirements!

Validating the metaphor and finding its limits

The whole point of creating a tree representation of our Git repositories was to be able to predict the outcome of the Git commands we use (if we can’t predict their outcome, we can’t decide which commands to use). In this section we’ll validate the metaphor against a few examples and check to what extent it can be used.

Predicting the git rebase outcome

Let’s start with the git rebase <new base> <branch> command. (It may sound scary [1], but no prior knowledge of it is necessary for our example.) Here is our starting point: we had a master branch with three commits, then we created a new branch with three more commits. After the new branch was created, a couple of new commits were added to the master branch.

[Figure: The context for the rebase test.]

According to our model for a branch, the third commit of the master branch appears to be the base of the new branch.

[Figure: The base of the new branch is the commit where both branches are in contact.]
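The base can be found mechanically: walk down the new branch from its top and stop at the first commit the other branch also contains. The sketch below encodes the example repository in Python; `PARENTS`, `BRANCHES`, `history` and `base` are hypothetical names, and it assumes every commit has exactly one parent (no merge commits). On a real repository, `git merge-base master add-user-profile` answers the same question.

```python
from typing import Optional

# Hypothetical encoding of the example repository: commit -> parent
# (None for the initial commit). Each commit has exactly one parent.
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
}
BRANCHES = {"master": "5", "add-user-profile": "C"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def base(branch: str, other: str) -> str:
    """First commit, walking down from the top of `branch`, that `other` also has."""
    shared = set(history(BRANCHES[other]))
    return next(c for c in history(BRANCHES[branch]) if c in shared)


print(base("add-user-profile", "master"))  # 3
```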

Rebasing a branch is about moving it to a new base. Consequently, the expected outcome of rebasing the new branch on top of master is the sequence of commits C, B, A, 5, 4, 3, 2, 1.

[Figure: A branch moving from its original base to a new base.]
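Under the skinny tree model, this prediction boils down to list manipulation: take the commits only the branch has and stack them, in order, on top of the new base's history. Here is a minimal Python sketch with hypothetical names (`PARENTS`, `BRANCHES`, `history`, `predict_rebase`), assuming one parent per commit. It only predicts the sequence of commit contents, while Git actually creates new commits (with new hashes) when it rebases.

```python
from typing import Optional

# Hypothetical encoding of the example repository (one parent per commit).
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
}
BRANCHES = {"master": "5", "add-user-profile": "C"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def predict_rebase(new_base: str, branch: str) -> list[str]:
    """Predicted history of `branch` after `git rebase <new_base> <branch>`."""
    upstream = history(BRANCHES[new_base])
    # Commits only `branch` has are lifted off their old base...
    moved = [c for c in history(BRANCHES[branch]) if c not in upstream]
    # ...and stacked, in the same order, on top of the new base.
    return moved + upstream


print(predict_rebase("master", "add-user-profile"))
# ['C', 'B', 'A', '5', '4', '3', '2', '1']
```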

Let’s perform the git rebase command and compare its outcome to our expectations.


$ git log --oneline --decorate master
d7af2c4 (master) 5
cf2c618 4
c76f446 3
b8ea8d3 2
3e57ebf 1
$ git log --oneline --decorate add-user-profile
a2d5ba1 (add-user-profile) C
e3412d7 B
7535517 A
c76f446 3
b8ea8d3 2
3e57ebf 1
$ git rebase master add-user-profile
$ git log --oneline --decorate add-user-profile
4f59d31 (add-user-profile) C
59e701c B
4be1d30 A
d7af2c4 (master) 5
cf2c618 4
c76f446 3
b8ea8d3 2
3e57ebf 1

The command outcome does match our expectations, yay!

Success! The tree representation we built allowed us to predict the outcome of a git rebase operation. Let’s now take a closer look at the git log command…

Breaking the model: the git log outcome

Let’s go back a little and start from the same point as before. Remember: we had a master branch with three commits, then we created a new branch called add-user-profile with three more commits. After the new branch was created, a couple of new commits were added to the master branch.

[Figure: The context for the log test.]

The git log command displays all the commits that compose a given branch [2]. According to our model, the output we can reasonably expect from git log for the add-user-profile branch is the sequence of commits C, B, A.

[Figure: A branch’s commits according to the tree representation.]

Let’s perform the git log command and compare its outcome to our expectations.


$ git log --oneline --decorate add-user-profile
a2d5ba1 (add-user-profile) C
e3412d7 B
7535517 A
c76f446 3
b8ea8d3 2
3e57ebf 1

Oh, our expectations were not met. The C, B and A commits were displayed, but so were the 3, 2 and 1 commits.

This time, our tree representation didn’t reflect the way Git defines branches. While our tree model suggests that a branch starts at the branch base, git log displays all the commits from the top of the branch down to the initial commit.

[Figure: Comparison of branch representations according to `git log` and to our tree model.]
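The git log definition of a branch is easy to state in a toy Python model: a branch is everything reachable by following parent links from its top. `log` here is a hypothetical helper, not Git's implementation, and the sketch assumes one parent per commit, so the order is simply top to bottom.

```python
from typing import Optional

# Hypothetical encoding of the example repository (one parent per commit).
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
}
BRANCHES = {"master": "5", "add-user-profile": "C"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def log(branch: str) -> list[str]:
    """A branch's whole history, from its top down to the initial commit."""
    return history(BRANCHES[branch])


# The skinny tree suggested only the commits above the base...
print([c for c in log("add-user-profile") if c not in log("master")])
# ['C', 'B', 'A']
# ...but git log walks all the way down to the initial commit.
print(log("add-user-profile"))
# ['C', 'B', 'A', '3', '2', '1']
```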

Let’s acknowledge that fact and draw our entire repository using that branch definition.

[Figure: Comparison of Git repository representations according to `git log` and to our tree model.]

That new repository representation doesn’t look much like a tree anymore, but it closely matches the git log behaviour and would allow us to predict that command’s outcome accurately.

So, what’s the balance? On the one hand, our tree representation was really useful to predict the git rebase outcome, but it was quite misleading when dealing with git log. On the other hand, the repository representation that does match the git log behaviour is really simple, but it doesn’t seem to provide any clue about how the git rebase command may behave, and it doesn’t tell much about several commits having the same parent.

Interpreting the metaphor limitations

Fixing the tree representation: domains of validity

Let’s create an alternative tree representation that we can use to predict the outcome of the git log command.

[Figure: A fat-tree alternative graphical representation of a Git repository.]

Pretty neat, isn’t it? It is a tree, no doubt about that, and also a good complement to the tree representation we already use to predict the git rebase behaviour.

Now we have two trees, a skinny tree and a fat tree, each of which properly models Git’s behaviour for a different kind of command. Understanding the difference between these two kinds of commands is the key to determining which representation we should use to predict the output of any given Git command.

[Figure: The two tree representations of a Git repository used to predict the outcome of Git commands.]

What makes the git rebase and git log commands different? The former deals with branches collectively, while the latter treats them individually. You only need one branch to use git log, but you need at least two of them to perform a git rebase operation.

Operations dealing with branches collectively | Operations on individual branches
--------------------------------------------- | ---------------------------------
rebase                                        | log
merge                                         | branch
                                              | checkout

It turns out that the skinny tree representation is a valid model for any Git command that deals with branches collectively, and the fat tree representation is a valid one for any Git command that operates on individual branches [3].

[Figure: The domains of validity of both tree representations of a Git repository used to predict the outcome of Git commands.]

Notion of context

So what’s the big difference between considering Git branches individually or collectively? A branch, as the git log command shows, is composed of all the commits from its very top down to the repository’s initial commit. It is a single, whole sequence of commits. That’s why, when branches are considered individually, the fat tree representation is the one that works best.

When you add a second branch to the scenario (let’s call it the reference branch), something amazing happens. Instead of getting two distinct sequences of commits, you get three of them:

  1. all the commits which are unique to the first branch
  2. all the commits which are unique to the second branch (the reference branch)
  3. all the commits that both branches have in common
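The three groups can be computed from the two histories with plain list filtering. This Python sketch uses hypothetical names (`PARENTS`, `BRANCHES`, `history`, `split`) and assumes one parent per commit. On a real repository, `git log master..add-user-profile` lists the first group and `git log add-user-profile..master` the second.

```python
from typing import Optional

# Hypothetical encoding of the example repository (one parent per commit).
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
}
BRANCHES = {"master": "5", "add-user-profile": "C"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def split(branch: str, reference: str) -> tuple[list[str], list[str], list[str]]:
    """Return (unique to branch, unique to reference, common), newest first."""
    ours = history(BRANCHES[branch])
    theirs = history(BRANCHES[reference])
    unique_to_branch = [c for c in ours if c not in theirs]
    unique_to_reference = [c for c in theirs if c not in ours]
    common = [c for c in ours if c in theirs]
    return unique_to_branch, unique_to_reference, common


print(split("add-user-profile", "master"))
# (['C', 'B', 'A'], ['5', '4'], ['3', '2', '1'])
```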

Often, when you consider a branch among other branches, you are interested in what is unique about that branch. However, what is unique about a branch is not enough to define it entirely. A person is not only what makes her unique, but also everything she shares with others: her culture. A branch is not only what makes it unique, but also what it shares with other branches: its context.

And exactly as with people, you won’t get the best out of what makes a branch unique unless you understand the context in which it is evolving.

[Figure: The definition of the context for a feature branch.]

Because it is so important to be able to identify the context when you’re working with branches, let’s take a few examples to stress the following point: the context depends on both (all) the branches involved in the scenario. Therefore, there is no such thing as a branch context per se, and the context will be different depending on which branch you compare yours to. As we can observe, the skinny tree representation is particularly well suited to reasoning about these context-dependent scenarios.

[Figure: Different contexts for a branch in different scenarios.]
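To see the context move, the Python sketch below adds a third, purely hypothetical branch, add-avatar, forked from commit B, and compares add-user-profile against two different reference branches. The branch, the commit X and the helper names are illustrative only, and one parent per commit is assumed.

```python
from typing import Optional

# Hypothetical repository: the example one plus an add-avatar branch
# whose single commit X sits on top of B (one parent per commit).
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
    "X": "B",                                           # add-avatar
}
BRANCHES = {"master": "5", "add-user-profile": "C", "add-avatar": "X"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def feature_and_context(branch: str, reference: str) -> tuple[list[str], list[str]]:
    """Feature: commits only `branch` has. Context: commits it shares with `reference`."""
    shared = set(history(BRANCHES[reference]))
    commits = history(BRANCHES[branch])
    feature = [c for c in commits if c not in shared]
    context = [c for c in commits if c in shared]
    return feature, context


print(feature_and_context("add-user-profile", "master"))
# (['C', 'B', 'A'], ['3', '2', '1'])
print(feature_and_context("add-user-profile", "add-avatar"))
# (['C'], ['B', 'A', '3', '2', '1'])
```

The same branch gets a three-commit feature against master but a one-commit feature against add-avatar, because the two reference branches share different commits with it.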

Let’s hold that thought: depending on the reference branch we choose, what we call the context and what we call our feature will change slightly.

This idea of a moving context, which changes depending on what we want to do with the branch, opens the way to generalizing the concept of context to a single branch: once we have identified the commits we’re focusing on (which we define as the feature), every single commit below them, down to the initial commit, is part of that feature’s context.

[Figure: The generalization of the context idea to a single branch.]
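In a toy Python model, the single-branch generalization only needs a cut point: pick the oldest commit of the feature, and everything below it is context. `split_single_branch` and the other names are hypothetical, and one parent per commit is assumed.

```python
from typing import Optional

# Hypothetical encoding of the example repository (one parent per commit).
PARENTS: dict[str, Optional[str]] = {
    "1": None, "2": "1", "3": "2", "4": "3", "5": "4",  # master
    "A": "3", "B": "A", "C": "B",                       # add-user-profile
}
BRANCHES = {"master": "5", "add-user-profile": "C"}


def history(tip: str) -> list[str]:
    """Follow parent links from `tip` down to the initial commit."""
    commits = []
    current: Optional[str] = tip
    while current is not None:
        commits.append(current)
        current = PARENTS[current]
    return commits


def split_single_branch(branch: str, oldest_feature_commit: str) -> tuple[list[str], list[str]]:
    """Feature: from the top of `branch` down to `oldest_feature_commit`.
    Context: every commit below it, down to the initial commit."""
    commits = history(BRANCHES[branch])
    cut = commits.index(oldest_feature_commit) + 1  # ValueError if not on branch
    return commits[:cut], commits[cut:]


print(split_single_branch("add-user-profile", "B"))
# (['C', 'B'], ['A', '3', '2', '1'])
```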

To recap:

  1. the context depends on both (all) the branches involved in the scenario (so there is no such thing as a branch context per se, and the context will be different depending on which branch you compare yours to)
  2. the concept of context can be generalized to a single branch: if you’re interested in a given set of commits from a branch (let’s call that sequence the feature), all the commits below them on that branch, following the parent relationship, are the context in which your feature is defined

Identifying the current context when working with branches is something that you’ll do often and which becomes easier with practice, so don’t worry too much about that! Also, we’ll see more examples in the next posts of this series.

Conclusion

Let’s wrap up. We’ve built a tree representation of a Git repository, which allowed us to predict the outcome of some Git commands. Then we stretched it to find its limitations, and we worked around them.

These limitations were caused by a fundamental difference between operations which involve several branches, for which we were able to define a concept of context, and those which involve a single branch, and therefore are free of any context.

With that in mind, the alternative tree representation we built to work around the limitations of the so-called skinny tree representation turned out to make sense more generally, beyond its initial status as a workaround. That’s why, from now on, we’ll talk about the tree representation in general, with the understanding that one or the other of the two representations applies, depending on the Git commands we’re working with.

Finally, we were able to put together a generalization of the context idea for isolated branches, which we’ll find useful when talking about context adaptations.

In the next post of this series, the concept of context will allow us to understand different merging scenarios, and drive us toward our first branch management conventions. Stay tuned!

  1. [1] Because the git rebase command is potentially destructive, it SHOULD sound scary.
  2. [2] The --oneline option displays each commit on a single line, which is useful to save some space. The --decorate option prints any sticky branch label associated with the displayed commits. We'll use both of them.
  3. [3] That's to say you can predict the behaviour of any command that deals with branches collectively using the skinny tree representation. And you can predict the behaviour of any command that operates on individual branches using the fat tree representation.

Posted with tags: Recurse Center, Gardening with Git