Talking About DVCS  July 28th, 2008

I've been using git a lot lately (flot, for example) and I've really come to appreciate it. Once you get past the chicken-in-the-headlights stage (in my case, just barely) it starts to become clear what it is you're gaining by using it (or one of the many other options).

What do you gain, exactly? Well, that's the question posed in a recent, thoughtful article I came across on reddit. I'll put in my 2 cents.

What's Different?

From the article:

...I look at why people get excited about DVCS and basically all I see are decent merging algorithms. If I had to guess, I’d say that 90% or more of the claimed benefits of DVCS really do boil down to “wow, it’s easy to merge stuff”.

So I don't think I agree with the 90% statistic (you know what they say about statistics), but that's not the real point being made. The real statement here is that 'merging is the most important part of a distributed version control system.' I think I'll agree with that.

Sure enough, DVCSes make it super easy to merge. But that shouldn't be the focus. It's not enough for it to be easy to merge. The key is that it's all about merging. This doesn't sound important to someone used to Subversion ("I don't really need to merge that often anyway.") but it changes how you work on your code-base in a fundamental way. In a DVCS you always branch for everything you ever do. Fixing a bug? git checkout -b some-bug (which creates the branch and switches to it). And then off you go, fixing the bug...

You're back? Fixed it?

You might have dozens of these branches going on at once. A typical minimum is a dev branch that you periodically bring up to date with master as you go. But usually it's even easier to just have a separate branch for each feature or fix you're working on. You can just commit whatever you have done if you need to do something on another branch or you can use git stash to shelve some changes to your working directory while you go off somewhere else.

In short, git encourages you to branch and merge haphazardly. If in doubt, branch.

No, really, just always branch.
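Here's a rough sketch of that branch-per-fix routine, including the git stash detour. It runs in a throwaway temp directory, and every file and branch name in it is made up for illustration:

```shell
set -e
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
cd "$repo"
git init -q
trunk=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on your git config

echo "hello" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Start a branch for the bug and switch to it in one step.
git checkout -q -b some-bug
echo "half-finished fix" >> app.txt

# Something else comes up: shelve the uncommitted work and go elsewhere.
git stash -q
git checkout -q -b urgent-feature "$trunk"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add urgent feature"

# Back to the bug: restore the shelved changes and finish the fix.
git checkout -q some-bug
git stash pop -q
echo "rest of the fix" >> app.txt
git commit -q -a -m "Fix some-bug"

# Merge both branches back into the trunk, then delete them.
git checkout -q "$trunk"
git merge -q --no-edit some-bug
git merge -q --no-edit urgent-feature
git branch -d some-bug urgent-feature
```

The first merge is a plain fast-forward; the second creates a merge commit because the two branches diverged. Neither needs any hand-holding.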

Another snippet:

  • Before the switch [to a DVCS], people sent patches in for review by the core team, which then decided whether to commit those patches to the master tree.
  • After the switch [to a DVCS], people send in patches for review by the core team, which then decides whether to commit those patches to the master tree.

The argument here is that the development process with distributed version control isn't much different (in fact, not different) than a traditional centralized process. But wait! The "send patch for review/consideration and it might make it into core" process is just one possible process. And here's where you'll find the real benefit to the distributed tools.

You don't just submit patches to the core team. You clone the whole project history and work on whatever you want, branching/committing/merging/etc as you go. While this is going on you're pulling new commits from the 'central' copy and keeping up to date. When you're done you don't send a patch to the maintainer! No! You just ask them to pull from you. They check out a new branch (git checkout -b some-dudes-great-idea), pull your changes, look them over, maybe leave them for a while, whatever. If they eventually decide they want to integrate your stuff into the 'central' copy (or any other branch) they can just git merge some-dudes-great-idea and we're done. But even that is a simplistic use case compared to what you can find in...

The Real World™

The biggest examples of what can happen with distributed version control can be found on GitHub. A great first step is to look at a high profile project like Ruby on Rails. The 'network graph' page for Rails shows a little timeline graph of the branches and forks of the project. Wait, what's that? It says that the Rails graph is too massive and insane to show? Exactly!

This is how distributed version control works. You don't submit a patch. You just fork/clone (which is, in essence, similar to branching) the whole project and go about your hacking. When you're done you can collaborate directly with the maintainers or other developers that have forks/clones on whether those changes belong in core, in a sub-project, should just stay forked, or whatever.
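To make the "ask them to pull from you" idea concrete, here's a minimal sketch using two local repositories in a temp directory in place of real hosted forks. The directory names (central, some-dude) and the files are hypothetical stand-ins:

```shell
set -e
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# The 'central' copy, owned by the maintainer.
git init -q "$work/central"
cd "$work/central"
trunk=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on your git config
echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"

# The contributor clones the whole history and hacks on a branch.
git clone -q "$work/central" "$work/some-dude"
cd "$work/some-dude"
git checkout -q -b great-idea
echo "a great idea" > idea.txt
git add idea.txt
git commit -q -m "Add a great idea"

# The maintainer makes a review branch and pulls the contributor's work into it.
cd "$work/central"
git checkout -q -b some-dudes-great-idea
git pull -q "$work/some-dude" great-idea

# After looking it over, integrating it is a single merge.
git checkout -q "$trunk"
git merge -q some-dudes-great-idea
```

No patch file changes hands at any point. The maintainer gets the contributor's actual commits, history and all.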

This is a beneficial system even for smaller projects. I've been maintaining a flot fork on GitHub since early June 2008. The network graph is well within the limits of the system to display, so check it out:

Flot's network graph as of July 29th, 2008

There are 2 forks (of my fork) but only one (at the time of this writing) has a branch/has made any commits. Great. But if I'm just going to accept a pull request and their fork dies then what's so 'distributed' about this, right?

Centralized == Distributed # => FlameWarWarning

What’s worse, there’s a nasty tendency to muddle the meanings of important words. We’re told that Rails, for example, switched away from a “centralized” system, but it’s hard to see how the new setup is any less “centralized” than the old: there’s still a single master tree that forms the basis of public distribution, and there’s still a core team of privileged committers who act as gatekeepers to that tree.

Right, I totally agree on this one. The thing to do is ignore the hype, of course. Realize that "centralized" isn't some sort of naughty word. And "distributed" doesn't necessarily mean there's no 'master' copy. There almost always is. The real point is touched on with the idea of 'offline commits'. So let's have another h3 to change the focus here...

Offline Private Commits

Offline commits aren't the argument. They're just one facet. What distribution really means here is that I can clone (fork/branch) a project, make some changes (in branches) while my friends/team/etc do the same thing. We carry on committing and so on. But we constantly merge/pull from each other. We can hack along like this for as long as we want (let's say we're adding CouchDB support to ActiveRecord). A few weeks go by and we're done. So we get in touch with the Rails core developers on irc/twitter/whatever and talk with them about it and ask them to pull from us. This is easy due to the mind-numbingly simple merging, but also because we were able to constantly stay up to date with the 'real' Rails project as we went.

What I mean is that 'offline' doesn't really refer to 'no broadband.' What it really means is that you don't have to have access to the centralized copy in order to do any significant work. Sure, at my job I can make changes to files without having a network connection. But a couple of days go by and you've got a horrible mess on your hands... Not to mention your commit when you get back 'online' is near useless because it includes tons of unrelated changes.

I'm talking about 'offline' in the email-thread "contact me offline about this" style. E.g., privately. I propose that 'private commits' replace 'offline commits' to describe this. The fact that you can be offline and commit isn't the whole story by a long shot. Private commits are really what you're doing, for the most part. It doesn't matter if you're on a plane with no network at all or only sharing commits between a couple of friends, you aren't required to share them with the master copy -- you're 'off the line'. It's all about cheap branches: the tool stays out of your way and leaves you to collaborate and share code and patches however makes sense for you. Last I checked, branches in Perforce are actually stored on the server! That means you can't even start a branch without everyone else in the whole company (or whatever) seeing it! That's at least limiting the possibilities, if not actively discouraging experimentation and sharing of code.

Back to our ActiveRecord fork. It gets even better: let's say they don't want our crappy CouchDB-related changes. We're terrible coders, or something, so they tell us to get lost. No problem, we can maintain our fork for people who want it. Or just for ourselves. We can keep pulling from the master project to keep our version up to date for as long as we like.
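A sketch of that scenario, with three local repositories standing in for the upstream project and two collaborators. The names alice and bob, the couchdb-support branch and the files are all invented for the example:

```shell
set -e
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# The 'real' project.
git init -q "$work/upstream"
cd "$work/upstream"
trunk=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on your git config
echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"

# Two collaborators each clone it.
git clone -q "$work/upstream" "$work/alice"
git clone -q "$work/upstream" "$work/bob"

# Alice commits privately on a feature branch.
cd "$work/alice"
git checkout -q -b couchdb-support
echo "adapter" > adapter.txt
git add adapter.txt
git commit -q -m "Add adapter skeleton"

# Bob fetches Alice's branch straight from her clone; upstream never sees it.
cd "$work/bob"
git remote add alice "$work/alice"
git fetch -q alice
git checkout -q -b couchdb-support alice/couchdb-support
echo "tests" > adapter_test.txt
git add adapter_test.txt
git commit -q -m "Add adapter tests"

# Meanwhile the upstream project moves on.
cd "$work/upstream"
echo "more core" >> core.txt
git commit -q -a -m "Upstream change"

# Alice picks up Bob's work, then merges in the latest upstream commits.
cd "$work/alice"
git pull -q "$work/bob" couchdb-support
git pull -q --no-rebase --no-edit origin "$trunk"
```

Whether upstream ever accepts the branch or not, that last pull is all it takes to keep the fork current.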

Distribution FTW

That's distribution. Not some sort of weird abstract idea where the project doesn't have any 'official' central copy. How else will people get it? There have to be release mechanisms, for one! No, the 'distributed' in DVCS is really just that it lets you do things in a distributed way, coming back together when appropriate.

Just think less about the 'official' repositories (which, as I said, almost certainly exist). Your clone of flot that you pulled from GitHub is your own repository. I'm serious. Just go nuts. Make a load of branches, merge them with each other, trash some -- find another fork of flot and pull their changes. Collaborate with that other 'forker' as much as you like, completely independent of my fork (and the original Subversion version). You might want to take flot in a specific direction for your own project that most likely no one else will need/be interested in... Awesome! Keep your fork going as long as you need it.
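Going nuts might look something like this. It's only a sketch in a scratch repository, and the experiment names and files are invented:

```shell
set -e
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

repo=$(mktemp -d)
cd "$repo"
git init -q
trunk=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on your git config
echo "plot" > plot.txt
git add plot.txt
git commit -q -m "Initial commit"

# Three experiments, each on its own branch off the trunk.
git checkout -q -b experiment-a "$trunk"
echo "idea a" > a.txt
git add a.txt
git commit -q -m "Try idea A"

git checkout -q -b experiment-b "$trunk"
echo "idea b" > b.txt
git add b.txt
git commit -q -m "Try idea B"

git checkout -q -b dead-end "$trunk"
echo "bad idea" > c.txt
git add c.txt
git commit -q -m "Try something that won't work out"

# Merge the two good experiments with each other, then into the trunk.
git checkout -q experiment-b
git merge -q --no-edit experiment-a
git checkout -q "$trunk"
git merge -q experiment-b

# Trash the failed experiment. -D is needed because it was never merged.
git branch -D dead-end
git branch -d experiment-a experiment-b
```

Nobody else ever has to know dead-end existed.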

Distributed version control is like a lifestyle, once you get used to it. I often feel held back at work using Perforce. It's much easier to just try something random when you don't have to worry about cleaning up your mess if it doesn't work out or, even worse, everyone in the company looking at your changes with a smirk on their face while you're in the middle of some experiment.

I'd argue passionately that software development is (at least) half art. DVCSes let you work with other artists (or not) however you want, in whatever sort of hierarchy (such as 'none', or 'BDFL') you like, with no dependency on one specific server (or person) and the privacy to screw up as much as you need to before it's ready for the world. As Matt Todd said during his RubyFringe talk: "Don't let your good judgment get in the way." I'd say your version control system should be expected to stay even further out of your way than your judgment.

Lastly...

There's something fundamentally different about these two types of tools and, often, comparing them isn't really a useful thing. You just need to try them, see the unique benefits of each, and use the appropriate tool for your situation.

A disclaimer: I use git primarily because it's the leading tool in the Ruby community (also, GitHub rocks). In my opinion nearly all the options are viable, "six of one, half-dozen of the other." I think git has come a long way recently in 'mainstream appeal' as in your average developer can grok it without meditating in a cave for several months. However, don't take what I say here as some sort of assertion that git is somehow 'the winner.'

Another disclaimer: I'm also not even close to a git-guru. So if I wrote something above in three lines that could be written in only one (or, like, half :)), I apologize, email me!
