
The Balkanization of Distributed Version Control

I think it’s a shame that distributed version control systems are so fragmented. Today you’ve got darcs, mercurial (hg), git, monotone, codeville, bzr, and more to choose from. Fragmentation in this space is especially costly because it’s hard to interoperate between VCSs. Since each tool has its own learning curve, interacting with a new project could mean having to learn yet another VCS tool.

What really bothers me about this situation is that this field (dVCS) is not very well understood yet. I think there are very few people out there who can really weigh the fundamental differences between these systems. So most people either go with the first system they try, or they prefer one over another based on strengths or weaknesses of the implementation, e.g. “git is too hard to understand” or “I can’t use git on Windows.”

Even very visible and respected people who write about this only compare the fundamental differences in very vague terms; Ted Ts’o says Git has “more legs” (more potential), and Keith Packard is leery of Mercurial because Linux’s implementation of file truncation (which Mercurial uses for crash recovery) is racy.

Do you see the problem? We’re making big decisions about how we’re going to encode the history of our precious code based on shallow characteristics of the current implementations. But Mercurial and Git differ in a really fundamental way: Git stores the nodes (file revisions), Mercurial stores the edges (diffs between revisions). And I don’t think anyone fully understands all the implications of either choice.
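To make the nodes-versus-edges distinction concrete, here is a toy sketch in Python. It is not either tool's actual on-disk format: the class names `SnapshotStore` and `DeltaStore` are hypothetical, and the line-based delta encoding is an assumption chosen for brevity. Real implementations blur the line too (Git delta-compresses objects inside packfiles, and Mercurial's revlog periodically stores full snapshots), so treat this only as an illustration of the two starting points.

```python
import difflib


class SnapshotStore:
    """Toy 'node' store: keeps every revision in full."""

    def __init__(self):
        self.revisions = []

    def commit(self, text):
        self.revisions.append(text)
        return len(self.revisions) - 1

    def checkout(self, rev):
        # Reading a revision is a direct lookup.
        return self.revisions[rev]


class DeltaStore:
    """Toy 'edge' store: keeps the first revision, then only the changes."""

    def __init__(self):
        self.base = None
        self.deltas = []   # one list of (start, end, new_lines) per commit
        self._tip = None   # latest full text, kept only to compute the next delta

    def commit(self, text):
        if self.base is None:
            self.base = self._tip = text
            return 0
        old = self._tip.splitlines(keepends=True)
        new = text.splitlines(keepends=True)
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        ops = [(i1, i2, new[j1:j2])
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal"]
        self.deltas.append(ops)
        self._tip = text
        return len(self.deltas)

    def checkout(self, rev):
        # Reading a revision means replaying every delta up to it.
        lines = self.base.splitlines(keepends=True)
        for ops in self.deltas[:rev]:
            # Apply back to front so earlier line indices stay valid.
            for start, end, new_lines in reversed(ops):
                lines[start:end] = new_lines
        return "".join(lines)


# Usage: both stores reproduce the same history.
history = ["a\nb\nc\n", "a\nB\nc\nd\n", "B\nc\nd\ne\n"]
snapshots, deltas = SnapshotStore(), DeltaStore()
for text in history:
    snapshots.commit(text)
    deltas.commit(text)
for rev, text in enumerate(history):
    assert snapshots.checkout(rev) == text
    assert deltas.checkout(rev) == text
```

Even in this toy version the trade-off shows up: the snapshot store pays in space and reads any revision directly, while the delta store pays in replay time and depends on every earlier delta being intact.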

I’d love to see some academic research that really exhaustively compares all of the dVCS approaches that are flying around, and spells out their fundamental advantages and disadvantages. But until that happens, I’m afraid we’re going to see more of the status quo, where people choose sides based on shallow reasoning and stick with their team until the end.

  1. June 3rd, 2007 at 10:17 | #1

    With most of the tools you named, it’s actually fairly easy to interoperate between dVCS systems (and between dVCS and centralized systems like Perforce, CVS, and Subversion). There are reasonably mature tools both for converting repositories from one system to another, and for branching from any system to the dVCS of your choice (so that people can share work without all using the same system).

    I agree that the multiple learning curves are a problem, but I think it’s a reasonable price to pay for the explosion of new ideas in this area. I expect it will settle down eventually.

    You’re probably aware of Darcs’ theory of patches, which is a good step toward understanding what it means to store the edges.

  2. josh
    June 3rd, 2007 at 23:19 | #2

    The thing about tools like this is that they have to be *so good* before they are at all usable. I have to be able to trust that the tool can handle normal cases totally perfectly, and that it will refuse to do anything it can’t do perfectly (or let me pass a --force flag).

    Have you used any tools of this sort that you have that much confidence in? That will let you pull changes from or push changes to a repository in a different format? That will preserve all metadata about the changes (author, commit messages, timestamps, etc.)? If there is anything like that, it would be good news.

    Surely these tools have their limits; I want to be sure I know what those limits are.

  3. June 5th, 2007 at 10:38 | #3

    I’ve successfully used both SVK and bzr-svn, though not extensively. I certainly wouldn’t hesitate to use them. I personally haven’t used Tailor, but I’ve been hearing praise for it for quite a while. Mozilla.org successfully converted their huge CVS repository to Mercurial, though they hit a lot more bugs and limitations along the way than the average project.

    “That will preserve all metadata about the changes (author, commit messages, timestamps, etc).”

    That much seems to be easy and widely-supported. I don’t know, however, how well any of these tools preserve more complex metadata like branch/merge histories, especially between tools with different concepts of merging.

  4. Rodrigo
    July 5th, 2007 at 05:33 | #4

    It isn’t balkanization. It is diversity. And that is a very good thing. But that’s just my 2 cents.

  5. Neil Bartlett
    July 5th, 2007 at 06:00 | #5

    Are you suggesting that (for example) a Windows-only shop should choose Git simply because they prefer a node-centric design, and ignore the fact that there is currently no good implementation of Git for Windows?

    Comparing these tools based on their philosophical characteristics is all very well, and a good area for academic study… but when it comes down to actually choosing which one to use, it’s absolutely vital to look at implementation issues.

  6. Luis Bruno
    July 5th, 2007 at 08:04 | #6

    You’ve said it yourself: “What really bothers me about this situation is that this field (dVCS) is not very well understood yet.”

    That’s why there are so many, each of them a kind of research in a certain direction. Git optimizes for some cases, darcs for others, hg somewhere else. The Collective doesn’t understand this (dVCS) area yet.

  7. July 5th, 2007 at 08:13 | #7

    Jim Gettys’s article on repository formats is an annoying pile of nonsense, by the way. It’s grown legs because Jim is respected for other work he’s done, but that doesn’t make what he has to say in that article any less silly.

    The essence of having a race condition is that you must have two participants, and Mercurial locks a repository that it’s modifying so that there can only be *one*. Very simple and difficult to overlook.

  8. Mike Laiosa
    July 5th, 2007 at 09:10 | #8

    I agree with Luis Bruno. The fact that dVCS isn’t well understood is exactly why there are many implementations. And it’s why that’s a good thing. Since dVCS isn’t well understood, any single implementation probably gets it wrong. The way to gain understanding of dVCS is to try a bunch of things and see what works and what doesn’t. Once the space is better understood, someone will write an implementation that gets it right (or people will recognize which of the existing implementations get it right), and the others will go away.

  9. josh
    July 5th, 2007 at 09:16 | #9

    Rodrigo: I agree that diversity is good when you have interoperability. I’m not convinced that we have that with dVCS today.

    Neil: Your point is taken — for end users, they have to choose a product based on its existing strengths and weaknesses. I think I’m more concerned about people who are acting as advocates and pundits of the dVCS world. I think Jim Gettys is right that “Repository formats matter,” but not for the reasons he advances.

    Luis: what cases does Git optimize for vs. darcs vs. hg? I don’t think any of these projects is consciously picking a niche they excel at. Sure, Linus made Git for his very specific use cases, but I haven’t seen any analysis of why it is fundamentally any better at this than Mercurial.

    Bryan: agreed. To me, it’s silly for yet another reason: if ftruncate is racy, that’s a bug in Linux, not a weakness of Hg.

    Mike: I can believe that, I just hope that the people who are deciding “what works” will start making comparisons more substantive than to say that one has “more legs.” What does that even mean?

  10. July 5th, 2007 at 13:56 | #10

    Funny. You said “What really bothers me about this situation is that this field (dVCS) is not very well understood yet.” I was thinking the opposite: I’d be in favor of fewer options if the ideas were well understood. I want more options because the ideas aren’t well understood. If there were only one or two products that dominated so much that no one considered using anything else, I’d worry that we might miss some of the good ideas.

  11. August 14th, 2007 at 00:46 | #11

    I’m trying to do “some academic research” on this, and I’d like to hear your opinion on how to do the evaluation:

    http://computerroriginaliascience.blogspot.com/2007/08/how-to-evaluate-dvcs.html
