Garrett Rooney, Practical Subversion

Subversion is having what can only be described as a subversive effect on the versioning software environment. CVS has long been the standard amongst programmers, but it has its faults, and Subversion (read Sub-version) addresses both the known and the perceived problems with CVS. I talked to Garrett Rooney about his book Practical Subversion, his contributions to the Subversion code, and where Subversion fits into your administration and development environments.

I see from the book you are a strong believer in version control - can you summarize the main benefits of version control?

I like to think of version control as a way of communicating information between developers.

When you commit a change to a source tree, you can think of it as an automated way of telling every other developer how they can fix the same problem in their source tree. The benefits go further though, since in addition to keeping everyone on the team up to date with the latest fixes, you’re also recording all of the history. This means that later on, when you want to figure out how a piece of code got the way it is, you can look at the series of changes (and hopefully the justification for the changes, if you’ve been good about writing log messages) that led to the current situation. Looking at that history is often the best way to understand why the code got the way it is, which means you’re less likely to make the same mistake twice when making new changes.

So version control is really a way to help you communicate, both with other people working on your project right now and with those working on it in the future.

There’s been a lot of discussion online about the benefits of Subversion compared to the previous preferred environment of CVS. How much better is Subversion?

I recently had to start using CVS again, after a rather long period of time where I’d only used either Subversion or Perforce, a commercial version control system. It never ceases to amaze me, whenever I go back to CVS, how irritating it is to use.

Let’s start with the basics. Lots of things in CVS are slow.

Specifically, lots of operations that I like to do fairly often (‘cvs diff’ is the big one here) need to contact the repository in order to work. This means going out over a network, which means it’s pretty slow. In Subversion the equivalent command is lightning quick, since your working copy keeps a cached copy of each file, so it doesn’t have to contact the server in order to show you the difference between the file you started with and the new version you created.
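As a rough illustration of why a cached copy makes diffing a purely local operation, here is a minimal Python sketch. It is not how the Subversion client is implemented; the function name `local_diff` and the sample file contents are assumptions made up for this example. The point is only that once the original text is stored next to the working file, producing a diff needs no network access.

```python
import difflib


def local_diff(pristine_text: str, working_text: str, name: str) -> str:
    """Return a unified diff between a cached pristine copy and the edited file.

    Both inputs are already on the local disk, so no server round trip is needed.
    """
    diff = difflib.unified_diff(
        pristine_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=f"{name} (pristine copy)",
        tofile=f"{name} (working copy)",
    )
    return "".join(diff)


# Hypothetical file contents, used only for illustration.
pristine = "int main(void) {\n    return 1;\n}\n"
working = "int main(void) {\n    return 0;\n}\n"
print(local_diff(pristine, working, "foo.c"), end="")
```

The trade-off is disk space: keeping the original text of every file roughly doubles the size of the working copy in exchange for fast local comparisons.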

There are other parts of CVS that are also quite slow when compared to Subversion. In CVS the act of tagging or branching your source tree requires you to make a small change to each and every file in the tree. This takes a lot of time for a large tree, and a noticeable amount of disk space. In Subversion the equivalent operation takes a constant, and very small, amount of time and disk space.
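One way to picture a tag that costs the same regardless of tree size is a copy that shares the existing tree instead of duplicating it. The Python sketch below is a toy model under that assumption, not Subversion's repository format; the `Node` class, the `cheap_copy` function, and the paths are all hypothetical, and only top-level names are handled to keep it short.

```python
class Node:
    """Hypothetical tree node: a file (content) or a directory (children)."""

    def __init__(self, content=None, children=None):
        self.content = content
        self.children = dict(children or {})


def cheap_copy(root: Node, src: str, dst: str) -> Node:
    """Return a new root in which `dst` refers to the same subtree as `src`.

    Nothing under `src` is visited or duplicated, so the work does not grow
    with the number of files being tagged.
    """
    new_children = dict(root.children)
    new_children[dst] = root.children[src]  # share the subtree by reference
    return Node(children=new_children)


# Hypothetical layout: a trunk directory with two files.
trunk = Node(children={"foo.c": Node(content="int x;"), "bar.c": Node(content="int y;")})
root = Node(children={"trunk": trunk})

tagged = cheap_copy(root, "trunk", "release-1.0")
print(tagged.children["release-1.0"] is tagged.children["trunk"])
```

The old root is left untouched, which is what lets earlier revisions stay readable after the tag is made.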

The other big improvement is the fact that in Subversion changes are committed to the source tree in an atomic fashion. Either the entire change makes it in or none of it does. In CVS you can get into a situation where you updated your working copy in the middle of a commit, resulting in you getting only half of the changes, and thus a broken source tree. In Subversion this doesn’t happen.

The same mechanism means that it’s much easier to talk about changes in Subversion than in CVS. In CVS, if you have a change to five separate files, then in order to talk about it you need to talk about the individual change to each file: “I committed revision 1.4 of foo.c, 1.19 of bar.c, …” This means that if someone wants to look at the change you made, they have to go look at each individual file to do it. In Subversion you just say “I committed revision 105”, and anyone who wants to look at the diff can just say something like “svn diff -r104:105” to see the difference between revision 104 and revision 105 of the entire tree. This is also quite useful when merging changes between branches, something that’s quite difficult in CVS.
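The two ideas above, all-or-nothing commits and a single revision number for the whole tree, can be sketched with a toy in-memory model. This is an illustration only, not Subversion's storage design; `ToyRepository`, its methods, and the validation rule used to trigger a failure are assumptions invented for the example.

```python
class ToyRepository:
    """Hypothetical model in which every revision is a snapshot of the whole tree."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def commit(self, changes: dict) -> int:
        """Apply every change as one new revision, or none of them."""
        new_tree = dict(self.revisions[-1])  # work on a private copy
        for path, content in changes.items():
            if not isinstance(content, str):
                # Abort before anything is published; readers never see half a change.
                raise ValueError(f"invalid content for {path}")
            new_tree[path] = content
        self.revisions.append(new_tree)  # the single step that publishes the change
        return self.head

    def changed_paths(self, old: int, new: int) -> list:
        """List every path that differs between two tree-wide revisions."""
        before, after = self.revisions[old], self.revisions[new]
        return sorted(
            path
            for path in before.keys() | after.keys()
            if before.get(path) != after.get(path)
        )


repo = ToyRepository()
repo.commit({"foo.c": "v1", "bar.c": "v1"})
rev = repo.commit({"foo.c": "v2", "bar.c": "v2"})
print(rev, repo.changed_paths(rev - 1, rev))
```

Because one number names the state of the entire tree, asking what changed between two revisions is a single question rather than one question per file.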

Finally, the user interface provided by the Subversion client is simply nicer than the one provided by CVS. It’s more consistent, and generally easier to use. Enough things are similar to CVS that a CVS user can easily get up to speed, but the commands generally make sense to a new user, as compared to those of CVS which can be rather confusing.

How does Subversion compare with version control systems other than CVS? BitKeeper, for example, has been in the news a lot recently. How about commercial products, like Visual SourceSafe or ClearCase?

I’ve personally never used BitKeeper, largely because of its license. While BK was available under a “free as in beer” license for use in developing open source software, the license prohibited users from working on competing products, like Subversion. As a result, I’ve never really had a chance to try it out.

I do think that BitKeeper has some interesting ideas though, and the other distributed version control systems (Arch, Darcs, Bazaar-NG, etc) are all on my radar. I don’t know if I’m convinced of their advantages over centralized systems like Subversion, but there is interesting work being done here. Personally, of the three distributed systems I just mentioned I’m most interested in Bazaar-NG (http://bazaar-ng.org/).

As for the commercial products out there, I’ve had personal experience with Perforce and SourceSafe. I wasn’t impressed with SourceSafe at all, and I really can’t think of a situation where I’d use it willingly. Perforce on the other hand is a very nice system. Its branching and merging support is superior to what Subversion provides at the moment (although the Subversion team has plans to close that gap in the future). That said, Perforce is expensive, and unless you really need specific features that can only be found there I wouldn’t see much reason to go with it.

You sound like you’ve had a lot of personal experience of where the right source control mechanism has saved your life. Any true tales that might help highlight the benefits of version control?

Personally, my most memorable experiences where version control would have been a lifesaver are those from before I started making use of it on a daily basis.

I know that back in college there were several times where I was working late at night on some project, usually due within a few hours, and I managed to screw things up badly. It’s remarkable how easy it is to go from a version of a program that’s mostly working to one that’s totally screwed up, all while trying to fix that last bug. It’s especially bad when your attempt to fix that last bug introduces more problems, and you can no longer remember exactly what you changed.

With a version control system, you never really need to be in that situation. At the absolute worst, you can always roll back to the version of the code you had at your last commit. It’s impossible to get stuck in that situation where you can’t figure out what you changed because the system will remember for you.

Now that all of my non-trivial projects (and most of my trivial ones, honestly) make use of version control, I just don’t find myself in those kinds of situations anymore.

Existing developers will almost certainly need to migrate to Subversion - how easy is this?

It’s easy to underestimate the problems that come with migrating from one version control system to another. Technically, a conversion is usually pretty straightforward. There are various programs available to migrate your data (cvs2svn for CVS repositories, p42svn for Perforce, and others), and in many cases it can be tempting to just run the conversion, toss your users some documentation and away you go.

Unfortunately, it isn’t that simple. Version control becomes part of a developer’s day-to-day workflow, and changing something like that has consequences. There needs to be careful planning, and most importantly you need to have buy-in from the people involved.

Another Subversion developer, Brian Fitzpatrick, will actually be giving a talk about this very subject at OSCON this year, and I’m looking forward to hearing what he has to say.

http://conferences.oreillynet.com/cs/os2005/view/e_sess/6750

Some versioning systems have problems with anything other than text. What file types does Subversion support?

Subversion, by default, treats all files as binary. Over the network, and within the repository, files are always treated as binary blobs of data. Binary diff algorithms are used to efficiently store changes to files, and in a very real sense text and binary files are treated identically.

Optionally, there are various ways you can tell Subversion to treat particular files as something other than binary.

If you want end-of-line conversion to be performed on a file, for example so it shows up as a DOS style file when checked out on Windows but a Unix style file when checked out on a Unix machine, all you have to do is set the svn:eol-style property on the file.

Similarly, if you want keyword substitution to be performed on a file, so that words like $Revision$ or $Date$ are replaced with the revision and date the file was last changed on, you can set the svn:keywords property to indicate that.
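To make keyword substitution concrete, here is a small Python sketch of the general idea: an anchor such as $Revision$ in the file is rewritten to carry a value, and only keywords that have been explicitly enabled are touched. This is a simplified stand-in, not the Subversion client's code; the function `expand_keywords`, the `values` mapping, and the sample revision number are assumptions for illustration.

```python
import re


def expand_keywords(text: str, values: dict) -> str:
    """Rewrite $Keyword$ (or an earlier $Keyword: old $) as $Keyword: value $.

    Only keywords present in `values` are expanded, mirroring the idea that
    substitution is opt-in; everything else is left exactly as written.
    """

    def substitute(match):
        keyword = match.group(1)
        if keyword not in values:
            return match.group(0)  # keyword not enabled: leave untouched
        return f"${keyword}: {values[keyword]} $"

    # Matches "$Name$" as well as an already expanded "$Name: something $".
    return re.sub(r"\$(\w+)(?::[^$\n]*)?\$", substitute, text)


# Hypothetical source line and revision number.
source = "/* $Revision$ $Date$ */"
print(expand_keywords(source, {"Revision": "105"}))
```

In this example only Revision is enabled, so the $Date$ anchor passes through unchanged.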

The key fact to keep in mind is that in Subversion these are optional features that are turned off by default. By their very nature they require that Subversion make changes to your files, which can be catastrophic in some cases (changing all the \r\n’s in a binary file to \n’s isn’t likely to work very well), so you need to ask for this behavior if you want it. In systems like CVS these kinds of features are turned on by default, which has resulted in countless hours of pain for CVS users over the years.
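The danger of applying line-ending conversion to the wrong file is easy to demonstrate. The Python sketch below is an illustration under simple assumptions, not Subversion's translation code; the helper names and the sample byte sequence are made up. It converts line endings in a text buffer, then applies the same conversion to binary data that merely happens to contain the bytes 0x0D 0x0A.

```python
def to_unix_eol(data: bytes) -> bytes:
    """Convert DOS style line endings (\\r\\n) to Unix style (\\n)."""
    return data.replace(b"\r\n", b"\n")


def to_dos_eol(data: bytes) -> bytes:
    """Convert any mix of line endings to DOS style (\\r\\n)."""
    return to_unix_eol(data).replace(b"\n", b"\r\n")


# For text, conversion only changes how line breaks are stored.
text = b"line one\r\nline two\r\n"
print(to_unix_eol(text))

# Hypothetical binary payload in which 0x0D 0x0A is data, not a line break.
binary = bytes([0x89, 0x50, 0x0D, 0x0A, 0x1A, 0x00])
converted = to_unix_eol(binary)
print(len(binary), len(converted))  # the payload silently loses a byte
```

A program reading the converted payload would find every later byte at the wrong offset, which is why this kind of translation is safer as something you request per file.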

From reading your book, it’s obvious that Subversion seems a little bit more application friendly than CVS, integrating with Apache, emacs and others with a little more grace than CVS and RCS. Is that really the case?

Well, let’s be fair, CVS and RCS have quite good integration with various tools, ranging from Emacs to Eclipse. That said, it hasn’t been easy to get to that point. In many cases tools that want to integrate with CVS have to jump through hoops to call out to the command line client and parse the resulting output, which can be fragile. In cases where that isn’t possible many projects have reimplemented CVS so that it could be more easily integrated.

In Subversion many of these problems are alleviated by the fact that the core functionality is implemented as a collection of software libraries. If you want to make use of Subversion in your own code, all you need to do is link against the Subversion libraries and you can provide exactly the same functionality as the official Subversion client. If you’re not working in C or C++, there are probably bindings for the Subversion libraries written in your language of choice, so you can even do this without having to learn a lot about the C level libraries.

Additionally, Subversion’s ability to integrate with Apache has provided a number of abilities, ranging from WebDAV integration to the ability to use an SQL or LDAP database for storing usernames and passwords, that otherwise would have been incredibly difficult to implement. By working within the Apache framework we get all of that for free.

Subversion includes autoversioning support for DAV volumes, could you explain how you could use that to your advantage?

DAV autoversioning is most useful when you need to allow non-technical users, who would be uncomfortable making use of normal Subversion clients, to work with your versioned resources. This could mean source code, but most commonly it involves graphics files or Word documents and other things like that. Your users simply use a DAV client (which is often built into their operating system) to access the files, and the version control happens transparently, without them even knowing about it. When they save their changes to the file, it is automatically committed to the repository. This is a very powerful tool, and can give you some of the advantages of version control without costly training for your users.

Most people use version control for their development projects, but I’ve also found it useful for recording configuration file changes. Is that something you would advocate?

Absolutely! I personally keep much of my home directory under Subversion’s control, allowing me to version my editor’s config files, .bashrc, et cetera. All the same benefits you can get from using version control with software development are just as applicable for configuration files.

How important do you think it is for high quality tools like Subversion to be Open Source?

I’m a big fan of using open source licensing and development models for infrastructure level software (operating systems, compilers, development tools, etc). Well, honestly I’m a big fan of using open source licenses and development models for most kinds of software, but I think it’s particularly appropriate for software at the infrastructure level, where the primary use case is building larger systems.

It’s difficult to imagine a company being able to make money developing a new operating system in this day and age, or a version control system, or a C runtime library. These are largely commoditized parts of the software ecosystem, and as a result I think it makes sense for the various people who benefit from having them available to share the cost for producing them, and the best way we currently have to do that is via open source.

Additionally, it’s difficult to overestimate the value of having high quality systems out there for people to learn from. I’ve learned a great deal by reading open source code, and even more by participating in open source projects.

Finally though, I just like the fact that if I find a problem in a piece of open source software like Subversion I can actually do something about it. I’ve worked with closed source third party software in the past, and I’ve found that I tend to spend a lot of time digging through inadequate documentation and beating my head against the wall while trying to work around bugs. With an open source product you can at least make an attempt to figure out what the actual problem is.

Contributing to Subversion in your spare time doesn’t seem like a very relaxing way to spend your free time. Is there something less computer-based that you like to do?

I don’t know, there’s something fun about working on open source projects. It’s awfully nice to have the freedom to do things the “right” way, as opposed to the “get it done right now” way, which happens far too often in the commercial software world.

That said, I do try to get off of the computer from time to time. I see a lot of movies, read a lot, and lately I’ve been picking up photography. I also just moved to Silicon Valley, so I’m making a concerted effort to explore the area.

Anything else in the pipeline you’d like to tell us about?

Well, Subversion 1.2 is on its way out the door any day now, and that’ll bring with it some great new features, primarily support for locking of files, something that many users had been requesting.

As for my other projects, I’m going to be giving a talk at O’Reilly’s OSCON again in August. This year I’ll be speaking about the issues regarding backwards compatibility in open source software. I’ve also been spending a lot of time on the Lucene4c project (http://incubator.apache.org/lucene4c/), trying to provide a C level API to access the Apache Lucene search engine.

Garrett Rooney Bio

Garrett Rooney works for Ask Jeeves, in Los Gatos CA, on Bloglines.com. Rooney attended Rensselaer Polytechnic Institute, where he managed to complete 3 years of a mechanical engineering degree before coming to his senses and realizing he wanted to get a job where someone would pay him to play with computers. Since then, Rooney completed a computer science degree at RPI and has spent far too much time working on a wide variety of open source projects, most notably Subversion.