Version Control Systems: Keep Your Code in Order!
Software versioning and revision control systems manage vast quantities of code and track all lines that are added, modified, or deleted from a project over time. Subversion (SVN), Git, Mercurial, and Bazaar are all actively developed version control software solutions that work for development projects large and small.
What is “version control”?
Version control refers to the cataloging of revisions to some (generally text-based) document. While the term “version control” is most generally associated with programmers, it is equally relevant for writers, journalists, and even university students. Examples of common services that automatically track document revisions and versions include Google Docs and Dropbox.
Try it out! In Google Docs just click on “File” -> “See revision history.” In Dropbox simply right-click on a file that has been revised and click on “View Previous Versions.”
For a term paper, it is likely sufficient to be able to recover a previous iteration of a document in-full and copy out some text that was deleted.
For large programming projects, however, a software revision control system must be able to track all changes – deletions, additions, and moves – made by many users, and be able to efficiently show what was modified from version to version. It must also support common features such as branching and merging that developers rely on. As the image to the left illustrates, software development projects typically “branch” off into new side projects, in some cases “merging” back into the main development line, and in other cases being discontinued. The creation of new development branches and the ability to merge back into a main trunk is much more sophisticated than tracking what one student adds or deletes from their term paper!
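To make "showing what was modified from version to version" concrete, here is a minimal sketch using Python's standard `difflib` module. The two file versions and the labels `version_1` and `version_2` are made up for illustration; real version control systems use their own, more elaborate diff machinery.

```python
import difflib


def show_changes(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff (as a list of lines) between two versions of a file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="version_1",  # hypothetical label for the older revision
            tofile="version_2",    # hypothetical label for the newer revision
            lineterm="",
        )
    )


# Two hypothetical revisions of the same small source file.
v1 = "def greet():\n    print('hello')\n"
v2 = "def greet(name):\n    print('hello', name)\n    return name\n"

diff = show_changes(v1, v2)
for line in diff:
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, which is the same convention most VCS diff views follow.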
What is the purpose of Version Control Systems?
Eric Sink, a software developer and writer, has suggested that there are ultimately three goals that all version control systems try to achieve. To paraphrase his thoughts, these are:
- to allow developers to work simultaneously on the same project (in other words, to access and work with the same code at the same time);
- to avoid conflicts arising from different developers working simultaneously on the same code;
- and “to archive every version of everything that has ever existed — ever.”
As we can imagine, these goals are difficult to achieve. For starters, tensions inherently arise between the first and second aim. If only one person is working on a project at a time, then it is easy enough to record snapshots of what changes are made and commit those snapshots and changes back to your repository. But as soon as two or more programmers are touching the same code at the same time – which is by definition one of the principal purposes of version control software – then it becomes difficult to make sure that Ted’s changes to the code don’t conflict with Kathy’s changes.
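The tension between Ted's and Kathy's edits can be sketched as a tiny three-way merge. This is only an illustration under a strong simplifying assumption: all three versions have the same number of lines, so edits only change lines in place. The function name `three_way_merge` and the sample settings are hypothetical, and real tools also handle inserted and deleted lines.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edits of the same base version, line by line.

    Assumes all three versions have the same number of lines.
    Returns the merged lines and the indices of conflicting lines.
    """
    merged, conflicts = [], []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither touched the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed the line
            merged.append(t)
        elif t == b:      # only "ours" changed the line
            merged.append(o)
        else:             # both changed the same line differently
            conflicts.append(index)
            merged.append(b)  # keep the base line until a human decides
    return merged, conflicts


base = ["timeout = 30", "retries = 3", "debug = False"]
ted = ["timeout = 60", "retries = 3", "debug = False"]
kathy = ["timeout = 30", "retries = 5", "debug = False"]

# Ted and Kathy touched different lines, so the merge is clean.
clean_merge, clean_conflicts = three_way_merge(base, ted, kathy)
print(clean_merge, clean_conflicts)

# If Kathy had also edited the timeout, line 0 would be in conflict.
kathy_conflicting = ["timeout = 90", "retries = 3", "debug = False"]
_, real_conflicts = three_way_merge(base, ted, kathy_conflicting)
print(real_conflicts)
```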
As this timeline shows, 2005 was a hot year for the launch of new third-generation version control systems! (More on the reason for that below.)
What differs between different version control systems?
Eric Raymond suggests that version control systems can be categorized according to the following three criteria:
- Repository location, which can be either
  - centralized
  - or decentralized (distributed).
- The methods of checking out, merging, and committing code, which can be either
  - locking,
  - merge-before-commit,
  - or commit-before-merge.
- The unit of operation, which can be either
  - file operations
  - or fileset operations.
Let’s take a look at what the first two of these categories actually mean on a practical level (we won’t be focusing on file or fileset operations here).
- Location of the code repository
When using any version control system, code for your project will be stored in what is termed a “repository.” Think of it like the warehouse for your project.
When a programmer accesses a centralized repository for local use, they don’t get a full-fledged copy of the entire repository; the only complete and authoritative copy of the repository resides at a central location – either on your server in your physical office building or perhaps in the cloud. Naturally this server will be backed up, and in this way your data should be safe. All the same, you are stuck putting all of your eggs in one basket (at least for short-term periods). If your central server fails, you may be unable to make commits or pull code from your repository until it is back up and running.
When repositories are decentralized, they are not tied to a single server or physical location. With decentralized systems such as Git, Mercurial, and Bazaar, any time a programmer clones a repository they get a full copy of all code and all revision history on their local machine. In effect, this means that you will typically have num_repo_copies > n, where n is the number of programmers on your team (it is quite likely that at least some individuals will retain repositories on more than one machine). This data redundancy not only provides greater overall data security, but also ensures that an entire project will not be disrupted if a single node loses functionality.
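The difference between the two repository models can be sketched with a toy in-memory `Repository` class. Everything here (the class, `checkout_working_copy`, `clone`, and the commit names) is hypothetical and only models the idea; it is not how any real VCS stores data.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository: an ordered list of snapshots, oldest first."""
    history: list = field(default_factory=list)

    def commit(self, snapshot: str) -> None:
        self.history.append(snapshot)


def checkout_working_copy(central: Repository) -> str:
    """Centralized model: the client only receives the latest snapshot."""
    return central.history[-1]


def clone(remote: Repository) -> Repository:
    """Distributed model: the client receives an independent copy of the full history."""
    return copy.deepcopy(remote)


server = Repository()
for snapshot in ["initial import", "add login page", "fix typo"]:
    server.commit(snapshot)

working_copy = checkout_working_copy(server)  # just the newest state
local_repo = clone(server)                    # every snapshot ever made

# Simulate the central server losing its data.
server.history.clear()

print(working_copy)
print(local_repo.history)
```

After the simulated failure, the centralized client is left holding only the latest snapshot, while the clone still carries the complete history.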
- Methods of checking out, merging, and committing code
When working with version control systems it is important to know the significance of locking vs. merge-before-commit vs. commit-before-merge when managing your repository data.
Locking a file upon checkout was the first method employed by early VCSs to make sure that conflicts don’t emerge from multiple simultaneous code edits. By locking a file, a user claims the sole right to access or modify a file until they check it back in and unlock it. This ensures that two or more programmers cannot generate conflicting code, but in most cases it ties up workflows and is a big hassle. Though there are still some instances when locking may be relevant – for example, when working with binary files that cannot be merged – it is relied on less and less over time, and is not the dominant method of any modern VCS.
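As a rough illustration of lock-on-checkout, here is a toy lock registry. The `LockRegistry` class, its method names, and the file name are all invented for this sketch and do not mirror the commands of any particular VCS.

```python
class LockRegistry:
    """Toy model of lock-on-checkout: one user per file at a time."""

    def __init__(self):
        self._locks = {}  # maps file path -> user currently holding the lock

    def checkout(self, path: str, user: str) -> None:
        holder = self._locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self._locks[path] = user

    def checkin(self, path: str, user: str) -> None:
        if self._locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        del self._locks[path]


registry = LockRegistry()
registry.checkout("parser.c", "ted")

try:
    registry.checkout("parser.c", "kathy")  # Kathy has to wait for Ted
except PermissionError as error:
    print(error)

registry.checkin("parser.c", "ted")
registry.checkout("parser.c", "kathy")      # now it succeeds
```

The sketch also shows why locking ties up workflows: Kathy cannot even start on the file until Ted checks it back in.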
Merge-before-commit was employed by second-generation VCSs such as SVN, and requires that a programmer’s additions and deletions to code be merged with the most current version on the server before a commit is accepted. This ensures that no code can be committed that generates conflicts. In the newest generation of VCSs, which follow the commit-before-merge model, this need to merge first has been eliminated, meaning that any and all commits will be accepted by a repository. When modified code is committed, it is simply saved as a new sub-branch that can later be merged back into the primary development branch or continued as an offshoot (This site offers great visuals explaining this model).
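The two commit policies can be contrasted with a pair of toy classes. Both classes and the revision names (`r0`, `ted-1`, `kathy-1`) are hypothetical; this is a sketch of the idea, not a description of how SVN or Git actually decide whether to accept a commit.

```python
class MergeBeforeCommitServer:
    """Toy model: rejects a commit unless it is based on the latest revision."""

    def __init__(self):
        self.revisions = ["r0"]

    def commit(self, based_on: str, new_revision: str) -> None:
        if based_on != self.revisions[-1]:
            raise RuntimeError("out of date: update and merge first")
        self.revisions.append(new_revision)


class CommitBeforeMergeRepo:
    """Toy model: accepts every commit; diverging work becomes a separate head."""

    def __init__(self):
        self.parents = {"r0": None}  # maps revision -> parent revision

    def commit(self, based_on: str, new_revision: str) -> None:
        self.parents[new_revision] = based_on

    def heads(self) -> list:
        """Revisions that no other revision builds on."""
        used_as_parent = set(self.parents.values())
        return sorted(r for r in self.parents if r not in used_as_parent)


# Ted and Kathy both start from r0.
server = MergeBeforeCommitServer()
server.commit("r0", "ted-1")
try:
    server.commit("r0", "kathy-1")  # rejected: Kathy must merge Ted's work first
except RuntimeError as error:
    print(error)

repo = CommitBeforeMergeRepo()
repo.commit("r0", "ted-1")
repo.commit("r0", "kathy-1")        # accepted as a second line of development
print(repo.heads())
```

In the second model the merge still has to happen eventually, but it becomes a separate, later step rather than a precondition for saving work.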
A Quick Comparison of SVN, Git, Mercurial, and Bazaar
Based on Eric Raymond’s categorizations.
|Version Control System|Repository Location|Checkout, Merge, Commit|
|---|---|---|
|SVN|Centralized|merge-before-commit|
|Git|Decentralized (or Distributed)|commit-before-merge|
|Mercurial|Decentralized (or Distributed)|commit-before-merge|
|Bazaar|Decentralized (or Distributed)|merge-before-commit|
Now we will take a closer look at each pick in our list of version control systems in an attempt to understand which may be the best choice for your development team and projects.
SVN (short for Apache Subversion) is a version control system that was originally designed to replace CVS (Concurrent Versions System) while adding additional functionality that CVS lacked. Originally released in 1990, the CVS version control system had high ratings back in its day, and saw its final stable version release in 2008.
Unlike its third-generation peers that we are reviewing below, the SVN version control system is considered a second-generation VCS. Most significantly, it relies on a central repository. Although SVN is getting up there in years by software standards, it is still in active development and active use worldwide. The cloud backup company Backblaze, for example, has been relying on SVN since they started operations in 2007. An article on their blog from 2014 explains why they have no interest in moving to a different version control platform:
“Subversion has been a good fit for Backblaze as we are fairly linear in our development practices. We rarely (if ever) branch our code, instead adding things in a continuous fashion.”
While many companies may want a more ‘modern’ VCS that supports infinite branching and a distributed repository model, SVN is still a solid contender and works wonderfully with certain workflows.
Developers continue to trust SVN for their version control needs because:
- many teams have years of experience with SVN (it’s familiar);
- SVN is great for linear development methods (minimal branching);
- SVN is well-documented, and there are many online tutorials and print resources available;
- and because it’s free and relatively easy to use.
Developed under Linus Torvalds’s direction (Linus is the namesake of “Linux”), the Git version control system certainly had some celebrity from the get-go. The development of Git was spurred by BitKeeper’s decision to stop providing free licenses to the Linux development community in 2005 (BitKeeper had been the VCS of choice for Linux kernel development). After the break with BitKeeper, the Git project was launched on 7 April 2005, only days prior to the launch of Mercurial, which was started for the same purpose – that is, to be an open source replacement for BitKeeper to be used for Linux development.
Git has gained a lot of traction over the years, and benefits from the well-designed GitHub online interface that allows for “Powerful collaboration, code review, and code management for open source and private projects.” With GitHub, “public projects are always free,” and sharing Git repositories has become the go-to method for many open source software developers to share and collaborate on projects.
Written primarily in C, Git is fast and stable. Like the Unix shell, Git is a combination of many discrete components that together provide impressive functionality. On the upside, Git can do nearly anything you want it to. But on the downside, Git can be complicated to learn, is not well-documented, and changes rapidly so that keeping pace with all its features can be a challenge.
Git may be the best version control system:
- for developers who enjoy Unix shell scripting;
- for teams that don’t mind a bit of a learning curve to achieve amazing functionality;
- and for free, open-source projects that want to easily share source code on GitHub.
A direct Git competitor, launched a mere 12 days after it, the Mercurial version control system is somewhat more streamlined and well-documented and is written largely in Python rather than C (theoretically making it a bit slower, but you likely won’t notice). Though Git was ultimately chosen over Mercurial for ongoing Linux kernel development, Mercurial has advanced and come into its own over the past decade.
According to Stack Overflow’s 2015 worldwide developer survey, approximately 8% of developers today rely on Mercurial. Facebook is likely the largest company to use Mercurial for their development. Writing in 2014, Durham Goode and Siddharth Agarwal describe how Facebook improved Mercurial to meet their specific needs, contributing “over 500 patches to Mercurial over the last year and a half.”
If Facebook can continue its impressive growth using Mercurial for version control, then it’s pretty clear that Mercurial means business. Definitely don’t write Mercurial (or Bazaar, our next VCS) off because they don’t currently top the popularity charts.
Mercurial offers the following distinctive features:
- tracking of family history of files, and tight controls to preserve the integrity of file history;
- GUI support with tools such as TortoiseHg, MercurialEclipse, and SourceTree;
- a simpler command-line interface than Git;
- and great extensibility through Shell scripts or using Python APIs.
Canonical, the UK-based company behind Ubuntu (probably the most popular Linux distro today), kicked off the development of the Bazaar version control system (initially “Baz”) in March 2005 (shortly before the launches of Git and Mercurial). Bazaar is offered as free software as part of the GNU Project, though its development is ongoing and still sponsored by Canonical.
Mixing the ideology of second and third-generation version control systems, Bazaar actually allows you to set up either a centralized repository or distributed repositories. This can be especially helpful if you are migrating from a pre-existing central repository with SVN or some other tool.
Bazaar attempts to distinguish itself as being the “Version Control for Human Beings,” meaning that it values simplicity, clean help files, and the availability of GUIs for Windows, Mac, and Linux distros. One special feature of Bazaar is its support for “transparent foreign branches,” which in practice means the ability to access non-Bazaar repositories from the likes of SVN, Git, and Mercurial, working in a “mixed VCS environment” while maintaining the Bazaar commands and interface.
Bazaar is a compelling version control system choice, offering:
- simple GUIs for Windows, Mac, and Linux;
- easy-to-read, descriptive help files for all commands;
- the support of Canonical, a proven open source partner for over a decade;
- and easy-to-install plugins, reminiscent of the plugins feature in Firefox.
It’s important to consider your team’s experience and requirements when selecting a version control system or migrating to a new VCS. For those with extensive previous experience with CVS, it is arguably easier to acclimate to either SVN or Bazaar based on their centralized repository (or optional centralized repository) implementations. Git will generally have a steeper learning curve than SVN, Mercurial, and Bazaar, but packs some incredible power under the hood that may make it worth the effort.
All four of the version control systems we have reviewed are available for free under some version of an open source license, making cost – happily – one of the lesser concerns when choosing between these four solid software packages.
Which version control system does your team currently use, and why did you choose that VCS? Have you had experiences migrating from an older VCS to any of SVN, Git, Mercurial, or Bazaar? We are always glad to hear your thoughts and comments. Join us on Facebook, LinkedIn, and Twitter, or drop us an email if you would like to hear more about version control or discuss the web development expertise that WebiNerds has to offer!