PEP: 374
Title: Choosing a distributed VCS for the Python project
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>,
        Stephen J. Turnbull <stephen@xemacs.org>,
        Alexandre Vassalotti <alexandre@peadrop.com>,
        Barry Warsaw <barry@python.org>,
        Dirkjan Ochtman <dirkjan@ochtman.nl>
Status: Final
Type: Process
Content-Type: text/x-rst
Created: 07-Nov-2008
Post-History: 07-Nov-2008, 22-Jan-2009

Rationale

Python has been using a centralized version control system (VCS; first
CVS, now Subversion) for years to great effect. Having a master copy of
the official version of Python provides people with a single place to
always get the official Python source code. It has also allowed for the
storage of the history of the language, mostly for help with
development, but also for posterity. And of course the V in VCS is very
helpful when developing.

But a centralized version control system has its drawbacks. First and
foremost, in order to have the benefits of version control with Python
in a seamless fashion, one must be a "core developer" (i.e. someone with
commit privileges on the master copy of Python). People who are not core
developers but who wish to work with Python's revision tree, e.g. anyone
writing a patch for Python or creating a custom version, do not have
direct tool support for revisions. This can be quite a limitation, since
these non-core developers cannot easily do basic tasks such as reverting
changes to a previously saved state, creating branches, publishing one's
changes with full revision history, etc. For non-core developers, the
last safe tree state is one the Python developers happen to set, and
this prevents safe development. This second-class citizenship is a
hindrance to people who wish to contribute to Python with a patch of any
complexity and want a way to incrementally save their progress to make
their development lives easier.

There is also the issue of having to be online to be able to commit
one's work. Because centralized VCSs keep a central copy that stores all
revisions, one must have Internet access in order for their revisions to
be stored; no Net, no commit. This can be annoying if you happen to be
traveling and lack Internet access. There is also the situation of
someone wishing to contribute to Python over a bad Internet connection,
where each commit is time-consuming and expensive and it might work out
better to send all of the work in a single step.

Another drawback of a centralized VCS shows up in a common use case: a
developer revising a patch in response to review comments. This is more
difficult with a centralized model because there's no place to keep
intermediate work. It's either all checked in or none of it is checked
in. In a centralized VCS, it's also very difficult to track changes to
the trunk as they are committed while you're working on your feature or
bug fix branch. This increases the risk that such branches will grow
stale or out-dated, or that merging them into the trunk will generate
too many conflicts to be easily resolved.

Lastly, there is the issue of maintenance of Python. At any one time
there is at least one major version of Python under development (at the
time of this writing there are two). For each major version of Python
under development there is at least the maintenance version of the last
minor version and the in-development minor version (e.g. with 2.6 just
released, that means that both 2.6 and 2.7 are being worked on). Once a
release is done, a branch is created between the code bases where
changes in one version do not (but could) belong in the other version.
As of right now there is no natural support for this branch in time in
central VCSs; you must use tools that simulate the branching. Tracking
merges is similarly painful for developers, as revisions often need to
be merged between four active branches (e.g. 2.6 maintenance, 3.0
maintenance, 2.7 development, 3.1 development). In this case, VCSs such
as Subversion only handle this through arcane third-party tools.

Distributed VCSs (DVCSs) solve all of these problems. While one can keep
a master copy of a revision tree, anyone is free to copy that tree for
their own use. This gives everyone the power to commit changes to their
copy, online or offline. It also more naturally ties into the idea of
branching in the history of a revision tree for maintenance and the
development of new features bound for Python. DVCSs also provide a great
many additional features that centralized VCSs don't or can't provide.

This PEP explores the possibility of changing Python's use of Subversion
to any of the currently popular DVCSs, in order to gain the benefits
outlined above. This PEP does not guarantee that a switch to a DVCS will
occur at the conclusion of this PEP. It is quite possible that no clear
winner will be found and that svn will continue to be used. If this
happens, this PEP will be revisited and revised in the future as the
state of DVCSs evolves.

Terminology

Agreeing on a common terminology is surprisingly difficult, primarily
because each VCS uses these terms when describing subtly different
tasks, objects, and concepts. Where possible, we try to provide a
generic definition of the concepts, but you should consult the
individual system's glossaries for details. Here are some basic
references for terminology, taken from the standard web-based
documentation for each VCS:

-   Subversion : http://svnbook.red-bean.com/en/1.5/svn.basic.html
-   Bazaar : http://bazaar-vcs.org/BzrGlossary
-   Mercurial :
    http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial
-   git : http://book.git-scm.com/1_the_git_object_model.html

branch

    A line of development; a collection of revisions, ordered by time.

checkout/working copy/working tree

    A tree of code the developer can edit, linked to a branch.

index

    A "staging area" where a revision is built (unique to git).

repository

    A collection of revisions, organized into branches.

clone

    A complete copy of a branch or repository.

commit

    To record a revision in a repository.

merge

    Applying all the changes and history from one branch/repository to
    another.

pull

    To update a checkout/clone from the original branch/repository,
    which can be remote or local.

push/publish

    To copy a revision, and all revisions it depends on, from one
    repository to another.

cherry-pick

    To merge one or more specific revisions from one branch to another,
    possibly in a different repository, possibly without its dependent
    revisions.

rebase

    To "detach" a branch, and move it to a new branch point; move
    commits to the beginning of a branch instead of where they happened
    in time.

Typical Workflow

At the moment, the typical workflow for a Python core developer is:

-   Edit code in a checkout until it is stable enough to commit/push.
-   Commit to the master repository.

It is a rather simple workflow, but it has drawbacks. For one, because
any work that involves the repository takes time thanks to the network,
commits/pushes tend not to be as atomic as they could be. There is also
not necessarily a cheap way to create new checkouts beyond a recursive
copy of the checkout directory.

A DVCS would lead to a workflow more like this:

-   Branch off of a local clone of the master repository.
-   Edit code, committing in atomic pieces.
-   Merge the branch into the mainline, and
-   Push all commits to the master repository.

While there are more possible steps, the workflow is much more
independent of the master repository than is currently possible. By
being able to commit locally at the speed of your disk, a core developer
is able to do atomic commits much more frequently, minimizing having
commits that do multiple things to the code. Also by using a branch, the
changes are isolated (if desired) from other changes being made by other
developers. Because branches are cheap, it is easy to create and
maintain many smaller branches that address one specific issue, e.g. one
bug or one new feature. More sophisticated features of DVCSs allow the
developer to more easily track long running development branches as the
official mainline progresses.
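
The DVCS workflow above can be sketched with git standing in for any of
the three contenders. This is an illustration only, under stated
assumptions: the repository name demo, the branch names mainline and
issue-0000, and the committer identity are all hypothetical, and the
final push appears only as a comment because the sketch has no remote
master repository.

```shell
set -e
# Work in a throwaway directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical local repository standing in for a clone of the master.
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
echo "original" > README
git add README
git commit -q -m "Initial import"

# 1. Branch off of the local clone.
git checkout -q -b issue-0000

# 2. Edit code, committing in atomic pieces (no network involved).
echo "fix part one" >> README
git commit -q -a -m "First atomic change"
echo "fix part two" >> README
git commit -q -a -m "Second atomic change"

# 3. Merge the branch into the mainline.
git checkout -q mainline
git merge -q issue-0000

# 4. With a real master repository configured, "git push" would go here.
commits=$(git rev-list --count mainline)
echo "mainline now has $commits commits"
```

Every step except the final push runs entirely against the local disk,
which is what makes small, frequent commits practical.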

Contenders

  Name        Short Name   Version   2.x Trunk Mirror                      3.x Trunk Mirror
  ----------- ------------ --------- ------------------------------------- --------------------------------------------
  Bazaar      bzr          1.12      http://code.python.org/python/trunk   http://code.python.org/python/3.0
  Mercurial   hg           1.2.0     http://code.python.org/hg/trunk/      http://code.python.org/hg/branches/py3k/
  git         N/A          1.6.1     git://code.python.org/python/trunk    git://code.python.org/python/branches/py3k

This PEP does not consider darcs, arch, or monotone. The main problem
with these DVCSs is that they are simply not popular enough to bother
supporting when they do not provide some very compelling features that
the other DVCSs provide. Arch and darcs also have significant
performance problems which seem unlikely to be addressed in the near
future.

Interoperability

For those who have already decided which DVCSs they want to use, and are
willing to maintain local mirrors themselves, all three DVCSs support
interchange via the git "fast-import" changeset format. git does so
natively, of course, and native support for Bazaar is under active
development, and getting good early reviews as of mid-February 2009.
Mercurial has idiosyncratic support for importing via its hg convert
command, and third-party fast-import support is available for exporting.
Also, the Tailor tool supports automatic maintenance of mirrors based on
an official repository in any of the candidate formats with a local
mirror in any format.

Usage Scenarios

Probably the best way to help decide on whether/which DVCS should
replace Subversion is to see what it takes to perform some real-world
usage scenarios that developers (core and non-core) have to work with.
Each usage scenario outlines what it is, a bullet list of what the basic
steps are (which can vary slightly per VCS), and how to perform the
usage scenario in the various VCSs (including Subversion).

Each VCS had a single author in charge of writing implementations for
each scenario (unless otherwise noted).

  Name        VCS
  ----------- -----
  Brett       svn
  Barry       bzr
  Alexandre   hg
  Stephen     git

Initial Setup

Some DVCSs offer extra perks if you do some initial setup up front. This
section covers what can be done before any of the usage scenarios are
run in order to take better advantage of the tools.

All of the DVCSs support configuring your project identification. Unlike
the centralized systems, they use your email address to identify your
commits. (Access control is generally done by mechanisms external to the
DVCS, such as ssh or console login.) This identity may be associated
with a full name.

All of the DVCSs will query the system to get some approximation to this
information, but that may not be what you want. They also support
setting this information on a per-user basis, and on a per-project
basis. Convenience commands to set these attributes vary, but all allow
direct editing of configuration files.

Some VCSs support end-of-line (EOL) conversions on checkout/checkin.

svn

None required, but it is recommended you follow the guidelines in the
dev FAQ.

bzr

No setup is required, but for much quicker and space-efficient local
branching, you should create a shared repository to hold all your Python
branches. A shared repository is really just a parent directory
containing a .bzr directory. When bzr commits a revision, it searches
from the local directory on up the file system for a .bzr directory to
hold the revision. By sharing revisions across multiple branches, you
cut down on the amount of disk space used. Do this:

    cd ~/projects
    bzr init-repo python
    cd python

Now, all your Python branches should be created inside of
~/projects/python.

There are also some settings you can put in your ~/.bazaar/bazaar.conf
and ~/.bazaar/locations.conf files to set up defaults for interacting
with Python code. None of them are required, although some are recommended.
E.g. I would suggest gpg signing all commits, but that might be too high
a barrier for developers. Also, you can set up default push locations
depending on where you want to push branches by default. If you have
write access to the master branches, that push location could be
code.python.org. Otherwise, it might be a free Bazaar code hosting
service such as Launchpad. If Bazaar is chosen, we should decide what
the policies and recommendations are.

At a minimum, I would set up your email address:

    bzr whoami "Firstname Lastname <email.address@example.com>"

As with hg and git below, there are ways to set your email address (or
really, just about any parameter) on a per-repository basis. You do this
with settings in your $HOME/.bazaar/locations.conf file, which has an
ini-style format, as do the configuration files of the other DVCSs. See
the Bazaar documentation for details, which mostly aren't relevant for
this discussion.

hg

Minimally, you should set your user name. To do so, create the file
.hgrc in your home directory and add the following:

    [ui]
    username = Firstname Lastname <email.address@example.com>

If you are using Windows and your tools do not support Unix-style
newlines, you can enable automatic newline translation by adding to your
configuration:

    [extensions]
    win32text =

These options can also be set locally to a given repository by
customizing <repo>/.hg/hgrc, instead of ~/.hgrc.

git

None needed. However, git supports a number of features that can smooth
your work, with a little preparation. git supports setting defaults at
the workspace, user, and system levels. The system level is out of scope
of this PEP. The user configuration file is $HOME/.gitconfig on
Unix-like systems, and the workspace configuration file is
$REPOSITORY/.git/config.

You can use the git-config tool to set preferences for user.name and
user.email either globally (for your system login account) or locally
(to a given git working copy), or you can edit the configuration files
(which have the same format as shown in the Mercurial section above):

    # my full name doesn't change
    # note "--global" flag means per user
    # (system-wide configuration is set with "--system")
    git config --global user.name 'Firstname Lastname'
    # but use my Pythonic email address
    cd /path/to/python/repository
    git config user.email email.address@python.example.com

If you are using Windows, you probably want to set the core.autocrlf and
core.safecrlf preferences to true using git-config:

    # check out files with CRLF line endings rather than Unix-style LF only,
    # and check them back in with the reverse transformation
    git config --global core.autocrlf true
    # scream if a transformation would be ambiguous
    # (e.g., a working file contains both naked LF and CRLF)
    git config --global core.safecrlf true

Although the repository will usually contain a .gitignore file
specifying file names that rarely if ever should be registered in the
VCS, you may have personal conventions (e.g., always editing log
messages in a temporary file named ".msg") that you may wish to
specify:

    # tell git where my personal ignores are
    git config --global core.excludesfile ~/.gitignore
    # I use .msg for my long commit logs, and Emacs makes backups in
    # files ending with ~
    # these are globs, not regular expressions
    echo '*~' >> ~/.gitignore
    echo '.msg' >> ~/.gitignore

If you use multiple branches, as with the other VCSes, you can save a
lot of space by putting all objects in a common object store. This also
can save download time, if the origins of the branches were in different
repositories, because objects are shared across branches in your
repository even if they were not present in the upstream repositories.
git is very space- and time-efficient and applies a number of
optimizations automatically, so this configuration is optional.
(Examples are omitted.)

One-Off Checkout

As a non-core developer, I want to create and publish a one-off patch
that fixes a bug, so that a core developer can review it for inclusion
in the mainline.

-   Checkout/branch/clone trunk.
-   Edit some code.
-   Generate a patch (based on what is best supported by the VCS, e.g.
    branch history).
-   Receive reviewer comments and address the issues.
-   Generate a second patch for the core developer to commit.

svn

    svn checkout http://svn.python.org/projects/python/trunk
    cd trunk
    # Edit some code.
    echo "The cake is a lie!" > README
    # Since svn lacks support for local commits, we fake it with patches.
    svn diff >> commit-1.diff
    svn diff >> patch-1.diff
    # Upload the patch-1 to bugs.python.org.
    # Receive reviewer comments.
    # Edit some code.
    echo "The cake is real!" > README
    # Since svn lacks support for local commits, we fake it with patches.
    svn diff >> commit-2.diff
    svn diff >> patch-2.diff
    # Upload patch-2 to bugs.python.org

bzr

    bzr branch http://code.python.org/python/trunk
    cd trunk
    # Edit some code.
    bzr commit -m 'Stuff I did'
    bzr send -o bundle
    # Upload bundle to bugs.python.org
    # Receive reviewer comments
    # Edit some code
    bzr commit -m 'Respond to reviewer comments'
    bzr send -o bundle
    # Upload updated bundle to bugs.python.org

The bundle file is like a super-patch. It can be read by patch(1) but it
contains additional metadata so that it can be fed to bzr merge to
produce a fully usable branch complete with history. See the Patch
Review section below.

hg

    hg clone http://code.python.org/hg/trunk
    cd trunk
    # Edit some code.
    hg commit -m "Stuff I did"
    hg outgoing -p > fixes.patch
    # Upload patch to bugs.python.org
    # Receive reviewer comments
    # Edit some code
    hg commit -m "Address reviewer comments."
    hg outgoing -p > additional-fixes.patch
    # Upload patch to bugs.python.org

While hg outgoing does not have the flag for it, most Mercurial commands
support git's extended patch format through a --git flag. This can be
set in one's .hgrc file so that all commands that generate a patch use
the extended format.
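
As a sketch of that setting, the following .hgrc fragment should make
patch-generating commands use the extended format by default. It relies
on Mercurial's diff.git configuration option and can go either in
~/.hgrc or in a repository's .hg/hgrc; check the hgrc documentation for
your Mercurial version before relying on it.

```ini
# Ask Mercurial to emit git-style extended diffs wherever it generates a patch.
[diff]
git = True
```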

git

The patches could be created with git diff master > stuff-i-did.patch,
too, but git format-patch | git am knows some tricks (empty files,
renames, etc.) that ordinary patch can't handle. git grabs "Stuff I did"
out of the commit message to create the file name
0001-Stuff-I-did.patch. See Patch Review below for a description of the
git-format-patch format.

    # Get the mainline code.
    git clone git://code.python.org/python/trunk
    cd trunk
    # Edit some code.
    git commit -a -m 'Stuff I did.'
    # Create patch for my changes (i.e., relative to master).
    git format-patch master
    git tag stuff-v1
    # Upload 0001-Stuff-I-did.patch to bugs.python.org.
    # Time passes ... receive reviewer comments.
    # Edit more code.
    git commit -a -m 'Address reviewer comments.'
    # Make an add-on patch to apply on top of the original.
    git format-patch stuff-v1
    # Upload 0001-Address-reviewer-comments.patch to bugs.python.org.

Backing Out Changes

As a core developer, I want to undo a change that was not ready for
inclusion in the mainline.

-   Back out the unwanted change.
-   Push patch to server.

svn

    # Assume the change to revert is in revision 40
    svn merge -c -40 .
    # Resolve conflicts, if any.
    svn commit -m "Reverted revision 40"

bzr

    # Assume the change to revert is in revision 40
    bzr merge -r 40..39
    # Resolve conflicts, if any.
    bzr commit -m "Reverted revision 40"

Note that if the change you want to revert is the last one that was
made, you can just use bzr uncommit.

hg

    # Assume the change to revert is in revision 9150dd9c6d30
    hg backout --merge -r 9150dd9c6d30
    # Resolve conflicts, if any.
    hg commit -m "Reverted changeset 9150dd9c6d30"
    hg push

Note, you can use "hg rollback" and "hg strip" to revert changes you
committed in your local repository, but did not yet push to other
repositories.

git

    # Assume the change to revert is the grandfather of a revision tagged "newhotness".
    git revert newhotness~2
    # If there are no conflicts, "git revert" prompts for a log message
    # and commits automatically. Otherwise, resolve the conflicts and
    # commit by hand:
    # git commit -m "Reverted newhotness~2."
    git push
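
To make the newhotness~2 notation concrete, here is a small
self-contained sketch. Everything in it is hypothetical (the repository
name, the file names, the identity); it only demonstrates that the
revert adds a new commit undoing the grandparent of the tagged revision
while leaving later work in place.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

# Four commits, each touching its own file so the revert cannot conflict.
echo one > a.txt;   git add a.txt; git commit -q -m "Add a.txt"
echo bad > b.txt;   git add b.txt; git commit -q -m "Add b.txt (not ready)"
echo two > c.txt;   git add c.txt; git commit -q -m "Add c.txt"
echo three > d.txt; git add d.txt; git commit -q -m "Add d.txt"
git tag newhotness

# newhotness   -> "Add d.txt"
# newhotness~1 -> "Add c.txt"
# newhotness~2 -> "Add b.txt (not ready)", the change to back out.
# --no-edit accepts the default log message instead of opening an editor.
git revert --no-edit newhotness~2

total=$(git rev-list --count HEAD)
echo "history now has $total commits"
```

History is never rewritten here: the unwanted commit is still present,
followed by a commit that cancels it.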

Patch Review

As a core developer, I want to review patches submitted by other people,
so that I can make sure that only approved changes are added to Python.

Core developers have to review patches as submitted by other people.
This requires applying the patch, testing it, and then tossing away the
changes. The assumption can be made that a core developer already has a
checkout/branch/clone of the trunk.

-   Branch off of trunk.
-   Apply patch w/o any comments as generated by the patch submitter.
-   Push patch to server.
-   Delete now-useless branch.

svn

Subversion does not exactly fit into this development style very well as
there is no such thing as a "branch" as has been defined in this PEP.
Instead a developer either needs to create another checkout for testing
a patch or create a branch on the server. Up to this point, core
developers have not taken the "branch on the server" approach to dealing
with individual patches. For this scenario the assumption will be the
developer creates a local checkout of the trunk to work with:

    cp -r trunk issue0000
    cd issue0000
    patch -p0 < __patch__
    # Review patch.
    svn commit -m "Some patch."
    cd ..
    rm -r issue0000

Another option is to only have a single checkout running at any one time
and use svn diff along with svn revert -R to store away independent
changes you may have made.

bzr

    bzr branch trunk issueNNNN
    cd issueNNNN
    # Download `patch` bundle from Roundup
    bzr merge patch
    # Review patch
    bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
    bzr push bzr+ssh://me@code.python.org/trunk
    cd ..
    rm -rf issueNNNN

Alternatively, since you're probably going to commit these changes to
the trunk, you could just do a checkout. That would give you a local
working tree while the branch (i.e. all revisions) would continue to
live on the server. This is similar to the svn model and might allow you
to more quickly review the patch. There's no need for the push in this
case:

    bzr checkout trunk issueNNNN
    cd issueNNNN
    # Download `patch` bundle from Roundup
    bzr merge patch
    # Review patch
    bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
    cd ..
    rm -rf issueNNNN

hg

    hg clone trunk issue0000
    cd issue0000
    # If the patch was generated using hg export, the user name of the
    # submitter is automatically recorded. Otherwise,
    # use hg import --no-commit submitted.diff and commit with
    # hg commit -u "Firstname Lastname <email.address@example.com>"
    hg import submitted.diff
    # Review patch.
    hg push ssh://alexandre@code.python.org/hg/trunk/

git

We assume a patch created by git-format-patch. This is a Unix mbox file
containing one or more patches, each formatted as an RFC 2822 message.
git-am interprets each message as a commit as follows. The author of the
patch is taken from the From: header, the date from the Date: header.
The commit log is created by concatenating the content of the subject
line, a blank line, and the message body up to the start of the patch:

    cd trunk
    # Create a branch in case we don't like the patch.
    # This checkout takes zero time, since the workspace is left in
    # the same state as the master branch.
    git checkout -b patch-review
    # Download patch from bugs.python.org to submitted.patch.
    git am < submitted.patch
    # Review and approve patch.
    # Merge into master and push.
    git checkout master
    git merge patch-review
    git push
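
The round trip from git format-patch to git am can be tried locally. The
sketch below is illustrative only: submitter and reviewer share a single
throwaway repository, and the names "Patch Author" and "Core Reviewer",
the branch names, and the patches directory are all hypothetical. It
shows that the author recorded by git am comes from the patch, while the
committer is whoever applied it.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q trunk
cd trunk
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch explicitly
git config user.name "Core Reviewer"        # identity of the person applying
git config user.email "reviewer@example.com"
echo "original" > README
git add README
git commit -q -m "Initial import"

# Submitter side (simulated): commit under another name and export the patch.
git checkout -q -b submission
echo "fixed" > README
git commit -q -a -m "Stuff I did" --author="Patch Author <author@example.com>"
git format-patch -q -o "$work/patches" mainline

# Reviewer side: apply the mbox-style patch on a scratch branch.
git checkout -q mainline
git checkout -q -b patch-review
git am -q "$work"/patches/0001-*.patch

# Author comes from the From: header; committer is the local identity.
author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
echo "author=$author committer=$committer"
```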

Backport

As a core developer, I want to apply a patch to 2.6, 2.7, 3.0, and 3.1
so that I can fix a problem in all four versions.

Thanks to always having the cutting-edge and the latest release version
under development, Python currently has four branches being worked on
simultaneously. That makes it important for a change to propagate easily
through various branches.

svn

Because of Python's use of svnmerge, changes start with the trunk (2.7)
and then get merged to the release version of 2.6. To get the change
into the 3.x series, the change is merged into 3.1, fixed up, and then
merged into 3.0 (2.7 -> 2.6; 2.7 -> 3.1 -> 3.0).

This is in contrast to a port-forward strategy where the patch would
have been added to 2.6 and then pulled forward into newer versions (2.6
-> 2.7 -> 3.0 -> 3.1).

    # Assume patch applied to 2.7 in revision 0000.
    cd release26-maint
    svnmerge merge -r 0000
    # Resolve merge conflicts and make sure patch works.
    svn commit -F svnmerge-commit-message.txt  # revision 0001.
    cd ../py3k
    svnmerge merge -r 0000
    # Same as for 2.6, except Misc/NEWS changes are reverted.
    svn revert Misc/NEWS
    svn commit -F svnmerge-commit-message.txt  # revision 0002.
    cd ../release30-maint
    svnmerge merge -r 0002
    svn commit -F svnmerge-commit-message.txt  # revision 0003.

bzr

Bazaar is pretty straightforward here, since it supports cherry picking
revisions manually. In the example below, we could have given a revision
id instead of a revision number, but that's usually not necessary.
Martin Pool suggests "We'd generally recommend doing the fix first in
the oldest supported branch, and then merging it forward to the later
releases.":

    # Assume patch applied to 2.7 in revision 0000
    cd release26-maint
    bzr merge ../trunk -c 0000
    # Resolve conflicts and make sure patch works
    bzr commit -m 'Back port patch NNNN'
    bzr push bzr+ssh://me@code.python.org/release26-maint
    cd ../py3k
    bzr merge ../trunk -c 0000
    # Same as for 2.6 except Misc/NEWS changes are reverted
    bzr revert Misc/NEWS
    bzr commit -m 'Forward port patch NNNN'
    bzr push bzr+ssh://me@code.python.org/py3k

hg

Mercurial, like the other DVCSs, does not fit well with the workflow
used by Python core developers to backport patches. Right now, bug fixes
are first applied to the development mainline (i.e., trunk), then
back-ported to the maintenance branches and forward-ported, as
necessary, to the py3k branch. This workflow requires the ability to
cherry-pick individual changes. Mercurial's transplant extension
provides this ability. Here is an example of the scenario using this
workflow:

    cd release26-maint
    # Assume patch applied to 2.7 in revision 0000
    hg transplant -s ../trunk 0000
    # Resolve conflicts, if any.
    cd ../py3k
    hg pull ../trunk
    hg merge
    hg revert Misc/NEWS
    hg commit -m "Merged trunk"
    hg push

In the above example, transplant acts much like the current svnmerge
command. When transplant is invoked without the revision, the command
launches an interactive loop useful for transplanting multiple changes.
Another useful feature is the --filter option which can be used to
modify changesets programmatically (e.g., it could be used for removing
changes to Misc/NEWS automatically).

As an alternative to the traditional workflow, we could avoid
transplanting changesets by committing bug fixes to the oldest supported
release, then merging these fixes upward to the more recent branches:

    cd release25-maint
    hg import fix_some_bug.diff
    # Review patch and run test suite. Revert if failure.
    hg push
    cd ../release26-maint
    hg pull ../release25-maint
    hg merge
    # Resolve conflicts, if any. Then, review patch and run test suite.
    hg commit -m "Merged patches from release25-maint."
    hg push
    cd ../trunk
    hg pull ../release26-maint
    hg merge
    # Resolve conflicts, if any, then review.
    hg commit -m "Merged patches from release26-maint."
    hg push

Although this approach makes the history non-linear and slightly more
difficult to follow, it encourages fixing bugs across all supported
releases. Furthermore, it scales better when there are many changes to
backport, because we do not need to seek the specific revision IDs to
merge.

git

In git I would have a workspace which contains all of the relevant
master repository branches. git cherry-pick doesn't work across
repositories; you need to have the branches in the same repository:

    # Assume patch applied to 2.7 in revision release27~3 (4th patch back from tip).
    cd integration
    git checkout release26
    git cherry-pick release27~3
    # If there are conflicts, resolve them, and commit those changes.
    # git commit -a -m "Resolve conflicts."
    # Run test suite. If fixes are necessary, record as a separate commit.
    # git commit -a -m "Fix code causing test failures."
    git checkout master
    git cherry-pick release27~3
    # Do any conflict resolution and test failure fixups.
    # Revert Misc/NEWS changes.
    git checkout HEAD^ -- Misc/NEWS
    git commit -m 'Revert cherry-picked Misc/NEWS changes.' Misc/NEWS
    # Push both ports.
    git push origin release26 master
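
A minimal sketch of cherry-picking between two branches of one
repository follows. The branch names mirror the example above, but the
files, commit messages, and identity are hypothetical. It illustrates
that only the selected change lands on release26, without the commits
before or after it on release27.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q integration
cd integration
git symbolic-ref HEAD refs/heads/release26  # name the initial branch explicitly
git config user.name "Example Dev"          # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Common ancestor"

# Development continues on release27 with three independent commits.
git checkout -q -b release27
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "2.7-only feature"
echo "bug fix" > fix.txt
git add fix.txt
git commit -q -m "Bug fix wanted everywhere"
echo "more work" > more.txt
git add more.txt
git commit -q -m "Later 2.7 work"

# Backport only the bug fix (second patch back from the release27 tip).
git checkout -q release26
git cherry-pick release27~1

# List the working tree: the fix is present, the 2.7-only files are not.
ls
```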

If you are regularly merging (rather than cherry-picking) from a given
branch, then you can block a given commit from being accidentally merged
in the future by merging, then reverting it. This does not prevent a
cherry-pick from pulling in the unwanted patch, and this technique
requires blocking everything that you don't want merged. I'm not sure if
this differs from svn on this point.

    cd trunk
    # Merge in the alpha tested code.
    git merge experimental-branch
    # We don't want the 3rd-to-last commit from the experimental-branch,
    # and we don't want it to ever be merged.
    # The notation "^N" means Nth parent of the current commit. Thus HEAD^2^1^1
    # means the first parent of the first parent of the second parent of HEAD.
    git revert HEAD^2^1^1
    # Propagate the merge and the prohibition to the public repository.
    git push

Coordinated Development of a New Feature

Sometimes core developers end up working on a major feature with several
developers. As a core developer, I want to be able to publish feature
branches to a common public location so that I can collaborate with
other developers.

This requires creating a branch on a server that other developers can
access. All of the DVCSs support creating new repositories on hosts
where the developer is already able to commit, with appropriate
configuration of the repository host. This is similar in concept to the
existing sandbox in svn, although details of repository initialization
may differ.

For non-core developers, there are various more-or-less public-access
repository-hosting services. Bazaar has Launchpad, Mercurial has
bitbucket.org, and git has GitHub. All also have easy-to-use CGI
interfaces for developers who maintain their own servers.

-   Branch trunk.
-   Pull from branch on the server.
-   Pull from trunk.
-   Push merge to trunk.

svn

    # Create branch.
    svn copy svn+ssh://pythondev@svn.python.org/python/trunk svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
    svn checkout svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
    cd NewHotness
    svnmerge init
    svn commit -m "Initialize svnmerge."
    # Pull in changes from other developers.
    svn update
    # Pull in trunk and merge to the branch.
    svnmerge merge
    svn commit -F svnmerge-commit-message.txt

This scenario is incomplete as the decision for what DVCS to go with was
made before the work was complete.

Separation of Issue Dependencies

Sometimes, while working on an issue, it becomes apparent that the
problem being worked on is actually a compound issue of various smaller
issues. Being able to take the current work and then begin working on a
separate issue is very helpful to separate out issues into individual
units of work instead of compounding them into a single, large unit.

-   Create a branch A (e.g. urllib has a bug).
-   Edit some code.
-   Create a new branch B that branch A depends on (e.g. the urllib bug
    exposes a socket bug).
-   Edit some code in branch B.
-   Commit branch B.
-   Edit some code in branch A.
-   Commit branch A.
-   Clean up.

svn

To make up for svn's lack of cheap branching, it has a changelist option
to associate a file with a single changelist. This is not as powerful as
being able to associate at the commit level. There is also no way to
express dependencies between changelists.

    cp -r trunk issue0000
    cd issue0000
    # Edit some code.
    echo "The cake is a lie!" > README
    svn changelist A README
    # Edit some other code.
    echo "I own Python!" > LICENSE
    svn changelist B LICENSE
    svn ci -m "Tell it how it is." --changelist B
    # Edit changelist A some more.
    svn ci -m "Speak the truth." --changelist A
    cd ..
    rm -rf issue0000

bzr

Here's an approach that uses bzr shelve (now a standard part of bzr) to
squirrel away some changes temporarily while you take a detour to fix
the socket bugs.

    bzr branch trunk bug-0000
    cd bug-0000
    # Edit some code. Dang, we need to fix the socket module.
    bzr shelve --all
    # Edit some code.
    bzr commit -m "Socket module fixes"
    # Detour over, now resume fixing urllib
    bzr unshelve
    # Edit some code

Another approach uses the loom plugin. Looms can greatly simplify
working on dependent branches because they automatically take care of
the stacking dependencies for you. Imagine looms as a stack of dependent
branches (called "threads" in loom parlance), with easy ways to move up
and down the stack of threads, merge changes up the stack to descendant
threads, create diffs between threads, etc. Occasionally, you may need
or want to export your loom threads into separate branches, either for
review or commit. Higher threads incorporate all the changes in the
lower threads, automatically.

    bzr branch trunk bug-0000
    cd bug-0000
    bzr loomify --base trunk
    bzr create-thread fix-urllib
    # Edit some code. Dang, we need to fix the socket module first.
    bzr commit -m "Checkpointing my work so far"
    bzr down-thread
    bzr create-thread fix-socket
    # Edit some code
    bzr commit -m "Socket module fixes"
    bzr up-thread
    # Manually resolve conflicts if necessary
    bzr commit -m 'Merge in socket fixes'
    # Edit me some more code
    bzr commit -m "Now that socket is fixed, complete the urllib fixes"
    bzr record done

For bonus points, let's say someone else fixes the socket module in
exactly the same way you just did. Perhaps this person even grabbed your
fix-socket thread and applied just that to the trunk. You'd like to be
able to merge their changes into your loom and delete your now-redundant
fix-socket thread.

    bzr down-thread trunk
    # Get all new revisions to the trunk. If you've done things
    # correctly, this will succeed without conflict.
    bzr pull
    bzr up-thread
    # See? The fix-socket thread is now identical to the trunk
    bzr commit -m 'Merge in trunk changes'
    bzr diff -r thread: | wc -l # returns 0
    bzr combine-thread
    bzr up-thread
    # Resolve any conflicts
    bzr commit -m 'Merge trunk'
    # Now our top-thread has an up-to-date trunk and just the urllib fix.

hg

One approach is to use the shelve extension; this extension is not
included with Mercurial, but it is easy to install. With shelve, you can
select changes to put temporarily aside.

    hg clone trunk issue0000
    cd issue0000
    # Edit some code (e.g. urllib).
    hg shelve
    # Select changes to put aside
    # Edit some other code (e.g. socket).
    hg commit
    hg unshelve
    # Complete initial fix.
    hg commit
    cd ../trunk
    hg pull ../issue0000
    hg merge
    hg commit
    rm -rf ../issue0000

There are several other ways to approach this scenario with Mercurial.
Alexander Solovyov presented a few alternative approaches on Mercurial's
mailing list.

git

    cd trunk
    # Edit some code in urllib.
    # Discover a bug in socket, want to fix that first.
    # So save away our current work.
    git stash
    # Edit some code, commit some changes.
    git commit -a -m "Completed fix of socket."
    # Restore the in-progress work on urllib.
    git stash apply
    # Edit me some more code, commit some more fixes.
    git commit -a -m "Complete urllib fixes."
    # And push both patches to the public repository.
    git push

Bonus points: suppose you took your time, and someone else fixed socket
in the same way you just did and landed that fix in the trunk. In that
case, your push will fail because your branch is not up-to-date. If the
fix was a one-liner, there's a very good chance that it's exactly the
same, character for character. git would notice that and silently merge
the two, and you are done.

Suppose we're not so lucky:

    # Update your branch.
    git pull git://code.python.org/public/trunk master

    # git has fetched all the necessary data, but reports that the
    # merge failed.  We discover the nearly-duplicated patch.
    # Neither our version of the master branch nor the workspace has
    # been touched.  Revert our socket patch and pull again:
    git revert HEAD^
    git pull git://code.python.org/public/trunk master

Like Bazaar and Mercurial, git has extensions to manage stacks of
patches. You can use the original Quilt by Andrew Morton, or there is
StGit ("stacked git") which integrates patch-tracking for large sets of
patches into the VCS in a way similar to Mercurial Queues or Bazaar
looms.

Doing a Python Release

How does PEP 101 change when using a DVCS?

bzr

It will change, but not substantially so. When doing the maintenance
branch, we'll just push to the new location instead of doing an svn cp.
Tags are totally different, since in svn they are directory copies, but
in bzr (and I'm guessing hg), they are just symbolic names for revisions
on a particular branch. The release.py script will have to change to use
bzr commands instead. It's possible that, because a DVCS (in particular,
bzr) does cherry-picking and merging well enough, we'll be able to
create the maint branches sooner. It would be a useful exercise to try
to do a release off the bzr/hg mirrors.

hg

Clearly, details specific to Subversion in PEP 101 and in the release
script will need to be updated. In particular, the release tagging and
maintenance branch creation processes will have to be modified to use
Mercurial's features; this will simplify and streamline certain aspects
of the release process. For example, tagging and re-tagging a release
will become a trivial operation since a tag, in Mercurial, is simply a
symbolic name for a given revision.

git

It will change, but not substantially so. When doing the maintenance
branch, we'll just git push to the new location instead of doing an svn
cp. Tags are totally different, since in svn they are directory copies,
but in git they are just symbolic names for revisions, as are branches.
(The difference between a tag and a branch is that tags refer to a
particular commit, and will never change unless you use git tag -f to
force them to move. The checked-out branch, on the other hand, is
automatically updated by git commit.) The release.py script will have to
change to use git commands instead. With git I would create a (local)
maintenance branch as soon as the release engineer is chosen. Then I'd
"git pull" until I didn't like a patch, when it would be "git pull; git
revert ugly-patch", until it started to look like the sensible thing is
to fork off, and start doing "git cherry-pick" on the good patches.
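
The difference between a tag and a branch described above can be seen in
a scratch repository. This is a minimal sketch; the tag name and commit
messages are hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"

git commit -q --allow-empty -m "Release candidate"
git tag v0.1                                  # hypothetical tag name
tagged=$(git rev-parse v0.1)

# Committing moves the checked-out branch, but not the tag.
git commit -q --allow-empty -m "Post-release fix"
tip=$(git rev-parse HEAD)
still_tagged=$(git rev-parse v0.1)

# Moving an existing tag has to be forced explicitly.
git tag -f v0.1 >/dev/null
moved=$(git rev-parse v0.1)
```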

Platform/Tool Support

Operating Systems

  DVCS   Windows                                   OS X                                            UNIX
  ------ ----------------------------------------- ----------------------------------------------- -------------------------------
  bzr    yes (installer) w/ tortoise               yes (installer, fink or MacPorts)               yes (various package formats)
  hg     yes (third-party installer) w/ tortoise   yes (third-party installer, fink or MacPorts)   yes (various package formats)
  git    yes (third-party installer)               yes (third-party installer, fink or MacPorts)   yes (.deb or .rpm)

As the above table shows, all three DVCSs are available on all three
major OS platforms. But what it also shows is that Bazaar is the only
DVCS that directly supports Windows with a binary installer while
Mercurial and git require you to rely on a third party for binaries.
Both bzr and hg have a tortoise version while git does not.

Bazaar and Mercurial also have the benefit of being available in pure
Python with optional extensions available for performance.

CRLF -> LF Support

bzr

    My understanding is that support for this is being worked on as I
    type, landing in a version RSN. I will try to dig up details.

hg

    Supported via the win32text extension.

git

    I can't say from personal experience, but it looks like there's
    pretty good support via the core.autocrlf and core.safecrlf
    configuration attributes.
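
As a rough illustration of the two git settings named above, the sketch
below sets them in a scratch repository. The chosen values are one
plausible combination, not a recommendation from the evaluation.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Convert CRLF to LF when committing; leave files untouched on checkout.
git config core.autocrlf input
# Refuse line-ending conversions that could not be reversed.
git config core.safecrlf true

autocrlf=$(git config --get core.autocrlf)
safecrlf=$(git config --get core.safecrlf)
echo "core.autocrlf=$autocrlf core.safecrlf=$safecrlf"
```

On Windows, setting core.autocrlf to true instead also converts LF back
to CRLF on checkout.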

Case-insensitive filesystem support

bzr

    Should be OK. I share branches between Linux and OS X all the time.
    I've done case changes (e.g. bzr mv Mailman mailman) and as long as
    I did it on Linux (obviously), when I pulled in the changes on OS X
    everything was hunky dory.

hg

    Mercurial uses a case safe repository mechanism and detects case
    folding collisions.

git

    Since OS X preserves case, you can do case changes there too. git
    does not have a problem with renames in either direction. However,
    case-insensitive filesystem support is usually taken to mean
    complaining about collisions on case-sensitive file systems. git
    does not do that.
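
Where the tool itself does not complain, a collision check can be
approximated outside it. The sketch below is an assumption about how one
might do that, not a feature of any of the three DVCSs: it reads a list
of path names and reports those that differ only by case. The file list
is made up; in a real checkout it could come from git ls-files or
hg manifest.

```shell
#!/bin/sh
list_case_collisions() {
    # Read path names on stdin and print each lower-cased name that
    # occurs more than once.
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Hypothetical file list containing one pair that differs only by case.
collisions=$(printf '%s\n' Mailman/README mailman/readme setup.py |
    list_case_collisions)
echo "$collisions"
```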

Tools

In terms of code review tools such as Review Board and Rietveld, the
former supports all three while the latter supports hg and git but not
bzr. Bazaar does not yet have an online review board, but it has several
ways to manage email based reviews and trunk merging. There's Bundle
Buggy, Patch Queue Manager (PQM), and Launchpad's code reviews.

All three have some web site online that provides basic hosting support
for people who want to put a repository online. Bazaar has Launchpad,
Mercurial has bitbucket.org, and git has GitHub. Google Code also has
instructions on how to use git with the service, both to hold a
repository and how to act as a read-only mirror.

All three also appear to be supported by Buildbot.

Usage On Top Of Subversion

  DVCS   svn support
  ------ ------------------------
  bzr    bzr-svn (third-party)
  hg     multiple third-parties
  git    git-svn

All three DVCSs have svn support, although git is the only one to come
with that support out-of-the-box.

Server Support

  DVCS   Web page interface
  ------ --------------------
  bzr    loggerhead
  hg     hgweb
  git    gitweb

All three DVCSs support various hooks on the client and server side for
e.g. pre/post-commit verifications.
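
As an illustration of a client-side hook, here is a minimal git sketch.
The whitespace policy it enforces is a hypothetical example, and the
other two tools configure their hooks differently.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"
echo "clean line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# git runs the pre-commit hook before recording a commit and aborts the
# commit if the hook exits with a non-zero status.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
# "git diff --check" exits non-zero when the staged changes introduce
# whitespace errors such as trailing spaces.
exec git diff --cached --check
EOF
chmod +x .git/hooks/pre-commit

# Stage a line with trailing whitespace and try to commit it.
printf 'trailing space \n' >> notes.txt
git add notes.txt
if git commit -q -m "Sloppy edit" >/dev/null 2>&1; then
    verdict="accepted"
else
    verdict="rejected"
fi
echo "Commit with trailing whitespace was $verdict"
commits=$(git rev-list --count HEAD)
```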

Development

All three projects are under active development. Git seems to be on a
monthly release schedule. Bazaar is on a time-released monthly schedule.
Mercurial is on a 4-month, timed release schedule.

Special Features

bzr

Martin Pool adds: "bzr has a stable Python scripting interface, with a
distinction between public and private interfaces and a deprecation
window for APIs that are changing. Some plugins are listed in
https://edge.launchpad.net/bazaar and
http://bazaar-vcs.org/Documentation".

hg

Alexander Solovyov comments:

  Mercurial has easy to use extensive API with hooks for main events and
  ability to extend commands. Also there is the mq (mercurial queues)
  extension, distributed with Mercurial, which simplifies work with
  patches.

git

git has a cvsserver mode, i.e., you can check out a tree from git using
CVS. You can even commit to the tree, but features like merging are
absent, and branches are handled as CVS modules, which is likely to
shock a veteran CVS user.

Tests/Impressions

As I (Brett Cannon), and not my co-authors, am left with the task of
making the final decision of which DVCS, if any, to go with, I felt it
only fair to write down what tests I ran and my impressions as I
evaluate the various tools so as to be as transparent as possible.

Barrier to Entry

The amount of time and effort it takes to get a checkout of Python's
repository is critical. If the difficulty or time is too great then a
person wishing to contribute to Python may very well give up. That
cannot be allowed to happen.

I measured the checking out of the 2.x trunk as if I was a non-core
developer. Timings were done using the time command in zsh and space was
calculated with du -c -h.

+------+---------------+-----------+-------+
| DVCS | San Francisco | Vancouver | Space |
+======+===============+===========+=======+
| svn  |   1:04        |   2:59    | 139 M |
+------+---------------+-----------+-------+
| bzr  |   10:45       | 16:04     | 276 M |
+------+---------------+-----------+-------+
| hg   |   2:30        |   5:24    | 171 M |
+------+---------------+-----------+-------+
| git  |   2:54        |   5:28    | 134 M |
+------+---------------+-----------+-------+

When comparing these numbers to svn, it is important to realize that it
is not a 1:1 comparison. Svn does not pull down the entire revision
history like all of the DVCSs do. That means svn can perform an initial
checkout much faster than the DVCSs purely because it has less
information to download over the network.
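
To put the San Francisco column in proportion, the timings can be
converted to seconds and divided by the svn figure. The helper name
to_seconds is made up for this sketch; the input values are copied from
the table above.

```shell
#!/bin/sh
to_seconds() {
    # Convert a "M:SS" timing to a number of seconds.
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

svn=$(to_seconds 1:04)
for entry in bzr=10:45 hg=2:30 git=2:54; do
    name=${entry%%=*}
    secs=$(to_seconds "${entry#*=}")
    # Print the absolute time and the slowdown relative to svn.
    awk -v n="$name" -v s="$secs" -v base="$svn" \
        'BEGIN { printf "%s: %d s, %.1fx svn\n", n, s, s / base }'
done
```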

Performance of basic information functionality

To see how the tools did for performing a command that required querying
the history, the log for the README file was timed.

  DVCS   Time
  ------ -------
  bzr    4.5 s
  hg     1.1 s
  git    1.5 s

One thing of note during this test was that it took longer with git than
with the other two tools to figure out how to get the log without a
pager. While the pager use is a nice touch in general, working out how
to keep it from turning on took some time (it turns out the main git
command has a --no-pager flag to disable use of the pager).
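
For reference, a sketch of the kind of invocation involved is shown
below in a scratch repository. The exact command line used for the
timing is not recorded here, so the log options are an assumption.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"
echo "This is Python" > README
git add README
git commit -q -m "Add README"

# --no-pager is an option of the main git command, so it goes before
# the subcommand rather than after it.
history=$(git --no-pager log --format=%s -- README)
echo "$history"
```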

Figuring out what command to use from built-in help

I ended up trying to find out what the command was to see what URL the
repository was cloned from. To do this I used nothing more than the help
provided by the tool itself or its man pages.

Bzr was the easiest: bzr info. Running bzr help didn't show what I
wanted, but mentioned bzr help commands. That list had the command with
a description that made sense.

Git was the second easiest. The command git help didn't show much and
did not have a way of listing all commands. That is when I viewed the
man page. Reading through the various commands I discovered git remote.
The command itself spit out nothing more than origin. Trying
git remote origin said it was an error and printed out the command
usage. That is when I noticed git remote show. Running
git remote show origin gave me the information I wanted.

For hg, I never found the information I wanted on my own. It turns out I
wanted hg paths, but that was not obvious from the description of "show
definition of symbolic path names" as printed by hg help (it should be
noted that reporting this in the PEP did lead the Mercurial developers
to clarify the wording to make the use of the hg paths command clearer).
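
The git steps described above can be replayed against a local clone. In
this sketch the directory names are hypothetical, and the final git
config lookup is an extra shortcut that was not part of the original
search. The equivalents named in the text for the other tools are
bzr info and hg paths.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream config user.name "Example Dev"    # hypothetical identity
git -C upstream config user.email "dev@example.invalid"
git -C upstream commit -q --allow-empty -m "Initial commit"
git clone -q upstream checkout
cd checkout

# Prints only the name of the remote.
remote_name=$(git remote)
echo "$remote_name"

# Prints details about the remote, including the URL it was cloned from.
git remote show origin

# Shortcut: read the URL straight from the configuration.
url=$(git config --get remote.origin.url)
```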

Updating a checkout

To see how long it takes to update an outdated repository I timed both
updating a repository 700 commits behind and 50 commits behind (three
weeks stale and one week stale, respectively).

  DVCS   700 commits   50 commits
  ------ ------------- ------------
  bzr    39 s          7 s
  hg     17 s          3 s
  git    N/A           4 s

Note

Git lacks a value for the 700 commits scenario as it does not seem to
allow checking out a repository at a specific revision.

Git deserves special mention for its output from git pull. It not only
lists the delta change information for each file but also color-codes
the information.
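
For readers who want to reproduce the stale-checkout test with git, one
possible approach is to move the local branch back by a number of
commits before pulling. This was not done for the measurements above;
the sketch uses a scratch repository with five commits in place of the
real one.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q upstream
cd upstream
git config user.name "Example Dev"            # hypothetical identity
git config user.email "dev@example.invalid"
for n in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $n"
done
cd ..

git clone -q upstream stale
cd stale
# Move the local branch back three commits to imitate an outdated clone.
git reset -q --hard HEAD~3
behind=$(git rev-list --count 'HEAD..@{upstream}')
echo "commits behind: $behind"

# This is the step one would time.
git pull -q --no-rebase
after=$(git rev-list --count HEAD)
```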

Decision

At PyCon 2009 the decision was made to go with Mercurial.

Why Mercurial over Subversion

While svn has served the development team well, it needs to be admitted
that svn does not serve the needs of non-committers as well as a DVCS
does. Because svn provides its features (version control, branching,
etc.) only to people with commit privileges on the repository, it can be
a hindrance for people who lack commit privileges. But DVCSs have
no such limitation as anyone can create a local branch of Python and
perform their own local commits without the burden that comes with
cloning the entire svn repository. Allowing anyone to have the same
workflow as the core developers was the key reason to switch from svn to
hg.

Orthogonal to the benefits of allowing anyone to easily commit locally
to their own branches are offline, fast operations. Because hg stores
all data locally, there is no need to send requests to a remote server;
operations instead work off of the local disk. This improves response
times tremendously. It also allows for offline usage when one lacks an
Internet connection. But this benefit is minor and considered simply a
side-effect benefit instead of a driving factor for switching off of
Subversion.

Why Mercurial over other DVCSs

Git was not chosen for three key reasons (see the PyCon 2009 lightning
talk where Brett Cannon lists these exact reasons; talk started at
3:45). First, git's Windows support is the weakest out of the three
DVCSs being considered which is unacceptable as Python needs to support
development on any platform it runs on. Since Python runs on Windows and
some people do develop on the platform it needs solid support. And while
git's support is improving, as of this moment it is the weakest by a
large enough margin to warrant considering it a problem.

Second, and just as important as the first issue, is that the Python
core developers liked git the least out of the three DVCS options by a
wide margin. If you look at the following table you will see the results
of a survey taken of the core developers and how by a large margin git
is the least favorite version control system.

  DVCS   ++   equal   --   Uninformed
  ------ ---- ------- ---- ------------
  git    5    1       8    13
  bzr    10   3       2    12
  hg     15   1       1    10
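
One simple way to read the table is to subtract the "--" votes from the
"++" votes for each tool. This net score is only an illustration and was
not part of the survey itself; the rows are copied from the table above.

```shell
#!/bin/sh
# Columns: DVCS, ++, equal, --, Uninformed.
net_scores=$(printf '%s\n' 'git 5 1 8 13' 'bzr 10 3 2 12' 'hg 15 1 1 10' |
    awk '{ printf "%s %+d\n", $1, $2 - $4 }')
echo "$net_scores"
```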

Lastly, all things being equal (which they are not as shown by the
previous two issues), it is preferable to use and support a tool written
in Python and not one written in C and shell. We are pragmatic enough to
not choose a tool simply because it is written in Python, but we do see
the usefulness in promoting tools that do use it when it is reasonable
to do so as it is in this case.

As for why Mercurial was chosen over Bazaar, it came down to popularity.
As the core developer survey shows, hg was preferred over bzr. But the
community also appears to prefer hg as was shown at PyCon after git's
removal from consideration was announced. Many people came up to Brett
and said in various ways that they wanted hg to be chosen. While no one
said they did not want bzr chosen, no one said they did either.

Based on all of this information, Guido and Brett decided Mercurial was
to be the next version control system for Python.

Transition Plan

PEP 385 outlines the transition from svn to hg.

Copyright

This document has been placed in the public domain.