PEP: 347 Title: Migrating the Python CVS to Subversion Version:
$Revision$ Last-Modified: $Date$ Author: Martin von Löwis
<martin@v.loewis.de> Discussions-To: python-dev@python.org Status: Final
Type: Process Content-Type: text/x-rst Created: 14-Jul-2004
Post-History: 14-Jul-2004

Abstract

The Python source code is currently managed in a CVS repository on
sourceforge.net. This PEP proposes to move it to a Subversion repository
on svn.python.org.

Rationale

This change has two aspects: moving from CVS to Subversion, and moving
from SourceForge to python.org. For each, a rationale will be given.

Moving to Subversion

CVS has a number of limitations that have been eliminated by Subversion.
For the development of Python, the most notable improvements are:

-   the ability to rename files and directories, and to remove
    directories, while keeping the history of these files.
-   support for change sets (sets of correlated changes to multiple
    files) through global revision numbers. Change sets are
    transactional.
-   atomic, fast tagging: a cvs tag might take many minutes; a
    Subversion tag (svn cp) will complete quickly, and atomically.
    Likewise, branches are very efficient.
-   support for offline diffs, which is useful when creating patches.

Moving to python.org

SourceForge has kindly provided an important infrastructure for the past
years. Unfortunately, the attention that SF received has also caused
repeated overload situations in the past, to which the SF operators
could not always respond in a timely manner. In particular, for CVS,
they had to reduce the load on the primary CVS server by introducing a
second, read-only CVS server for anonymous access. This server is
regularly synchronized, but lags behind the read-write CVS repository
between synchronizations. As a result, users without commit access can
see recent changes to the repository only after a delay.

On python.org, it would be possible to make the repository accessible
for anonymous access.

Migration Procedure

To move the Python CVS repository, the following steps need to be
executed. The steps are elaborated upon in the following sections.

1.  Collect SSH keys for all current committers, along with usernames to
    appear in commit messages.
2.  At the beginning of the migration, announce that the repository on
    SourceForge closed.
3.  24 hours after the last commit, download the CVS repository.
4.  Convert the CVS repository into a Subversion repository.
5.  Publish the repository with write access for committers, and
    read-only anonymous access.
6.  Disable CVS access on SF.

Collect SSH keys

After some discussion, svn+ssh was selected as the best method for write
access to the repository. Developers can continue to use their SSH keys,
but they must be installed on python.org.

In order to avoid having to create a new Unix user for each developer, a
single account should be used, with command= attributes in the
authorized_keys files.

The lines in the authorized_keys file should read like this (wrapped for
better readability):

    command="/usr/bin/svnserve --root=/svnroot -t
    --tunnel-user='<username>'",no-port-forwarding,
    no-X11-forwarding,no-agent-forwarding,no-pty
    ssh-dss <key> <comment>

As the usernames, the real names should be used instead of the SF
account names, so that people can be better identified in log messages.

Administrator Access

Administrator access to the pythondev account should be granted to all
current admins of the Python SF project. To distinguish between shell
login and svnserve login, admins need to maintain two keys. Using
OpenSSH, the following procedure can be used to create a second key:

    cd .ssh
    ssh-keygen -t DSA -f pythondev -C <user>@pythondev
    vi config

In the config file, the following lines need to be added:

    Host pythondev
      Hostname dinsdale.python.org
      User pythondev
      IdentityFile ~/.ssh/pythondev

Then, shell login becomes possible through "ssh pythondev".

Downloading the CVS Repository

The CVS repository can be downloaded from

  http://cvs.sourceforge.net/cvstarballs/python-cvsroot.tar.bz2

Since this tarball is generated only once a day, some time must pass
after the repository freeze before the tarball can be picked up. It
should be verified that the last commit, as recorded on the
python-commits mailing list, is indeed included in the tarball.

After the conversion, the converted CVS tarball should be kept forever
on www.python.org/archive/python-cvsroot-<date>.tar.bz2

Converting the CVS Repository

The Python CVS repository contains two modules: distutils and python.
The python module is further structured into dist and nondist, where
dist only contains src (the python code proper). nondist contains
various subdirectories.

These should be reorganized in the Subversion repository to get shorter
URLs, following the <project>/{trunk,tags,branches} structure. A project
will be created for each nondist directory, plus for src (called
python), plus distutils. Reorganizing the repository is best done in the
CVS tree, as shown below.

The fsfs backend should be used as the repository format (which requires
Subversion 1.1). The fsfs backend has the advantage of being more
backup-friendly, as it allows incremental repository backups, without
requiring any dump commands to be run.

The conversion should be done using the cvs2svn utility, available e.g.
in the cvs2svn Debian package. As cvs2svn does not currently support the
project/trunk structure, each project needs to be converted separately.
To get each conversion result into a separate directory in the target
repository, svnadmin load must be used.

Subversion has a different view on binary-vs-text files than CVS. To
correctly carry the CVS semantics forward, svn:eol-style should be set
to native on all files that are not marked binary in the CVS.

In summary, the conversion script is:

    #!/bin/sh
    rm cvs2svn-*
    rm -rf python py.new
    tar xjf python-cvsroot.tar.bz2
    rm -rf python/CVSROOT
    svnadmin create --fs-type fsfs py.new
    mv python/python python/orig
    mv python/orig/dist/src python/python
    mv python/orig/nondist/* python
    # nondist/nondist is empty
    rmdir python/nondist
    rm -rf python/orig
    for a in python/*
    do
      b=`basename $a`
      cvs2svn -q --dump-only --encoding=latin1 --force-branch=cnri-16-start \
      --force-branch=descr-branch --force-branch=release152p1-patches \
      --force-tag=r16b1 $a
      svn mkdir -m"Conversion to SVN" file:///`pwd`/py.new/$b
      svnadmin load -q --parent-dir $b py.new < cvs2svn-dump
      rm cvs2svn-dump
    done

Sample results of this conversion are available at

  http://www.dcl.hpi.uni-potsdam.de/pysvn/

Publish the Repository

The repository should be published at http://svn.python.org/projects.
Read-write access should be granted to all current SF committers through
svn+ssh://pythondev@svn.python.org/; read-only anonymous access through
WebDAV should also be granted.

As an option, websvn (available e.g. from the Debian websvn package)
could be provided. Unfortunately, in the test installation, websvn
breaks because it runs out of memory.

The current SF project admins should get write access to the
authorized_keys2 file of the pythondev account.

Disable CVS

It appears that CVS cannot be disabled entirely. Only the user interface
can be removed from the project page; the repository itself remains
available. If desired, write access to the python and distutils modules
can be disabled through a CVS commitinfo entry.

Discussion

Several alternatives had been suggested to the procedure above. The
rejected alternatives are shortly discussed here:

-   create multiple repositories, one for python and one for distutils.
    This would have allowed even shorter URLs, but was rejected because
    a single repository supports moving code across projects.
-   Several people suggested to create the project/trunk structure
    through standard cvs2svn, followed by renames. This would have the
    disadvantage that old revisions use different path names than recent
    revisions; the suggested approach through dump files works without
    renames.
-   Several people also expressed concern about the administrative
    overhead that hosting the repository on python.org would cause to
    pydotorg admins. As a specific alternative, BerliOS has been
    suggested. The pydotorg admins themselves haven't objected to the
    additional workload; migrating the repository again if they get
    overworked is an option.
-   Different authentication strategies were discussed. As alternatives
    to svn+ssh were suggested
    -   Subversion over WebDAV, using SSL and basic authentication, with
        pydotorg-generated passwords mailed to the user. People did not
        like that approach, since they would need to store the password
        on disk (because they can't remember it); this is a security
        risk.
    -   Subversion over WebDAV, using SSL client certificates. This
        would work, but would require us to administer a certificate
        authority.
-   Instead of hosting this on python.org, people suggested hosting it
    elsewhere. One issue is whether this alternative should be free or
    commercial; several people suggested it should better be commercial,
    to reduce the load on the volunteers. In particular:
    -   Greg Stein suggested http://www.wush.net/subversion.php. They
        offer 5 GB for $90/month, with 200 GB download/month. The data
        is on a RAID drive and fully backed up. Anonymous access and
        email commit notifications are supported. wush.net elaborated
        the following details:

        -   The machine would be a Virtuozzo Virtual Private Server
            (VPS), hosted at PowerVPS.
        -   The default repository URL would be
            http://python.wush.net/svn/projectname/, but anything else
            could be arranged
        -   we would get SSH login to the machine, with sudo
            capabilities.
        -   They have a Web interface for management of the various SVN
            repositories that we want to host, and to manage user
            accounts. While svn+ssh would be supported, the user
            interface does not yet support it.
        -   For offsite mirroring/backup, they suggest to use rsync
            instead of download of repository tarballs.

        Bob Ippolito reported that they had used wush.net for a
        commercial project for about 6 months, after which time they
        left wush.net, because the service was down for three days, with
        nobody reachable, and no explanation when it came back.

Copyright

This document has been placed in the public domain.