PEP: 507
Title: Migrate CPython to Git and GitLab
Author: Barry Warsaw <barry@python.org>
Status: Rejected
Type: Process
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History:
Resolution: https://mail.python.org/pipermail/core-workflow/2016-January/000345.html

Abstract

This PEP proposes migrating the repository hosting of CPython and the
supporting repositories to Git. Further, it proposes adopting a hosted
GitLab instance as the primary way of handling merge requests, code
reviews, and code hosting. It is similar in intent to PEP 481 but
proposes an open source alternative to GitHub and omits the proposal to
run Phabricator. As with PEP 481, this particular PEP is offered as an
alternative to PEP 474 and PEP 462.

Rationale

CPython is an open source project which relies on a number of volunteers
donating their time. As with any healthy, vibrant open source project,
it relies on attracting new volunteers as well as retaining existing
developers. Given that volunteer time is the scarcest resource,
providing a process that maximizes the efficiency of contributors and
reduces the friction for contributions is of vital importance for the
long-term health of the project.

The current tool chain of the CPython project is a custom and unique
combination of tools. This has two critical implications:

-   The unique nature of the tool chain means that contributors must
    remember or relearn the process, workflow, and tools whenever they
    contribute to CPython, without the advantage of leveraging long-term
    memory and familiarity they retain by working with other projects in
    the FLOSS ecosystem. The knowledge they gain in working with CPython
    is unlikely to be applicable to other projects.
-   The burden on the Python/PSF infrastructure team is much greater in
    order to continue to maintain custom tools, improve them over time,
    fix bugs, address security issues, and more generally adapt to new
    standards in online software development with global collaboration.

These limitations act as a barrier to contribution both for highly
engaged contributors (e.g. core Python developers) and especially for
more casual "drive-by" contributors, who care more about getting their
bug fix in than about learning a new suite of tools and workflows.

By proposing the adoption of both a different version control system and
a modern, well-maintained hosting solution, this PEP addresses these
limitations. It aims to enable a modern, well-understood process that
will carry CPython development for many years.

Version Control System

Currently the CPython and supporting repositories use Mercurial. As a
modern distributed version control system, it has served us well since
the migration from Subversion. However, when evaluating the VCS we must
consider the capabilities of the VCS itself as well as the network
effect and mindshare of the community around that VCS.

There are really only two options: Mercurial and Git. The technical
capabilities of the two systems are largely equivalent; therefore, this
PEP instead focuses on their social aspects.

It is not possible to get exact numbers for the projects or people
using a particular VCS; however, we can estimate them by looking at
several sources of information on which VCS projects are using.

The Open Hub (previously Ohloh) statistics [1] show that 37% of the
repositories indexed by The Open Hub use Git (second only to
Subversion, at 48%), while Mercurial has just 2%, beating only Bazaar
at 1%. This makes Git just over 18 times as popular as Mercurial on
The Open Hub.

Another source of information on VCS popularity is PyPI itself. This
source is more targeted at the Python community, since it represents
projects developed for Python. Unfortunately, PyPI does not have a
standard location for recording this information, so it requires
manual processing. If we limit our search to the top 100 projects on
PyPI (ordered by download count), we can see that 62% of them use Git,
22% use Mercurial, and 13% use something else. This makes Git just
under 3 times as popular as Mercurial for the top 100 projects on PyPI.
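The two "times as popular" figures follow directly from the reported shares. A minimal check, using only the percentages quoted in this section (no other data is assumed):

```python
# Percentages as reported in this section; nothing else is assumed.
OPEN_HUB = {"git": 37, "mercurial": 2}
PYPI_TOP_100 = {"git": 62, "mercurial": 22}


def git_to_hg_ratio(shares: dict[str, int]) -> float:
    """How many times more projects use Git than Mercurial."""
    return shares["git"] / shares["mercurial"]


if __name__ == "__main__":
    print(f"Open Hub:     {git_to_hg_ratio(OPEN_HUB):.2f}x")      # 18.50x
    print(f"PyPI top 100: {git_to_hg_ratio(PYPI_TOP_100):.2f}x")  # 2.82x
```

Since 37 / 2 = 18.5 and 62 / 22 is about 2.82, the ratios match "just over 18 times" and "just under 3 times" respectively.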

These numbers back up the anecdotal evidence for Git as the far more
popular DVCS for open source projects. Choosing the more popular VCS has
a number of positive benefits.

For new contributors, it increases the likelihood that they have
already learned the basics of Git while working with another project,
or, if they are just now learning Git, that they will be able to apply
that knowledge to other projects. Additionally, a larger community
means more people writing how-to guides, answering questions, and
writing articles about Git, which makes it easier for a new user to
find answers and information about the tool they are trying to learn
and use. Given its popularity, there may also be more auxiliary tooling
written around Git. This increases the options for everything from GUI
clients to helper scripts and repository hosting.

Further, the adoption of Git as the proposed back-end repository format
doesn't prohibit the use of Mercurial by fans of that VCS! Mercurial
users have the hg-git [2] plugin, which allows them to push to and pull
from a Git server using the Mercurial front-end. It's a well-maintained
and highly functional plugin that seems to be well-liked by Mercurial
users.

Repository Hosting

Where and how the official repositories for CPython are hosted is in
some ways determined by the choice of VCS. With Git there are several
options. In fact, once the repository is hosted in Git, branches can be
mirrored in many locations, on many free, open, and proprietary code
hosting sites.

It's still important for CPython to adopt a single, official repository,
with a web front-end that allows many convenient and common
interactions to happen entirely through the web, without always
requiring local VCS manipulations. At a minimum, these interactions
include code review with inline comments, branch diffing, CI
integration, and auto-merging.

This PEP proposes to adopt a GitLab [3] instance, run within the
python.org domain, accessible to and ultimately controlled by the PSF
and the Python infrastructure team, but donated, hosted, and primarily
maintained by GitLab, Inc.

Why GitLab? Because it is a fully functional Git hosting system that
sports modern web interactions, software workflows, and CI integration.
GitLab's Community Edition (CE) is open source software, and thus is
closely aligned with the principles of the CPython community.

Code Review

Currently, CPython uses a custom fork of Rietveld, modified so that it
does not run on Google App Engine, which is maintained by essentially
one person. It is missing common features present in many modern code
review tools.

This PEP proposes to utilize GitLab's built-in merge requests and online
code review features to facilitate reviews of all proposed changes.

GitLab merge requests

The normal workflow for a GitLab hosted project is to submit a merge
request asking that a feature or bug fix branch be merged into a target
branch, usually one or more of the stable maintenance branches or the
next-version master branch for new features. GitLab's merge requests are
similar in form and function to GitHub's pull requests, so anybody who
is already familiar with the latter should be able to immediately
utilize the former.

Once submitted, a conversation about the change can be had between the
submitter and reviewer. This includes both general comments and inline
comments attached to a particular line of the diff between the source
and target branches. Projects can also be configured to automatically
run continuous integration on the submitted branch, the results of which
are readily visible from the merge request page. Thus both the reviewer
and submitter can immediately see the results of the tests, making it
much easier to only land branches with passing tests. Each new push to
the source branch (e.g. to respond to a commenter's feedback or to fix a
failing test) results in a new run of the CI, so that the state of the
request always reflects the latest commit.

Merge requests have a fairly major advantage over the older "submit a
patch to a bug tracker" model. They allow developers to work completely
within the VCS using standard VCS tooling, without requiring the
creation of a patch file or figuring out the right location to upload
the patch to. This lowers the barrier for sending a change to be
reviewed.

Merge requests are also far easier to review. For example, they provide
syntax-highlighted diffs that can be shown in either unified or
side-by-side views. They allow commenting inline and on the merge
request as a whole, and they present all comments in a single view that
also hides comments which no longer apply. Comments can be hidden and
revealed.

Actually merging a merge request is quite simple if the source branch
applies cleanly to the target branch. A core reviewer simply needs to
press the "Merge" button for GitLab to perform the merge automatically.
The source branch can optionally be rebased, and once the merge is
completed, the source branch can be automatically deleted.
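The "Merge" button also has a scripted equivalent. The sketch below only builds (does not send) the HTTP requests, assuming the rebase and merge endpoints and the `should_remove_source_branch` parameter of GitLab's current REST API (v4, which postdates this PEP). The `gitlab.python.org` host, the `python/cpython` project path, and the token are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical host: gitlab.python.org is only a candidate name for the instance.
GITLAB_API = "https://gitlab.python.org/api/v4"


def _mr_url(project: str, mr_iid: int, action: str) -> str:
    # Namespaced project paths must be URL-encoded: "python/cpython" -> "python%2Fcpython".
    return f"{GITLAB_API}/projects/{quote(project, safe='')}/merge_requests/{mr_iid}/{action}"


def rebase_request(project: str, mr_iid: int, token: str) -> Request:
    """PUT .../rebase asks GitLab to rebase the source branch (done asynchronously)."""
    return Request(_mr_url(project, mr_iid, "rebase"), method="PUT",
                   headers={"PRIVATE-TOKEN": token})


def merge_request(project: str, mr_iid: int, token: str,
                  delete_source: bool = True) -> Request:
    """PUT .../merge is the API counterpart of pressing the "Merge" button."""
    query = urlencode({"should_remove_source_branch": "true" if delete_source else "false"})
    return Request(f"{_mr_url(project, mr_iid, 'merge')}?{query}", method="PUT",
                   headers={"PRIVATE-TOKEN": token})


if __name__ == "__main__":
    req = merge_request("python/cpython", 42, token="not-a-real-token")
    print(req.get_method(), req.full_url)
    # Actually sending it would be: urllib.request.urlopen(req)
```

Because the rebase runs asynchronously on the server, a script would need to wait for it to finish before issuing the merge request.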

GitLab also has a good workflow for submitting merge requests to a
project entirely through its web interface. This would enable the
Python documentation to have "Edit on GitLab" buttons on every page.
People who discover typos or inaccuracies, or who just want to improve
the docs they are currently reading, could simply hit that button and
get an in-browser editor that lets them make changes and submit a merge
request, all from the comfort of their browser.

Criticism

X is not written in Python

One feature of the current tooling (Mercurial, Rietveld) is that all of
its pieces are written primarily in Python. This PEP focuses on the
best tools for the job, not necessarily on the best tools that happen
to be written in Python. Volunteer time is the most precious resource
for any open source project, and we can best respect and utilize that
time by focusing on the benefits and downsides of the tools themselves
rather than on what language their authors happened to write them in.

One concern is the ability to modify tools to work for us; however, one
of the goals here is to avoid modifying software to fit us and instead
adapt ourselves to a more standardized workflow. This standardization
pays off in the ability to reuse tools out of the box, freeing up
developer time to actually work on Python itself, as well as enabling
knowledge sharing between projects.

However, if we do need to modify the tooling, Git is largely written in
C, the same as CPython. Commands for it can also be written in any
language, including Python. GitLab is largely written in Ruby, and
since it is open source software, we would have the ability to submit
merge requests to the upstream Community Edition, albeit in a language
potentially unfamiliar to most Python programmers.

Mercurial is better than Git

Whether Mercurial or Git is better on a technical level is highly
subjective. This PEP does not state whether the mechanics of Git or
Mercurial are better, and instead focuses on the network effect that
is available for either option. While this PEP proposes switching to
Git, Mercurial users are not left completely out of the loop. With the
hg-git extension for Mercurial, working with server-side Git
repositories is fairly easy and straightforward.

CPython Workflow is too Complicated

One sentiment that came out of previous discussions was that the
multi-branch model of CPython was too complicated for GitLab-style
merge requests. This PEP disagrees with that sentiment.

Currently, any particular change requires manually creating a patch for
both 2.7 and 3.x, and that will not change at all under this proposal.

If someone submits a fix for the current stable branch (e.g. 3.5), the
merge request workflow can be used to create a request to merge the
current stable branch into the master branch, assuming there are no
merge conflicts. As always, merge conflicts must be resolved manually
and locally. Because developers still have the option of performing the
merge locally, this is an improvement over the current situation, where
the merge must always happen locally.

For fixes in the current development branch that must also be applied to
stable release branches, it is possible in many situations to locally
cherry-pick and apply the change to other branches, with merge requests
submitted for each stable branch. It is also possible to just
cherry-pick and complete the merge locally. These are all accomplished
with standard Git commands and techniques, with the advantage that all
such changes can go through the review and CI test workflows, even for
merges to stable branches. Minor changes may be easily accomplished in
the GitLab web editor.
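The cherry-pick-and-submit flow for stable branches is a short, fixed sequence of standard Git commands per branch. A minimal sketch that generates (and can optionally run) that sequence; the `origin` remote name, the `backport-<sha>-<branch>` topic-branch naming, and the example commit and branches are hypothetical, and the merge request for each pushed branch would still be opened in GitLab.

```python
import subprocess


def backport_commands(sha: str, branches: list[str],
                      remote: str = "origin") -> list[list[str]]:
    """Return the git invocations that backport commit `sha` onto each stable branch."""
    commands = [["git", "fetch", remote]]  # refresh remote-tracking branches once
    for branch in branches:
        topic = f"backport-{sha[:10]}-{branch}"  # hypothetical naming convention
        commands += [
            # Start a topic branch from the stable branch as it exists on the remote.
            ["git", "checkout", "-b", topic, f"{remote}/{branch}"],
            # -x records "(cherry picked from commit <sha>)" in the new commit message.
            ["git", "cherry-pick", "-x", sha],
            # Push the topic branch so a merge request into `branch` can be opened.
            ["git", "push", remote, topic],
        ]
    return commands


def run_all(commands: list[list[str]]) -> None:
    for cmd in commands:
        # check=True stops at the first failure, e.g. a cherry-pick conflict,
        # which then has to be resolved manually and locally.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in backport_commands("0123456789abcdef", ["3.5", "3.4"]):
        print(" ".join(cmd))
```

Printing the plan first, rather than running it immediately, lets a developer review the commands before anything is pushed.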

No system can hide all the complexities involved in maintaining several
long-lived branches. The only thing that the tooling can do is make it
as easy as possible to submit and commit changes.

Open issues

-   What level of hosted support will GitLab offer? The PEP author has
    been in contact with the GitLab CEO, with positive interest on their
    part. The details of the hosting offer would have to be discussed.
-   What happens to Roundup, and do we switch to the GitLab issue
    tracker? Currently, this PEP is not suggesting we move from Roundup
    to GitLab issues. We have way too much invested in Roundup right
    now, and migrating the data would be a huge effort. GitLab does
    support webhooks, so we would probably want to use them to integrate
    merges and other events with updates to Roundup (e.g. to include
    pointers to commits, close issues, etc., similar to what is
    currently done).
-   What happens to wiki.python.org? Nothing! While GitLab does support
    wikis in repositories, there's no reason for us to migrate our Moin
    wikis.
-   What happens to the existing GitHub mirrors? We'd probably want to
    regenerate them once the official upstream branches are natively
    hosted in Git. This may change commit ids, but after that, it should
    be easy to mirror the official Git branches and repositories far and
    wide.
-   Where would the GitLab instance live? Physically, in whatever
    hosting provider GitLab chooses. We would point gitlab.python.org
    (or git.python.org?) to this host.
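The webhook-based Roundup integration suggested in the open issues above could begin with a small translation step: take a GitLab merge request event and decide which Roundup issues to annotate. This sketch assumes GitLab's documented merge request event payload (`object_kind`, and `object_attributes` with `action`, `title`, `description`, `target_branch`, `url`) and a hypothetical convention of mentioning "issue #NNNN" in merge request text. Actually updating Roundup, and checking the `X-Gitlab-Token` header on incoming requests, are deliberately left out.

```python
import re

# Hypothetical convention: merge request text mentions "issue #1234" or "issue 1234".
ISSUE_REF = re.compile(r"\bissue\s*#?(\d+)", re.IGNORECASE)


def roundup_updates(event: dict) -> list[tuple[int, str]]:
    """Turn one GitLab webhook payload into (issue number, note) pairs for Roundup."""
    if event.get("object_kind") != "merge_request":
        return []  # ignore push, pipeline, and other event types
    attrs = event.get("object_attributes", {})
    if attrs.get("action") != "merge":
        return []  # only react once the change has actually landed
    # GitLab may send a null description, hence the `or ""`.
    text = f"{attrs.get('title') or ''}\n{attrs.get('description') or ''}"
    issues = sorted({int(num) for num in ISSUE_REF.findall(text)})
    note = f"Merged into {attrs.get('target_branch')}: {attrs.get('url')}"
    return [(issue, note) for issue in issues]


if __name__ == "__main__":
    # Made-up sample payload, trimmed to the fields used above.
    sample = {
        "object_kind": "merge_request",
        "object_attributes": {
            "action": "merge",
            "title": "Fix crash in json.loads (issue #12345)",
            "description": "See also Issue 12346.",
            "target_branch": "3.5",
            "url": "https://gitlab.python.org/python/cpython/merge_requests/1",
        },
    }
    for issue, note in roundup_updates(sample):
        print(issue, note)
```

Keeping this step free of network calls makes it easy to test against recorded payloads before wiring it to Roundup.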

References

[1] Open Hub Statistics

[2] Hg-Git Mercurial plugin

[3] GitLab, https://about.gitlab.com

Copyright

This document has been placed in the public domain.