PEP: 386
Title: Changing the version comparison module in Distutils
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Superseded
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 04-Jun-2009
Superseded-By: 440

Abstract

Note: This PEP has been superseded by the version identification and
dependency specification scheme defined in PEP 440.

This PEP proposed a new version comparison schema for Distutils.

Motivation

In Python there are no real restrictions yet on how a project should
manage its versions, and how they should be incremented.

Distutils provides a version metadata field for distributions, but it is
freeform, and current users such as PyPI usually treat the most recently
uploaded version as the latest one, regardless of the expected semantics.

Distutils will soon extend its capabilities to allow distributions to
express a dependency on other distributions through the Requires-Dist
metadata field (see PEP 345) and it will optionally allow use of that
field to restrict the dependency to a set of compatible versions. Note
that this field replaces Requires, which expressed dependencies on
modules and packages.

The Requires-Dist field will allow a distribution to define a dependency
on another package and optionally restrict this dependency to a set of
compatible versions, so one may write:

    Requires-Dist: zope.interface (>3.5.0)

This means that the distribution requires zope.interface with a version
greater than 3.5.0.
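As a rough illustration of what an installer has to do with such a
field, the sketch below splits the value into a project name, a
comparison operator and a version string. The parse_requires_dist helper
and its regular expression are hypothetical and written for this example
only; PEP 345 defines the actual grammar, which is richer than what is
handled here.

```python
import re

# Hypothetical, simplified pattern: "name" optionally followed by "(op version)".
_REQUIRES_DIST = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_.-]+)"
    r"\s*(?:\(\s*(?P<op><=|>=|==|!=|<|>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_requires_dist(value):
    """Return (name, operator, version); operator and version may be None."""
    match = _REQUIRES_DIST.match(value)
    if match is None:
        raise ValueError("unsupported Requires-Dist value: %r" % value)
    return match.group("name"), match.group("op"), match.group("version")


print(parse_requires_dist("zope.interface (>3.5.0)"))
```

Extracting the operator and the version is the easy part. Deciding
whether an installed version actually satisfies ">3.5.0" requires agreed
comparison semantics.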

This also means that Python projects will need to follow the same
versioning convention as the tools used to install them, so that those
tools can compare versions.

That is why this PEP proposes, for the sake of interoperability, a
standard schema to express version information and its comparison
semantics.

Furthermore, this will make OS packagers' work easier when repackaging
standards compliant distributions, because as of now it can be difficult
to decide how two distribution versions compare.

Requisites and current status

It is not in the scope of this PEP to provide a universal versioning
schema intended to support all or even most existing versioning
schemas. There will always be competing grammars, mandated either by
distro or project policies or by historical reasons that we cannot
expect to change.

The proposed schema should be able to express the usual versioning
semantics, so that it's possible to parse any alternative versioning
schema and transform it into a compliant one. This is how OS packagers
usually deal with the existing version schemas, and it is preferable to
supporting an arbitrary set of versioning schemas.

Conformance to usual practice and conventions, as well as simplicity,
is a plus: it eases adoption and makes the transition painless.
Practicality beats purity, sometimes.

Projects have very different versioning needs, but the following are
widely considered important semantics:

1.  it should be possible to express more than one versioning level
    (usually this is expressed as major and minor revision and,
    sometimes, also a micro revision).
2.  a significant number of projects need special meaning versions for
    "pre-releases" (such as "alpha", "beta", "rc"), and these have
    widely used aliases ("a" stands for "alpha", "b" for "beta" and "c"
    for "rc"). These pre-release versions make it impossible to use a
    simple alphanumerical ordering of the version string components
    (example: 3.1a1 < 3.1).
3.  some projects also need "post-releases" of regular versions, mainly
    for installer work which can't be clearly expressed otherwise.
4.  development versions allow packagers of unreleased work to avoid
    version clash with later regular releases.
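The ordering problem mentioned in requisite 2 is easy to reproduce. The
following minimal sketch sorts a few version strings with plain string
comparison; the sample values are chosen for illustration only.

```python
versions = ["3.1", "3.1a1", "1.10", "1.9"]

# Plain string comparison works character by character, so "1.10" lands
# before "1.9" and the pre-release "3.1a1" lands after the final "3.1".
naive = sorted(versions)
print(naive)

# The order a human reader would expect instead.
expected = ["1.9", "1.10", "3.1a1", "3.1"]
print(naive == expected)
```

Both mismatches come from treating the version as an opaque string: the
numeric components are not compared as numbers, and the pre-release
marker is not given any special meaning.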

For people that want to go further and use a tool to manage their
version numbers, the two major ones are:

-   The current Distutils system[1]
-   Setuptools[2]

Distutils

Distutils currently provides a StrictVersion and a LooseVersion class
that can be used to manage versions.

The LooseVersion class is quite lax. From Distutils doc:

    Version numbering for anarchists and software realists.
    Implements the standard interface for version number classes as
    described above.  A version number consists of a series of numbers,
    separated by either periods or strings of letters.  When comparing
    version numbers, the numeric components will be compared
    numerically, and the alphabetic components lexically.  The following
    are all valid version numbers, in no particular order:

        1.5.1
        1.5.2b2
        161
        3.10a
        8.02
        3.4j
        1996.07.12
        3.2.pl0
        3.1.1.6
        2g6
        11g
        0.960923
        2.2beta29
        1.13++
        5.5.kw
        2.0b1pl0

    In fact, there is no such thing as an invalid version number under
    this scheme; the rules for comparison are simple and predictable,
    but may not always give the results you want (for some definition
    of "want").

This class makes any version string valid, and provides an algorithm to
sort them numerically then lexically. It means that anything can be used
to version your project:

    >>> from distutils.version import LooseVersion as V
    >>> v1 = V('FunkyVersion')
    >>> v2 = V('GroovieVersion')
    >>> v1 > v2
    False

The problem with this is that while it allows expressing any nesting
level it doesn't allow giving special meaning to versions (pre and
post-releases as well as development versions), as expressed in
requisites 2, 3 and 4.
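To see why 'FunkyVersion' compares lower than 'GroovieVersion', it helps
to look at how such a string is split. The sketch below follows the
behaviour described in the quoted documentation (numbers separated by
periods or strings of letters); it does not import distutils, and the
loose_components name is made up for this example.

```python
import re

# Split on runs of digits, runs of lowercase letters, or a period.
_COMPONENT = re.compile(r"(\d+|[a-z]+|\.)")


def loose_components(vstring):
    """Break a version string into numeric and alphabetic components."""
    components = []
    for part in _COMPONENT.split(vstring):
        if not part or part == ".":
            continue  # drop empty strings and the separators themselves
        try:
            components.append(int(part))
        except ValueError:
            components.append(part)
    return components


print(loose_components("1.5.2b2"))
print(loose_components("FunkyVersion"))
print(loose_components("FunkyVersion") > loose_components("GroovieVersion"))
```

Nothing in these component lists says that 'b' marks a pre-release, so
the comparison is purely numeric and lexical.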

The StrictVersion class is more strict. From the doc:

    Version numbering for meticulous retentive and software idealists.
    Implements the standard interface for version number classes as
    described above.  A version number consists of two or three
    dot-separated numeric components, with an optional "pre-release" tag
    on the end.  The pre-release tag consists of the letter 'a' or 'b'
    followed by a number.  If the numeric components of two version
    numbers are equal, then one with a pre-release tag will always
    be deemed earlier (lesser) than one without.

    The following are valid version numbers (shown in the order that
    would be obtained by sorting according to the supplied cmp function):

        0.4       0.4.0  (these two are equivalent)
        0.4.1
        0.5a1
        0.5b3
        0.5
        0.9.6
        1.0
        1.0.4a3
        1.0.4b1
        1.0.4

    The following are examples of invalid version numbers:

        1
        2.7.2.2
        1.3.a4
        1.3pl1
        1.3c4

This class enforces a few rules, and makes a decent tool to work with
version numbers:

    >>> from distutils.version import StrictVersion as V
    >>> v2 = V('GroovieVersion')
    Traceback (most recent call last):
    ...
    ValueError: invalid version number 'GroovieVersion'
    >>> v2 = V('1.1')
    >>> v3 = V('1.3')
    >>> v2 < v3
    True

It adds pre-release versions, and some structure, but lacks a few
semantic elements to make it usable, such as development releases or
post-release tags, as expressed in requisites 3 and 4.
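The rules quoted above are small enough to sketch in a few lines. The
code below is an approximation written for this discussion, not the
Distutils implementation; strict_key and its key layout are assumptions.

```python
import re

# Two or three numeric components, then an optional 'a' or 'b' tag
# followed by a number.
_STRICT = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:([ab])(\d+))?")


def strict_key(vstring):
    """Return a sortable key, or raise ValueError for an invalid version."""
    match = _STRICT.fullmatch(vstring)
    if match is None:
        raise ValueError("invalid version number %r" % vstring)
    major, minor, patch, tag, tag_number = match.groups()
    release = (int(major), int(minor), int(patch or 0))
    if tag is None:
        # Final releases sort after any pre-release of the same number.
        return (release, 1)
    return (release, 0, tag, int(tag_number))


print(strict_key("0.4") == strict_key("0.4.0"))
print(strict_key("1.0.4a3") < strict_key("1.0.4b1") < strict_key("1.0.4"))
```

The pattern has no room for a '.postN' or '.devN' suffix, which is the
gap noted above.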

Also, note that Distutils version classes have been present for years
but are not really used in the community.

Setuptools

Setuptools provides another version comparison tool[3] which does not
enforce any rules for the version, but tries to provide a better
algorithm to convert the strings to sortable keys, with a parse_version
function.

From the doc:

    Convert a version string to a chronologically-sortable key

    This is a rough cross between Distutils' StrictVersion and LooseVersion;
    if you give it versions that would work with StrictVersion, then it behaves
    the same; otherwise it acts like a slightly-smarter LooseVersion. It is
    *possible* to create pathological version coding schemes that will fool
    this parser, but they should be very rare in practice.

    The returned value will be a tuple of strings.  Numeric portions of the
    version are padded to 8 digits so they will compare numerically, but
    without relying on how numbers compare relative to strings.  Dots are
    dropped, but dashes are retained.  Trailing zeros between alpha segments
    or dashes are suppressed, so that e.g. "2.4.0" is considered the same as
    "2.4". Alphanumeric parts are lower-cased.

    The algorithm assumes that strings like "-" and any alpha string that
    alphabetically follows "final"  represents a "patch level".  So, "2.4-1"
    is assumed to be a branch or patch of "2.4", and therefore "2.4.1" is
    considered newer than "2.4-1", which in turn is newer than "2.4".

    Strings like "a", "b", "c", "alpha", "beta", "candidate" and so on (that
    come before "final" alphabetically) are assumed to be pre-release versions,
    so that the version "2.4" is considered newer than "2.4a1".

    Finally, to handle miscellaneous cases, the strings "pre", "preview", and
    "rc" are treated as if they were "c", i.e. as though they were release
    candidates, and therefore are not as new as a version string that does not
    contain them, and "dev" is replaced with an '@' so that it sorts lower
    than any other pre-release tag.

In other words, parse_version returns a tuple for each version string.
The result is compatible with StrictVersion, but the function also
accepts arbitrary version strings and converts them so that they can be
compared:

    >>> from pkg_resources import parse_version as V
    >>> V('1.2')
    ('00000001', '00000002', '*final')
    >>> V('1.2b2')
    ('00000001', '00000002', '*b', '00000002', '*final')
    >>> V('FunkyVersion')
    ('*funkyversion', '*final')

In this schema practicality takes priority over purity, but as a result
it doesn't enforce any policy and leads to very complex semantics due to
the lack of a clear standard. It just tries to adapt to widely used
conventions.
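The rules in the quoted docstring can be followed step by step in code.
The sketch below is a reconstruction from that description, not a copy
of the Setuptools source; the names sortable_key, _parts and _REPLACE
are chosen for this example.

```python
import re

_COMPONENT = re.compile(r"(\d+|[a-z]+|\.|-)")
# Aliases described in the docstring: release-candidate spellings, the dash
# as a patch-level marker, and 'dev' mapped to '@' so it sorts lowest.
_REPLACE = {"pre": "c", "preview": "c", "rc": "c", "-": "final-", "dev": "@"}


def _parts(version):
    for part in _COMPONENT.split(version):
        part = _REPLACE.get(part, part)
        if not part or part == ".":
            continue
        if part[0] in "0123456789":
            yield part.zfill(8)  # pad so numbers compare correctly as strings
        else:
            yield "*" + part
    yield "*final"  # makes alpha/beta/candidate tags sort before the release


def sortable_key(version):
    parts = []
    for part in _parts(version.lower()):
        if part.startswith("*"):
            if part < "*final":
                # a dash directly before a pre-release tag is not a patch level
                while parts and parts[-1] == "*final-":
                    parts.pop()
            # drop trailing zeros so that "2.4.0" equals "2.4"
            while parts and parts[-1] == "00000000":
                parts.pop()
        parts.append(part)
    return tuple(parts)


print(sortable_key("1.2b2"))
print(sortable_key("2.4") < sortable_key("2.4-1") < sortable_key("2.4.1"))
```

Every input produces a key, including strings that are not versions at
all, which is the permissiveness discussed above.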

Caveats of existing systems

The major problem with the described version comparison tools is that
they are too permissive and, at the same time, aren't capable of
expressing some of the required semantics. Many of the versions on
PyPI[4] are obviously not useful versions, which makes it difficult for
users to understand the versioning a particular package uses and for
developers to build tools on top of PyPI.

Distutils classes are not really used in Python projects, but the
Setuptools function is quite widespread because it's used by tools like
easy_install[5], pip[6] or zc.buildout [7] to install dependencies of a
given project.

While Setuptools does provide a mechanism for comparing/sorting
versions, it is much preferable if the versioning spec is such that a
human can make a reasonable attempt at that sorting without having to
run it against some code.

There is also a problem with using a date as the "major" version
number (e.g. a version string "20090421") with RPMs: any later attempt
to switch to a more typical "major.minor..." version scheme is
problematic, because the new version will always sort lower than
"20090421".
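The effect can be shown with plain tuples. This is a minimal sketch that
assumes each dotted component is compared as an integer, which is a
simplification of what a packaging tool does.

```python
date_based = (20090421,)  # the version string "20090421"
conventional = (1, 0)     # a later switch to "1.0"

# Tuples compare element by element, so the huge first component wins
# no matter how many conventional releases follow.
print(conventional < date_based)
print((999, 99, 99) < date_based)
```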

Last, the meaning of - is specific to Setuptools, while it is avoided in
some packaging systems like the one used by Debian or Ubuntu.

The new versioning algorithm

During PyCon, members of the Python, Ubuntu and Fedora communities
worked on a version standard that would be acceptable to everyone.

It's currently called verlib, and a prototype is available[8].

The pseudo-format supported is:

    N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]

The real regular expression is:

    expr = r"""^
    (?P<version>\d+\.\d+)         # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
                                     # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $"""
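Note that the expression contains whitespace and comments, so it has to
be compiled in verbose mode. The sketch below compiles it with
re.VERBOSE and shows which named groups are filled for a given string;
the VERSION_RE and split_version names are chosen for this example.

```python
import re

VERSION_RE = re.compile(r"""^
(?P<version>\d+\.\d+)         # minimum 'N.N'
(?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
(?:
    (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
                                 # 'c' or 'rc' = release candidate
    (?P<prerelversion>\d+(?:\.\d+)*)
)?
(?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
$""", re.VERBOSE)


def split_version(s):
    """Return the named groups for s, or None if s does not match."""
    match = VERSION_RE.match(s)
    return match.groupdict() if match else None


print(split_version("1.0a2.1.dev456"))
print(split_version("1.2.0-r678"))
```

A string such as "1.2.0-r678" is rejected outright, while
"1.0a2.1.dev456" yields the main version, the pre-release marker with
its own dotted number, and the development number as separate fields.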

Some examples probably make it clearer:

    >>> from verlib import NormalizedVersion as V
    >>> (V('1.0a1')
    ...  < V('1.0a2.dev456')
    ...  < V('1.0a2')
    ...  < V('1.0a2.1.dev456')
    ...  < V('1.0a2.1')
    ...  < V('1.0b1.dev456')
    ...  < V('1.0b2')
    ...  < V('1.0b2.post345')
    ...  < V('1.0c1.dev456')
    ...  < V('1.0c1')
    ...  < V('1.0.dev456')
    ...  < V('1.0')
    ...  < V('1.0.post456.dev34')
    ...  < V('1.0.post456'))
    True

The trailing .dev123 is for pre-releases. The .post123 is for
post-releases -- which apparently are used by a number of projects out
there (e.g. Twisted[9]). For example, after a 1.2.0 release there might
be a 1.2.0-r678 release. We used post instead of r because the r is
ambiguous as to whether it indicates a pre- or post-release.

.post456.dev34 indicates a dev marker for a post release, that sorts
before a .post456 marker. This can be used to do development versions of
post releases.
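One way to obtain this ordering is to turn each version into a tuple of
three blocks (main version, pre-release, post/dev) and let Python compare
the tuples. The sketch below is an illustration under that assumption
and is not taken from the verlib prototype; sort_key, the compact
pattern and the 'z' sentinel are all choices made for this example.

```python
import re

_VERSION = re.compile(
    r"(?P<main>\d+\.\d+(?:\.\d+)*)"
    r"(?:(?P<prerel>[abc]|rc)(?P<prerelversion>\d+(?:\.\d+)*))?"
    r"(?:\.post(?P<post>\d+))?(?:\.dev(?P<dev>\d+))?"
)
# Assumed sentinel: sorts after 'a', 'b', 'c', 'rc' and 'dev', marks "final".
_FINAL = "z"


def _ints(dotted):
    return tuple(int(n) for n in dotted.split("."))


def sort_key(s):
    match = _VERSION.fullmatch(s)
    if match is None:
        raise ValueError("not a normalized version: %r" % s)
    g = match.groupdict()
    main = _ints(g["main"])
    if g["prerel"] is None:
        prerel = (_FINAL,)
    else:
        prerel = (g["prerel"],) + _ints(g["prerelversion"])
    postdev = []
    if g["post"] is not None:
        postdev += [_FINAL, "post", int(g["post"])]
        if g["dev"] is None:
            postdev.append(_FINAL)  # a plain post release sorts after its .dev
    if g["dev"] is not None:
        postdev += ["dev", int(g["dev"])]
    return (main, prerel, tuple(postdev) or (_FINAL,))


print(sort_key("1.0.dev456") < sort_key("1.0") < sort_key("1.0.post456"))
```

The sentinel has to sort after every marker string. A more mnemonic 'f'
for "final" would not work in this sketch, because 'rc' sorts after 'f'
and release candidates would then compare higher than the final release.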

Pre-releases can use a for "alpha", b for "beta" and c for "release
candidate". rc is an alternative notation for "release candidate" that
is added to make the version scheme compatible with Python's own version
scheme. rc sorts after c:

    >>> from verlib import NormalizedVersion as V
    >>> (V('1.0a1')
    ...  < V('1.0a2')
    ...  < V('1.0b3')
    ...  < V('1.0c1')
    ...  < V('1.0rc2')
    ...  < V('1.0'))
    True

Note that c is the preferred marker for third party projects.

verlib provides a NormalizedVersion class and a
suggest_normalized_version function.

NormalizedVersion

The NormalizedVersion class is used to hold a version and to compare it
with others. It takes as its argument a string containing the
representation of the version:

    >>> from verlib import NormalizedVersion
    >>> version = NormalizedVersion('1.0')

The version can be represented as a string:

    >>> str(version)
    '1.0'

Or compared with others:

    >>> NormalizedVersion('1.0') > NormalizedVersion('0.9')
    True
    >>> NormalizedVersion('1.0') < NormalizedVersion('1.1')
    True

A class method called from_parts is available if you want to create an
instance by providing the parts that compose the version.

Examples:

    >>> version = NormalizedVersion.from_parts((1, 0))
    >>> str(version)
    '1.0'

    >>> version = NormalizedVersion.from_parts((1, 0), ('c', 4))
    >>> str(version)
    '1.0c4'

    >>> version = NormalizedVersion.from_parts((1, 0), ('c', 4), ('dev', 34))
    >>> str(version)
    '1.0c4.dev34'
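The mapping from parts to the string form shown in these examples can be
sketched as a standalone function. format_parts and its parameter names
are hypothetical; the examples above only show positional arguments, and
the flat layout assumed for the third argument is a guess based on them.

```python
def format_parts(version, prerelease=None, devpost=None):
    """Render version parts as a string, e.g. (1, 0), ('c', 4) -> '1.0c4'."""
    text = ".".join(str(n) for n in version)
    if prerelease is not None:
        tag, *numbers = prerelease
        text += tag + ".".join(str(n) for n in numbers)
    if devpost is not None:
        # Assumed layout: a flat sequence such as ('dev', 34)
        # or ('post', 2, 'dev', 1).
        for name, number in zip(devpost[::2], devpost[1::2]):
            text += ".%s%d" % (name, number)
    return text


print(format_parts((1, 0), ("c", 4), ("dev", 34)))
```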

suggest_normalized_version

suggest_normalized_version is a function that suggests a normalized
version close to the given version string. If you have a version string
that isn't normalized (i.e. NormalizedVersion doesn't like it) then you
might be able to get an equivalent (or close) normalized version from
this function.

This does a number of simple normalizations to the given string, based
on an observation of versions currently in use on PyPI.
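To give an idea of what such normalizations look like, here is a
deliberately small sketch. The suggest function and its two rewrite
rules are illustrative assumptions, not the rules verlib applies; the
only behaviour shown in this PEP is that "2.4-rc1" becomes "2.4c1" and
that unrecognizable strings yield None.

```python
import re

# Compact form of the normalized pattern, used to accept or reject candidates.
_NORMALIZED = re.compile(
    r"\d+\.\d+(?:\.\d+)*"
    r"(?:(?:[abc]|rc)\d+(?:\.\d+)*)?"
    r"(?:\.post\d+)?(?:\.dev\d+)?"
)
_ALIASES = {"alpha": "a", "beta": "b", "rc": "c"}


def suggest(version):
    """Return a normalized-looking version close to `version`, or None."""
    if _NORMALIZED.fullmatch(version):
        return version
    candidate = version.strip().lower()
    # Illustrative rule 1: drop a leading 'v', as in 'v1.2'.
    candidate = re.sub(r"^v(?=\d)", "", candidate)
    # Illustrative rule 2: '-rc1', '.beta2', '_alpha3' -> 'c1', 'b2', 'a3'.
    candidate = re.sub(
        r"[-_.]?(alpha|beta|rc)[-_.]?(?=\d)",
        lambda m: _ALIASES[m.group(1)],
        candidate,
    )
    return candidate if _NORMALIZED.fullmatch(candidate) else None


print(suggest("2.4-rc1"))
print(suggest("working proof of concept"))
```

Each rule is a cheap textual rewrite, and the result is only returned if
it passes the strict pattern afterwards.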

Run against a dump of those versions taken on January 6th 2010, the
function gave the following results for the 8821 distributions PyPI had
at the time:

-   7822 (88.67%) already match NormalizedVersion without any change
-   717 (8.13%) match when using this suggestion method
-   282 (3.20%) don't match at all.

The 3.20% of projects that are incompatible with NormalizedVersion and
cannot be transformed into a compatible form mostly use date-based
version schemes, versions with custom markers, or dummy versions.
Examples:

-   working proof of concept
-   1 (first draft)
-   unreleased.unofficialdev
-   0.1.alphadev
-   2008-03-29_r219
-   etc.

When a tool needs to work with versions, one strategy is to call
suggest_normalized_version on the version string. If this function
returns None, the provided version is not close enough to the standard
scheme. If it returns a version that differs slightly from the original,
that value is a suggested normalized version. Finally, if it returns the
same string, the version already matches the scheme.

Here's an example of usage:

    >>> from verlib import suggest_normalized_version, NormalizedVersion
    >>> import warnings
    >>> def validate_version(version):
    ...     rversion = suggest_normalized_version(version)
    ...     if rversion is None:
    ...         raise ValueError('Cannot work with "%s"' % version)
    ...     if rversion != version:
    ...         warnings.warn('"%s" is not a normalized version.\n'
    ...                       'It has been transformed into "%s" '
    ...                       'for interoperability.' % (version, rversion))
    ...     return NormalizedVersion(rversion)
    ...

    >>> validate_version('2.4-rc1')
    __main__:8: UserWarning: "2.4-rc1" is not a normalized version.
    It has been transformed into "2.4c1" for interoperability.
    NormalizedVersion('2.4c1')

    >>> validate_version('2.4c1')
    NormalizedVersion('2.4c1')

    >>> validate_version('foo')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 4, in validate_version
    ValueError: Cannot work with "foo"

Roadmap

Distutils will deprecate its existing version classes in favor of
NormalizedVersion. The verlib module presented in this PEP will be
renamed to version and placed into the distutils package.

References

Acknowledgments

Trent Mick, Matthias Klose, Phillip Eby, David Lyon, and many people at
PyCon and Distutils-SIG.

Copyright

This document has been placed in the public domain.




[1] http://docs.python.org/distutils

[2] http://peak.telecommunity.com/DevCenter/setuptools

[3] http://peak.telecommunity.com/DevCenter/setuptools#specifying-your-project-s-version

[4] http://pypi.python.org/pypi

[5] http://peak.telecommunity.com/DevCenter/EasyInstall

[6] http://pypi.python.org/pypi/pip

[7] http://pypi.python.org/pypi/zc.buildout

[8] http://bitbucket.org/tarek/distutilsversion/

[9] http://twistedmatrix.com/trac/