PEP: 513 Title: A Platform Tag for Portable Linux Built Distributions
Version: $Revision$ Last-Modified: $Date$ Author: Robert T. McGibbon
<rmcgibbo@gmail.com>, Nathaniel J. Smith <njs@pobox.com> BDFL-Delegate:
Alyssa Coghlan <ncoghlan@gmail.com> Discussions-To:
distutils-sig@python.org Status: Superseded Type: Informational Topic:
Packaging Content-Type: text/x-rst Created: 19-Jan-2016 Post-History:
19-Jan-2016, 25-Jan-2016, 29-Jan-2016 Superseded-By: 600 Resolution:
https://mail.python.org/pipermail/distutils-sig/2016-January/028211.html

Abstract

This PEP proposes the creation of a new platform tag for Python package
built distributions, such as wheels, called manylinux1_{x86_64,i686}
with external dependencies limited to a standardized, restricted subset
of the Linux kernel and core userspace ABI. It proposes that PyPI
support uploading and distributing wheels with this platform tag, and
that pip support downloading and installing these packages on compatible
platforms.

Rationale

Currently, distribution of binary Python extensions for Windows and OS X
is straightforward. Developers and packagers build wheels (PEP 427, PEP
491), which are assigned platform tags such as win32 or
macosx_10_6_intel, and upload these wheels to PyPI. Users can download
and install these wheels using tools such as pip.

For Linux, the situation is much more delicate. In general, compiled
Python extension modules built on one Linux distribution will not work
on other Linux distributions, or even on different machines running the
same Linux distribution with different system libraries installed.

Build tools using PEP 425 platform tags do not track information about
the particular Linux distribution or installed system libraries, and
instead assign all wheels the too-vague linux_i686 or linux_x86_64 tags.
Because of this ambiguity, there is no expectation that linux-tagged
built distributions compiled on one machine will work properly on
another, and for this reason, PyPI has not permitted the uploading of
wheels for Linux.

It would be ideal if wheel packages could be compiled that would work on
any linux system. But, because of the incredible diversity of Linux
systems -- from PCs to Android to embedded systems with custom libcs --
this cannot be guaranteed in general.

Instead, we define a standard subset of the kernel+core userspace ABI
that, in practice, is compatible enough that packages conforming to this
standard will work on many linux systems, including essentially all of
the desktop and server distributions in common use. We know this because
there are companies who have been distributing such widely-portable
pre-compiled Python extension modules for Linux -- e.g. Enthought with
Canopy[1] and Continuum Analytics with Anaconda[2].

Building on the compatibility lessons learned from these companies, we
thus define a baseline manylinux1 platform tag for use by binary Python
wheels, and introduce the implementation of preliminary tools to aid in
the construction of these manylinux1 wheels.

Key Causes of Inter-Linux Binary Incompatibility

To properly define a standard that will guarantee that wheel packages
meeting this specification will operate on many linux platforms, it is
necessary to understand the root causes which often prevent portability
of pre-compiled binaries on Linux. The two key causes are dependencies
on shared libraries which are not present on users' systems, and
dependencies on particular versions of certain core libraries like
glibc.

External Shared Libraries

Most desktop and server linux distributions come with a system package
manager (examples include APT on Debian-based systems, yum on RPM-based
systems, and pacman on Arch linux) that manages, among other
responsibilities, the installation of shared libraries installed to
system directories such as /usr/lib. Most non-trivial Python extensions
will depend on one or more of these shared libraries, and thus function
properly only on systems where the user has the proper libraries (and
the proper versions thereof), either installed using their package
manager, or installed manually by setting certain environment variables
such as LD_LIBRARY_PATH to notify the runtime linker of the location of
the depended-upon shared libraries.

Versioning of Core Shared Libraries

Even if the developers a Python extension module wish to use no external
shared libraries, the modules will generally have a dynamic runtime
dependency on the GNU C library, glibc. While it is possible, statically
linking glibc is usually a bad idea because certain important C
functions like dlopen() cannot be called from code that statically links
glibc. A runtime shared library dependency on a system-provided glibc is
unavoidable in practice.

The maintainers of the GNU C library follow a strict symbol versioning
scheme for backward compatibility. This ensures that binaries compiled
against an older version of glibc can run on systems that have a newer
glibc. The opposite is generally not true -- binaries compiled on newer
Linux distributions tend to rely upon versioned functions in glibc that
are not available on older systems.

This generally prevents wheels compiled on the latest Linux
distributions from being portable.

The manylinux1 policy

For these reasons, to achieve broad portability, Python wheels

-   should depend only on an extremely limited set of external shared
    libraries; and
-   should depend only on "old" symbol versions in those external shared
    libraries; and
-   should depend only on a widely-compatible kernel ABI.

To be eligible for the manylinux1 platform tag, a Python wheel must
therefore both (a) contain binary executables and compiled code that
links only to libraries with SONAMEs included in the following list: :

    libpanelw.so.5
    libncursesw.so.5
    libgcc_s.so.1
    libstdc++.so.6
    libm.so.6
    libdl.so.2
    librt.so.1
    libc.so.6
    libnsl.so.1
    libutil.so.1
    libpthread.so.0
    libresolv.so.2
    libX11.so.6
    libXext.so.6
    libXrender.so.1
    libICE.so.6
    libSM.so.6
    libGL.so.1
    libgobject-2.0.so.0
    libgthread-2.0.so.0
    libglib-2.0.so.0

and, (b) work on a stock CentOS 5.11[3] system that contains the system
package manager's provided versions of these libraries.

libcrypt.so.1 was retrospectively removed from the whitelist after
Fedora 30 was released with libcrypt.so.2 instead.

Because CentOS 5 is only available for x86_64 and i686 architectures,
these are the only architectures currently supported by the manylinux1
policy.

On Debian-based systems, these libraries are provided by the packages :

    libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6
    libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0

On RPM-based systems, these libraries are provided by the packages :

    ncurses libgcc libstdc++ glibc libXext libXrender
    libICE libSM mesa-libGL glib2

This list was compiled by checking the external shared library
dependencies of the Canopy[4] and Anaconda[5] distributions, which both
include a wide array of the most popular Python modules and have been
confirmed in practice to work across a wide swath of Linux systems in
the wild.

Many of the permitted system libraries listed above use symbol
versioning schemes for backward compatibility. The latest symbol
versions provided with the CentOS 5.11 versions of these libraries are:
:

    GLIBC_2.5
    CXXABI_3.4.8
    GLIBCXX_3.4.9
    GCC_4.2.0

Therefore, as a consequence of requirement (b), any wheel that depends
on versioned symbols from the above shared libraries may depend only on
symbols with the following versions: :

    GLIBC <= 2.5
    CXXABI <= 3.4.8
    GLIBCXX <= 3.4.9
    GCC <= 4.2.0

These recommendations are the outcome of the relevant discussions in
January 2016[6],[7].

Note that in our recommendations below, we do not suggest that pip or
PyPI should attempt to check for and enforce the details of this policy
(just as they don't check for and enforce the details of existing
platform tags like win32). The text above is provided (a) as advice to
package builders, and (b) as a method for allocating blame if a given
wheel doesn't work on some system: if it satisfies the policy above,
then this is a bug in the spec or the installation tool; if it does not
satisfy the policy above, then it's a bug in the wheel. One useful
consequence of this approach is that it leaves open the possibility of
further updates and tweaks as we gain more experience, e.g., we could
have a "manylinux 1.1" policy which targets the same systems and uses
the same manylinux1 platform tag (and thus requires no further changes
to pip or PyPI), but that adjusts the list above to remove libraries
that have turned out to be problematic or add libraries that have turned
out to be safe.

libpythonX.Y.so.1

Note that libpythonX.Y.so.1 is not on the list of libraries that a
manylinux1 extension is allowed to link to. Explicitly linking to
libpythonX.Y.so.1 is unnecessary in almost all cases: the way ELF
linking works, extension modules that are loaded into the interpreter
automatically get access to all of the interpreter's symbols, regardless
of whether or not the extension itself is explicitly linked against
libpython. Furthermore, explicit linking to libpython creates problems
in the common configuration where Python is not built with
--enable-shared. In particular, on Debian and Ubuntu systems,
apt install pythonX.Y does not even install libpythonX.Y.so.1, meaning
that any wheel that did depend on libpythonX.Y.so.1 could fail to
import.

There is one situation where extensions that are linked in this way can
fail to work: if a host program (e.g., apache2) uses dlopen() to load a
module (e.g., mod_wsgi) that embeds the CPython interpreter, and the
host program does not pass the RTLD_GLOBAL flag to dlopen(), then the
embedded CPython will be unable to load any extension modules that do
not themselves link explicitly to libpythonX.Y.so.1. Fortunately,
apache2 does set the RTLD_GLOBAL flag, as do all the other programs that
embed-CPython-via-a-dlopened-plugin that we could locate, so this does
not seem to be a serious problem in practice. The incompatibility with
Debian/Ubuntu is more of an issue than the theoretical incompatibility
with a rather obscure corner case.

This is a rather complex and subtle issue that extends beyond the scope
of manylinux1; for more discussion see:[8],[9], [10].

UCS-2 vs UCS-4 builds

All versions of CPython 2.x, plus CPython 3.0-3.2 inclusive, can be
built in two ABI-incompatible modes: builds using the
--enable-unicode=ucs2 configure flag store Unicode data in UCS-2 (or
really UTF-16) format, while builds using the --enable-unicode=ucs4
configure flag store Unicode data in UCS-4. (CPython 3.3 and greater use
a different storage method that always supports UCS-4.) If we want to
make sure ucs2 wheels don't get installed into ucs4 CPythons and
vice-versa, then something must be done.

An earlier version of this PEP included a requirement that manylinux1
wheels targeting these older CPython versions should always use the ucs4
ABI. But then, in between the PEP's initial acceptance and its
implementation, pip and wheel gained first-class support for tracking
and checking this aspect of ABI compatibility for the relevant CPython
versions, which is a better solution. So we now allow the manylinux1
platform tags to be used in combination with any ABI tag. However, to
maintain compatibility it is crucial to ensure that all manylinux1
wheels include a non-trivial abi tag. For example, a wheel built against
a ucs4 CPython might have a name like:

    PKG-VERSION-cp27-cp27mu-manylinux1_x86_64.whl
                     ^^^^^^ Good!

While a wheel built against the ucs2 ABI might have a name like:

    PKG-VERSION-cp27-cp27m-manylinux1_x86_64.whl
                     ^^^^^ Okay!

But you should never have a wheel with a name like:

    PKG-VERSION-cp27-none-manylinux1_x86_64.whl
                     ^^^^ BAD! Don't do this!

This wheel claims to be simultaneously compatible with both ucs2 and
ucs4 builds, which is bad.

We note for information that the ucs4 ABI appears to be much more
widespread among Linux CPython distributors.

fpectl builds vs. no fpectl builds

All extant versions of CPython can be built either with or without the
--with-fpectl flag to configure. It turns out that this changes the
CPython ABI: extensions that are built against a no-fpectl CPython are
always compatible with yes-fpectl CPython, but the reverse is not
necessarily true. (Symptom: errors at import time complaining about
undefined symbol: PyFPE_jbuf.) See: [11].

For maximum compatibility, therefore, the CPython used to build
manylinux1 wheels must be compiled without the --with-fpectl flag, and
manylinux1 extensions must not reference the symbol PyFPE_jbuf.

Compilation of Compliant Wheels

The way glibc, libgcc, and libstdc++ manage their symbol versioning
means that in practice, the compiler toolchains that most developers use
to do their daily work are incapable of building manylinux1-compliant
wheels. Therefore, we do not attempt to change the default behavior of
pip wheel / bdist_wheel: they will continue to generate regular linux_*
platform tags, and developers who wish to use them to generate
manylinux1-tagged wheels will have to change the tag as a second
post-processing step.

To support the compilation of wheels meeting the manylinux1 standard, we
provide initial drafts of two tools.

Docker Image

The first tool is a Docker image based on CentOS 5.11, which is
recommended as an easy to use self-contained build box for compiling
manylinux1 wheels [12]. Compiling on a more recently-released linux
distribution will generally introduce dependencies on too-new versioned
symbols. The image comes with a full compiler suite installed (gcc, g++,
and gfortran 4.8.2) as well as the latest releases of Python and pip.

Auditwheel

The second tool is a command line executable called auditwheel[13] that
may aid in package maintainers in dealing with third-party external
dependencies.

There are at least three methods for building wheels that use
third-party external libraries in a way that meets the above policy.

1.  The third-party libraries can be statically linked.
2.  The third-party shared libraries can be distributed in separate
    packages on PyPI which are depended upon by the wheel.
3.  The third-party shared libraries can be bundled inside the wheel
    libraries, linked with a relative path.

All of these are valid option which may be effectively used by different
packages and communities. Statically linking generally requires
package-specific modifications to the build system, and distributing
third-party dependencies on PyPI may require some coordination of the
community of users of the package.

As an often-automatic alternative to these options, we introduce
auditwheel. The tool inspects all of the ELF files inside a wheel to
check for dependencies on versioned symbols or external shared
libraries, and verifies conformance with the manylinux1 policy. This
includes the ability to add the new platform tag to conforming wheels.
More importantly, auditwheel has the ability to automatically modify
wheels that depend on external shared libraries by copying those shared
libraries from the system into the wheel itself, and modifying the
appropriate RPATH entries such that these libraries will be picked up at
runtime. This accomplishes a similar result as if the libraries had been
statically linked without requiring changes to the build system.
Packagers are advised that bundling, like static linking, may implicate
copyright concerns.

Bundled Wheels on Linux

While we acknowledge many approaches for dealing with third-party
library dependencies within manylinux1 wheels, we recognize that the
manylinux1 policy encourages bundling external dependencies, a practice
which runs counter to the package management policies of many linux
distributions' system package managers[14],[15]. The primary purpose of
this is cross-distro compatibility. Furthermore, manylinux1 wheels on
PyPI occupy a different niche than the Python packages available through
the system package manager.

The decision in this PEP to encourage departure from general Linux
distribution unbundling policies is informed by the following concerns:

1.  In these days of automated continuous integration and deployment
    pipelines, publishing new versions and updating dependencies is
    easier than it was when those policies were defined.
2.  pip users remain free to use the "--no-binary" option if they want
    to force local builds rather than using pre-built wheel files.
3.  The popularity of modern container based deployment and "immutable
    infrastructure" models involve substantial bundling at the
    application layer anyway.
4.  Distribution of bundled wheels through PyPI is currently the norm
    for Windows and OS X.
5.  This PEP doesn't rule out the idea of offering more targeted
    binaries for particular Linux distributions in the future.

The model described in this PEP is most ideally suited for
cross-platform Python packages, because it means they can reuse much of
the work that they're already doing to make static Windows and OS X
wheels. We recognize that it is less optimal for Linux-specific packages
that might prefer to interact more closely with Linux's unique package
management functionality and only care about targeting a small set of
particular distos.

Security Implications

One of the advantages of dependencies on centralized libraries in Linux
is that bugfixes and security updates can be deployed system-wide, and
applications which depend on these libraries will automatically feel the
effects of these patches when the underlying libraries are updated. This
can be particularly important for security updates in packages engaged
in communication across the network or cryptography.

manylinux1 wheels distributed through PyPI that bundle security-critical
libraries like OpenSSL will thus assume responsibility for prompt
updates in response disclosed vulnerabilities and patches. This closely
parallels the security implications of the distribution of binary wheels
on Windows that, because the platform lacks a system package manager,
generally bundle their dependencies. In particular, because it lacks a
stable ABI, OpenSSL cannot be included in the manylinux1 profile.

Platform Detection for Installers

Above, we defined what it means for a wheel to be manylinux1-compatible.
Here we discuss what it means for a Python installation to be
manylinux1-compatible. In particular, this is important for tools like
pip to know when deciding whether or not they should consider
manylinux1-tagged wheels for installation.

Because the manylinux1 profile is already known to work for the many
thousands of users of popular commercial Python distributions, we
suggest that installation tools should error on the side of assuming
that a system is compatible, unless there is specific reason to think
otherwise.

We know of four main sources of potential incompatibility that are
likely to arise in practice:

-   Eventually, in the future, there may exist distributions that break
    compatibility with this profile (e.g., if one of the libraries in
    the profile changes its ABI in a backwards-incompatible way)
-   A linux distribution that is too old (e.g. RHEL 4)
-   A linux distribution that does not use glibc (e.g. Alpine Linux,
    which is based on musl libc, or Android)

To address these we propose a two-pronged approach. To handle potential
future incompatibilities, we standardize a mechanism for a Python
distributor to signal that a particular Python install definitely is or
is not compatible with manylinux1: this is done by installing a module
named _manylinux, and setting its manylinux1_compatible attribute. We do
not propose adding any such module to the standard library -- this is
merely a well-known name by which distributors and installation tools
can rendezvous. However, if a distributor does add this module, they
should add it to the standard library rather than to a site-packages/
directory, because the standard library is inherited by virtualenvs
(which we want), and site-packages/ in general is not.

Then, to handle the last two cases for existing Python distributions, we
suggest a simple and reliable method to check for the presence and
version of glibc (basically using it as a "clock" for the overall age of
the distribution).

Specifically, the algorithm we propose is:

    def is_manylinux1_compatible():
        # Only Linux, and only x86-64 / i686
        from distutils.util import get_platform
        if get_platform() not in ["linux-x86_64", "linux-i686"]:
            return False

        # Check for presence of _manylinux module
        try:
            import _manylinux
            return bool(_manylinux.manylinux1_compatible)
        except (ImportError, AttributeError):
            # Fall through to heuristic check below
            pass

        # Check glibc version. CentOS 5 uses glibc 2.5.
        return have_compatible_glibc(2, 5)

    def have_compatible_glibc(major, minimum_minor):
        import ctypes

        process_namespace = ctypes.CDLL(None)
        try:
            gnu_get_libc_version = process_namespace.gnu_get_libc_version
        except AttributeError:
            # Symbol doesn't exist -> therefore, we are not linked to
            # glibc.
            return False

        # Call gnu_get_libc_version, which returns a string like "2.5".
        gnu_get_libc_version.restype = ctypes.c_char_p
        version_str = gnu_get_libc_version()
        # py2 / py3 compatibility:
        if not isinstance(version_str, str):
            version_str = version_str.decode("ascii")

        # Parse string and check against requested version.
        version = [int(piece) for piece in version_str.split(".")]
        assert len(version) == 2
        if major != version[0]:
            return False
        if minimum_minor > version[1]:
            return False
        return True

Rejected alternatives: We also considered using a configuration file,
e.g. /etc/python/compatibility.cfg. The problem with this is that a
single filesystem might contain many different interpreter environments,
each with their own ABI profile -- the manylinux1 compatibility of a
system-installed x86_64 CPython might not tell us much about the
manylinux1 compatibility of a user-installed i686 PyPy. Locating this
configuration information within the Python environment itself ensures
that it remains attached to the correct binary, and dramatically
simplifies lookup code.

We also considered using a more elaborate structure, like a list of all
platform tags that should be considered compatible, together with their
preference ordering, for example:
_binary_compat.compatible = ["manylinux1_x86_64", "centos5_x86_64", "linux_x86_64"].
However, this introduces several complications. For example, we want to
be able to distinguish between the state of "doesn't support manylinux1"
(or eventually manylinux2, etc.) versus "doesn't specify either way
whether it supports manylinux1", which is not entirely obvious in the
above representation; and, it's not at all clear what features are
really needed vis a vis preference ordering given that right now the
only possible platform tags are manylinux1 and linux. So we're deferring
a more complete solution here for a separate PEP, when / if Linux gets
more platform tags.

For the library compatibility check, we also considered much more
elaborate checks (e.g. checking the kernel version, searching for and
checking the versions of all the individual libraries listed in the
manylinux1 profile, etc.), but ultimately decided that this would be
more likely to introduce confusing bugs than actually help the user.
(For example: different distributions vary in where they actually put
these libraries, and if our checking code failed to use the correct path
search then it could easily return incorrect answers.)

PyPI Support

PyPI should permit wheels containing the manylinux1 platform tag to be
uploaded. PyPI should not attempt to formally verify that wheels
containing the manylinux1 platform tag adhere to the manylinux1 policy
described in this document. This verification tasks should be left to
other tools, like auditwheel, that are developed separately.

Rejected Alternatives

One alternative would be to provide separate platform tags for each
Linux distribution (and each version thereof), e.g. RHEL6, ubuntu14_10,
debian_jessie, etc. Nothing in this proposal rules out the possibility
of adding such platform tags in the future, or of further extensions to
wheel metadata that would allow wheels to declare dependencies on
external system-installed packages. However, such extensions would
require substantially more work than this proposal, and still might not
be appreciated by package developers who would prefer not to have to
maintain multiple build environments and build multiple wheels in order
to cover all the common Linux distributions. Therefore, we consider such
proposals to be out-of-scope for this PEP.

Future updates

We anticipate that at some point in the future there will be a
manylinux2 specifying a more modern baseline environment (perhaps based
on CentOS 6), and someday a manylinux3 and so forth, but we defer
specifying these until we have more experience with the initial
manylinux1 proposal.

References

Copyright

This document has been placed into the public domain.

[1] Enthought Canopy Python Distribution
(https://store.enthought.com/downloads/)

[2] Continuum Analytics Anaconda Python Distribution
(https://www.continuum.io/downloads)

[3] CentOS 5.11 Release Notes
(https://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.11)

[4] Enthought Canopy Python Distribution
(https://store.enthought.com/downloads/)

[5] Continuum Analytics Anaconda Python Distribution
(https://www.continuum.io/downloads)

[6] manylinux-discuss mailing list discussion
(https://groups.google.com/forum/#!topic/manylinux-discuss/-4l3rrjfr9U)

[7] distutils-sig discussion
(https://mail.python.org/pipermail/distutils-sig/2016-January/027997.html)

[8] distutils-sig discussion
(https://mail.python.org/pipermail/distutils-sig/2016-February/028275.html)

[9] github issue discussion
(https://github.com/pypa/manylinux/issues/30)

[10] python bug tracker discussion (https://bugs.python.org/issue21536)

[11] numpy bug report:
https://github.com/numpy/numpy/issues/8415#issuecomment-269095235

[12] manylinux1 docker images (Source:
https://github.com/pypa/manylinux; x86-64:
https://quay.io/repository/pypa/manylinux1_x86_64; x86-32:
https://quay.io/repository/pypa/manylinux1_i686)

[13] auditwheel tool (https://pypi.python.org/pypi/auditwheel)

[14] Fedora Bundled Software Policy
(https://fedoraproject.org/wiki/Bundled_Software_policy)

[15] Debian Policy Manual -- 4.13: Convenience copies of code
(https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles)