PEP: 517 Title: A build-system independent format for source trees
Version: $Revision$ Last-Modified: $Date$ Author: Nathaniel J. Smith
<njs@pobox.com>, Thomas Kluyver <thomas@kluyver.me.uk> BDFL-Delegate:
Alyssa Coghlan <ncoghlan@gmail.com> Discussions-To:
distutils-sig@python.org Status: Final Type: Standards Track Topic:
Packaging Content-Type: text/x-rst Created: 30-Sep-2015 Post-History:
01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017 Resolution:
https://mail.python.org/pipermail/distutils-sig/2017-September/031548.html

Abstract

While distutils / setuptools have taken us a long way, they suffer from
three serious problems: (a) they're missing important features like
usable build-time dependency declaration, autoconfiguration, and even
basic ergonomic niceties like DRY-compliant version number management,
and (b) extending them is difficult, so while there do exist various
solutions to the above problems, they're often quirky, fragile, and
expensive to maintain, and yet (c) it's very difficult to use anything
else, because distutils/setuptools provide the standard interface for
installing packages expected by both users and installation tools like
pip.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted
to solve problems (a) and/or (b). This proposal aims to solve (c).

The goal of this PEP is get distutils-sig out of the business of being a
gatekeeper for Python build systems. If you want to use distutils,
great; if you want to use something else, then that should be easy to do
using standardized methods. The difficulty of interfacing with distutils
means that there aren't many such systems right now, but to give a sense
of what we're thinking about see flit or bento. Fortunately, wheels have
now solved many of the hard problems here -- e.g. it's no longer
necessary that a build system also know about every possible
installation configuration -- so pretty much all we really need from a
build system is that it have some way to spit out standard-compliant
wheels and sdists.

We therefore propose a new, relatively minimal interface for
installation tools like pip to interact with package source trees and
source distributions.

Terminology and goals

A source tree is something like a VCS checkout. We need a standard
interface for installing from this format, to support usages like
pip install some-directory/.

A source distribution is a static snapshot representing a particular
release of some source code, like lxml-3.4.4.tar.gz. Source
distributions serve many purposes: they form an archival record of
releases, they provide a stupid-simple de facto standard for tools that
want to ingest and process large corpora of code, possibly written in
many languages (e.g. code search), they act as the input to downstream
packaging systems like Debian/Fedora/Conda/..., and so forth. In the
Python ecosystem they additionally have a particularly important role to
play, because packaging tools like pip are able to use source
distributions to fulfill binary dependencies, e.g. if there is a
distribution foo.whl which declares a dependency on bar, then we need to
support the case where pip install bar or pip install foo automatically
locates the sdist for bar, downloads it, builds it, and installs the
resulting package.

Source distributions are also known as sdists for short.

A build frontend is a tool that users might run that takes arbitrary
source trees or source distributions and builds wheels from them. The
actual building is done by each source tree's build backend. In a
command like pip wheel some-directory/, pip is acting as a build
frontend.

An integration frontend is a tool that users might run that takes a set
of package requirements (e.g. a requirements.txt file) and attempts to
update a working environment to satisfy those requirements. This may
require locating, building, and installing a combination of wheels and
sdists. In a command like pip install lxml==2.4.0, pip is acting as an
integration frontend.

Source trees

There is an existing, legacy source tree format involving setup.py. We
don't try to specify it further; its de facto specification is encoded
in the source code and documentation of distutils, setuptools, pip, and
other tools. We'll refer to it as the setup.py-style.

Here we define a new style of source tree based around the
pyproject.toml file defined in PEP 518, extending the [build-system]
table in that file with one additional key, build-backend. Here's an
example of how it would look:

    [build-system]
    # Defined by PEP 518:
    requires = ["flit"]
    # Defined by this PEP:
    build-backend = "flit.api:main"

build-backend is a string naming a Python object that will be used to
perform the build (see below for details). This is formatted following
the same module:object syntax as a setuptools entry point. For instance,
if the string is "flit.api:main" as in the example above, this object
would be looked up by executing the equivalent of:

    import flit.api
    backend = flit.api.main

It's also legal to leave out the :object part, e.g. :

    build-backend = "flit.api"

which acts like:

    import flit.api
    backend = flit.api

Formally, the string should satisfy this grammar:

    identifier = (letter | '_') (letter | '_' | digit)*
    module_path = identifier ('.' identifier)*
    object_path = identifier ('.' identifier)*
    entry_point = module_path (':' object_path)?

And we import module_path and then lookup module_path.object_path (or
just module_path if object_path is missing).

When importing the module path, we do not look in the directory
containing the source tree, unless that would be on sys.path anyway
(e.g. because it is specified in PYTHONPATH). Although Python
automatically adds the working directory to sys.path in some situations,
code to resolve the backend should not be affected by this.

If the pyproject.toml file is absent, or the build-backend key is
missing, the source tree is not using this specification, and tools
should revert to the legacy behaviour of running setup.py (either
directly, or by implicitly invoking the setuptools.build_meta:__legacy__
backend).

Where the build-backend key exists, this takes precedence and the source
tree follows the format and conventions of the specified backend (as
such no setup.py is needed unless the backend requires it). Projects may
still wish to include a setup.py for compatibility with tools that do
not use this spec.

This PEP also defines a backend-path key for use in pyproject.toml, see
the "In-Tree Build Backends" section below. This key would be used as
follows:

    [build-system]
    # Defined by PEP 518:
    requires = ["flit"]
    # Defined by this PEP:
    build-backend = "local_backend"
    backend-path = ["backend"]

Build requirements

This PEP places a number of additional requirements on the "build
requirements" section of pyproject.toml. These are intended to ensure
that projects do not create impossible to satisfy conditions with their
build requirements.

-   Project build requirements will define a directed graph of
    requirements (project A needs B to build, B needs C and D, etc.)
    This graph MUST NOT contain cycles. If (due to lack of co-ordination
    between projects, for example) a cycle is present, front ends MAY
    refuse to build the project.
-   Where build requirements are available as wheels, front ends SHOULD
    use these where practical, to avoid deeply nested builds. However
    front ends MAY have modes where they do not consider wheels when
    locating build requirements, and so projects MUST NOT assume that
    publishing wheels is sufficient to break a requirement cycle.
-   Front ends SHOULD check explicitly for requirement cycles, and
    terminate the build with an informative message if one is found.

Note in particular that the requirement for no requirement cycles means
that backends wishing to self-host (i.e., building a wheel for a backend
uses that backend for the build) need to make special provision to avoid
causing cycles. Typically this will involve specifying themselves as an
in-tree backend, and avoiding external build dependencies (usually by
vendoring them).

Build backend interface

The build backend object is expected to have attributes which provide
some or all of the following hooks. The common config_settings argument
is described after the individual hooks.

Mandatory hooks

build_wheel

    def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
        ...

Must build a .whl file, and place it in the specified wheel_directory.
It must return the basename (not the full path) of the .whl file it
creates, as a unicode string.

If the build frontend has previously called
prepare_metadata_for_build_wheel and depends on the wheel resulting from
this call to have metadata matching this earlier call, then it should
provide the path to the created .dist-info directory as the
metadata_directory argument. If this argument is provided, then
build_wheel MUST produce a wheel with identical metadata. The directory
passed in by the build frontend MUST be identical to the directory
created by prepare_metadata_for_build_wheel, including any unrecognized
files it created.

Backends which do not provide the prepare_metadata_for_build_wheel hook
may either silently ignore the metadata_directory parameter to
build_wheel, or else raise an exception when it is set to anything other
than None.

To ensure that wheels from different sources are built the same way,
frontends may call build_sdist first, and then call build_wheel in the
unpacked sdist. But if the backend indicates that it is missing some
requirements for creating an sdist (see below), the frontend will fall
back to calling build_wheel in the source directory.

The source directory may be read-only. Backends should therefore be
prepared to build without creating or modifying any files in the source
directory, but they may opt not to handle this case, in which case
failures will be visible to the user. Frontends are not responsible for
any special handling of read-only source directories.

The backend may store intermediate artifacts in cache locations or
temporary directories. The presence or absence of any caches should not
make a material difference to the final result of the build.

build_sdist

    def build_sdist(sdist_directory, config_settings=None):
        ...

Must build a .tar.gz source distribution and place it in the specified
sdist_directory. It must return the basename (not the full path) of the
.tar.gz file it creates, as a unicode string.

A .tar.gz source distribution (sdist) contains a single top-level
directory called {name}-{version} (e.g. foo-1.0), containing the source
files of the package. This directory must also contain the
pyproject.toml from the build directory, and a PKG-INFO file containing
metadata in the format described in PEP 345. Although historically zip
files have also been used as sdists, this hook should produce a gzipped
tarball. This is already the more common format for sdists, and having a
consistent format makes for simpler tooling.

The generated tarball should use the modern POSIX.1-2001 pax tar format,
which specifies UTF-8 based file names. This is not yet the default for
the tarfile module shipped with Python 3.6, so backends using the
tarfile module need to explicitly pass format=tarfile.PAX_FORMAT.

Some backends may have extra requirements for creating sdists, such as
version control tools. However, some frontends may prefer to make
intermediate sdists when producing wheels, to ensure consistency. If the
backend cannot produce an sdist because a dependency is missing, or for
another well understood reason, it should raise an exception of a
specific type which it makes available as UnsupportedOperation on the
backend object. If the frontend gets this exception while building an
sdist as an intermediate for a wheel, it should fall back to building a
wheel directly. The backend does not need to define this exception type
if it would never raise it.

Optional hooks

get_requires_for_build_wheel

    def get_requires_for_build_wheel(config_settings=None):
        ...

This hook MUST return an additional list of strings containing PEP 508
dependency specifications, above and beyond those specified in the
pyproject.toml file, to be installed when calling the build_wheel or
prepare_metadata_for_build_wheel hooks.

Example:

    def get_requires_for_build_wheel(config_settings):
        return ["wheel >= 0.25", "setuptools"]

If not defined, the default implementation is equivalent to return [].

prepare_metadata_for_build_wheel

    def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
        ...

Must create a .dist-info directory containing wheel metadata inside the
specified metadata_directory (i.e., creates a directory like
{metadata_directory}/{package}-{version}.dist-info/). This directory
MUST be a valid .dist-info directory as defined in the wheel
specification, except that it need not contain RECORD or signatures. The
hook MAY also create other files inside this directory, and a build
frontend MUST preserve, but otherwise ignore, such files; the intention
here is that in cases where the metadata depends on build-time
decisions, the build backend may need to record these decisions in some
convenient format for re-use by the actual wheel-building step.

This must return the basename (not the full path) of the .dist-info
directory it creates, as a unicode string.

If a build frontend needs this information and the method is not
defined, it should call build_wheel and look at the resulting metadata
directly.

get_requires_for_build_sdist

    def get_requires_for_build_sdist(config_settings=None):
        ...

This hook MUST return an additional list of strings containing PEP 508
dependency specifications, above and beyond those specified in the
pyproject.toml file. These dependencies will be installed when calling
the build_sdist hook.

If not defined, the default implementation is equivalent to return [].

Note

Editable installs

This PEP originally specified another hook, install_editable, to do an
editable install (as with pip install -e). It was removed due to the
complexity of the topic, but may be specified in a later PEP.

Briefly, the questions to be answered include: what reasonable ways
existing of implementing an 'editable install'? Should the backend or
the frontend pick how to make an editable install? And if the frontend
does, what does it need from the backend to do so.

Config settings

    config_settings

This argument, which is passed to all hooks, is an arbitrary dictionary
provided as an "escape hatch" for users to pass ad-hoc configuration
into individual package builds. Build backends MAY assign any semantics
they like to this dictionary. Build frontends SHOULD provide some
mechanism for users to specify arbitrary string-key/string-value pairs
to be placed in this dictionary. For example, they might support some
syntax like --package-config CC=gcc. In case a user provides duplicate
string-keys, build frontends SHOULD combine the corresponding
string-values into a list of strings. Build frontends MAY also provide
arbitrary other mechanisms for users to place entries in this
dictionary. For example, pip might choose to map a mix of modern and
legacy command line arguments like:

    pip install                                           \
      --package-config CC=gcc                             \
      --global-option="--some-global-option"              \
      --build-option="--build-option1"                    \
      --build-option="--build-option2"

into a config_settings dictionary like:

    {
     "CC": "gcc",
     "--global-option": ["--some-global-option"],
     "--build-option": ["--build-option1", "--build-option2"],
    }

Of course, it's up to users to make sure that they pass options which
make sense for the particular build backend and package that they are
building.

The hooks may be called with positional or keyword arguments, so
backends implementing them should be careful to make sure that their
signatures match both the order and the names of the arguments above.

All hooks are run with working directory set to the root of the source
tree, and MAY print arbitrary informational text on stdout and stderr.
They MUST NOT read from stdin, and the build frontend MAY close stdin
before invoking the hooks.

The build frontend may capture stdout and/or stderr from the backend. If
the backend detects that an output stream is not a terminal/console
(e.g. not sys.stdout.isatty()), it SHOULD ensure that any output it
writes to that stream is UTF-8 encoded. The build frontend MUST NOT fail
if captured output is not valid UTF-8, but it MAY not preserve all the
information in that case (e.g. it may decode using the replace error
handler in Python). If the output stream is a terminal, the build
backend is responsible for presenting its output accurately, as for any
program running in a terminal.

If a hook raises an exception, or causes the process to terminate, then
this indicates an error.

Build environment

One of the responsibilities of a build frontend is to set up the Python
environment in which the build backend will run.

We do not require that any particular "virtual environment" mechanism be
used; a build frontend might use virtualenv, or venv, or no special
mechanism at all. But whatever mechanism is used MUST meet the following
criteria:

-   All requirements specified by the project's build-requirements must
    be available for import from Python. In particular:

    -   The get_requires_for_build_wheel and
        get_requires_for_build_sdist hooks are executed in an
        environment which contains the bootstrap requirements specified
        in the pyproject.toml file.
    -   The prepare_metadata_for_build_wheel and build_wheel hooks are
        executed in an environment which contains the bootstrap
        requirements from pyproject.toml and those specified by the
        get_requires_for_build_wheel hook.
    -   The build_sdist hook is executed in an environment which
        contains the bootstrap requirements from pyproject.toml and
        those specified by the get_requires_for_build_sdist hook.

-   This must remain true even for new Python subprocesses spawned by
    the build environment, e.g. code like:

        import sys, subprocess
        subprocess.check_call([sys.executable, ...])

    must spawn a Python process which has access to all the project's
    build-requirements. This is necessary e.g. for build backends that
    want to run legacy setup.py scripts in a subprocess.

-   All command-line scripts provided by the build-required packages
    must be present in the build environment's PATH. For example, if a
    project declares a build-requirement on flit, then the following
    must work as a mechanism for running the flit command-line tool:

        import subprocess
        import shutil
        subprocess.check_call([shutil.which("flit"), ...])

A build backend MUST be prepared to function in any environment which
meets the above criteria. In particular, it MUST NOT assume that it has
access to any packages except those that are present in the stdlib, or
that are explicitly declared as build-requirements.

Frontends should call each hook in a fresh subprocess, so that backends
are free to change process global state (such as environment variables
or the working directory). A Python library will be provided which
frontends can use to easily call hooks this way.

Recommendations for build frontends (non-normative)

A build frontend MAY use any mechanism for setting up a build
environment that meets the above criteria. For example, simply
installing all build-requirements into the global environment would be
sufficient to build any compliant package -- but this would be
sub-optimal for a number of reasons. This section contains non-normative
advice to frontend implementors.

A build frontend SHOULD, by default, create an isolated environment for
each build, containing only the standard library and any explicitly
requested build-dependencies. This has two benefits:

-   It allows for a single installation run to build multiple packages
    that have contradictory build-requirements. E.g. if package1
    build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2,
    then these cannot both be installed simultaneously into the global
    environment -- which is a problem when the user requests
    pip install package1 package2. Or if the user already has pbr==1.8.1
    installed in their global environment, and a package build-requires
    pbr==1.7.2, then downgrading the user's version would be rather
    rude.
-   It acts as a kind of public health measure to maximize the number of
    packages that actually do declare accurate build-dependencies. We
    can write all the strongly worded admonitions to package authors we
    want, but if build frontends don't enforce isolation by default,
    then we'll inevitably end up with lots of packages on PyPI that
    build fine on the original author's machine and nowhere else, which
    is a headache that no-one needs.

However, there will also be situations where build-requirements are
problematic in various ways. For example, a package author might
accidentally leave off some crucial requirement despite our best
efforts; or, a package might declare a build-requirement on foo >= 1.0
which worked great when 1.0 was the latest version, but now 1.1 is out
and it has a showstopper bug; or, the user might decide to build a
package against numpy==1.7 -- overriding the package's preferred
numpy==1.8 -- to guarantee that the resulting build will be compatible
at the C ABI level with an older version of numpy (even if this means
the resulting build is unsupported upstream). Therefore, build frontends
SHOULD provide some mechanism for users to override the above defaults.
For example, a build frontend could have a
--build-with-system-site-packages option that causes the
--system-site-packages option to be passed to virtualenv-or-equivalent
when creating build environments, or a
--build-requirements-override=my-requirements.txt option that overrides
the project's normal build-requirements.

The general principle here is that we want to enforce hygiene on package
authors, while still allowing end-users to open up the hood and apply
duct tape when necessary.

In-tree build backends

In certain circumstances, projects may wish to include the source code
for the build backend directly in the source tree, rather than
referencing the backend via the requires key. Two specific situations
where this would be expected are:

-   Backends themselves, which want to use their own features for
    building themselves ("self-hosting backends")
-   Project-specific backends, typically consisting of a custom wrapper
    around a standard backend, where the wrapper is too project-specific
    to be worth distributing independently ("in-tree backends")

Projects can specify that their backend code is hosted in-tree by
including the backend-path key in pyproject.toml. This key contains a
list of directories, which the frontend will add to the start of
sys.path when loading the backend, and running the backend hooks.

There are two restrictions on the content of the backend-path key:

-   Directories in backend-path are interpreted as relative to the
    project root, and MUST refer to a location within the source tree
    (after relative paths and symbolic links have been resolved).
-   The backend code MUST be loaded from one of the directories
    specified in backend-path (i.e., it is not permitted to specify
    backend-path and not have in-tree backend code).

The first restriction is to ensure that source trees remain
self-contained, and cannot refer to locations outside of the source
tree. Frontends SHOULD check this condition (typically by resolving the
location to an absolute path and resolving symbolic links, and then
checking it against the project root), and fail with an error message if
it is violated.

The backend-path feature is intended to support the implementation of
in-tree backends, and not to allow configuration of existing backends.
The second restriction above is specifically to ensure that this is how
the feature is used. Front ends MAY enforce this check, but are not
required to. Doing so would typically involve checking the backend's
__file__ attribute against the locations in backend-path.

Source distributions

We continue with the legacy sdist format, adding some new restrictions.
This format is mostly undefined, but basically comes down to: a file
named {NAME}-{VERSION}.{EXT}, which unpacks into a buildable source tree
called {NAME}-{VERSION}/. Traditionally these have always contained
setup.py-style source trees; we now allow them to also contain
pyproject.toml-style source trees.

Integration frontends require that an sdist named {NAME}-{VERSION}.{EXT}
will generate a wheel named {NAME}-{VERSION}-{COMPAT-INFO}.whl.

The new restrictions for sdists built by PEP 517 backends are:

-   They will be gzipped tar archives, with the .tar.gz extension. Zip
    archives, or other compression formats for tarballs, are not allowed
    at present.
-   Tar archives must be created in the modern POSIX.1-2001 pax tar
    format, which uses UTF-8 for file names.
-   The source tree contained in an sdist is expected to include the
    pyproject.toml file.

Evolutionary notes

A goal here is to make it as simple as possible to convert old-style
sdists to new-style sdists. (E.g., this is one motivation for supporting
dynamic build requirements.) The ideal would be that there would be a
single static pyproject.toml that could be dropped into any "version 0"
VCS checkout to convert it to the new shiny. This is probably not 100%
possible, but we can get close, and it's important to keep track of how
close we are... hence this section.

A rough plan would be: Create a build system package
(setuptools_pypackage or whatever) that knows how to speak whatever hook
language we come up with, and convert them into calls to setup.py. This
will probably require some sort of hooking or monkeypatching to
setuptools to provide a way to extract the setup_requires= argument when
needed, and to provide a new version of the sdist command that generates
the new-style format. This all seems doable and sufficient for a large
proportion of packages (though obviously we'll want to prototype such a
system before we finalize anything here). (Alternatively, these changes
could be made to setuptools itself rather than going into a separate
package.)

But there remain two obstacles that mean we probably won't be able to
automatically upgrade packages to the new format:

1)  There currently exist packages which insist on particular packages
    being available in their environment before setup.py is executed.
    This means that if we decide to execute build scripts in an isolated
    virtualenv-like environment, then projects will need to check
    whether they do this, and if so then when upgrading to the new
    system they will have to start explicitly declaring these
    dependencies (either via setup_requires= or via static declaration
    in pyproject.toml).
2)  There currently exist packages which do not declare consistent
    metadata (e.g. egg_info and bdist_wheel might get different
    install_requires=). When upgrading to the new system, projects will
    have to evaluate whether this applies to them, and if so they will
    need to stop doing that.

Rejected options

-   We discussed making the wheel and sdist hooks build unpacked
    directories containing the same contents as their respective
    archives. In some cases this could avoid the need to pack and unpack
    an archive, but this seems like premature optimisation. It's
    advantageous for tools to work with archives as the canonical
    interchange formats (especially for wheels, where the archive format
    is already standardised). Close control of archive creation is
    important for reproducible builds. And it's not clear that tasks
    requiring an unpacked distribution will be more common than those
    requiring an archive.
-   We considered an extra hook to copy files to a build directory
    before invoking build_wheel. Looking at existing build systems, we
    found that passing a build directory into build_wheel made more
    sense for many tools than pre-emptively copying files into a build
    directory.
-   The idea of passing build_wheel a build directory was then also
    deemed an unnecessary complication. Build tools can use a temporary
    directory or a cache directory to store intermediate files while
    building. If there is a need, a frontend-controlled cache directory
    could be added in the future.
-   For build_sdist to signal a failure for an expected reason, various
    options were debated at great length, including raising
    NotImplementedError and returning either NotImplemented or None.
    Please do not attempt to reopen this discussion without an extremely
    good reason, because we are quite tired of it.
-   Allowing the backend to be imported from files in the source tree
    would be more consistent with the way Python imports often work.
    However, not allowing this prevents confusing errors from clashing
    module names. The initial version of this PEP did not provide a
    means to allow backends to be imported from files within the source
    tree, but the backend-path key was added in the next revision to
    allow projects to opt into this behaviour if needed.

Summary of changes to PEP 517

The following changes were made to this PEP after the initial reference
implementation was released in pip 19.0.

-   Cycles in build requirements were explicitly prohibited.
-   Support for in-tree backends and self-hosting of backends was added
    by the introduction of the backend-path key in the [build-system]
    table.
-   Clarified that the setuptools.build_meta:__legacy__ PEP 517 backend
    is an acceptable alternative to directly invoking setup.py for
    source trees that don't specify build-backend explicitly.

Appendix A: Comparison to PEP 516

PEP 516 is a competing proposal to specify a build system interface,
which has now been rejected in favour of this PEP. The primary
difference is that our build backend is defined via a Python hook-based
interface rather than a command-line based interface.

This appendix documents the arguments advanced for this PEP over PEP
516.

We do not expect that specifying Python hooks rather than command line
interfaces will, by itself, reduce the complexity of calling into the
backend, because build frontends will in any case want to run hooks
inside a child -- this is important to isolate the build frontend itself
from the backend code and to better control the build backends execution
environment. So under both proposals, there will need to be some code in
pip to spawn a subprocess and talk to some kind of command-line/IPC
interface, and there will need to be some code in the subprocess that
knows how to parse these command line arguments and call the actual
build backend implementation. So this diagram applies to all proposals
equally:

    +-----------+          +---------------+           +----------------+
    | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
    |   (pip)   |          |   interface   |           | implementation |
    +-----------+          +---------------+           +----------------+

The key difference between the two approaches is how these interface
boundaries map onto project structure:

    .-= This PEP =-.

    +-----------+          +---------------+    |      +----------------+
    | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
    |   (pip)   |          |   interface   |    |      | implementation |
    +-----------+          +---------------+    |      +----------------+
                                                |
    |______________________________________|    |
       Owned by pip, updated in lockstep        |
                                                |
                                                |
                                     PEP-defined interface boundary
                                   Changes here require distutils-sig


    .-= Alternative =-.

    +-----------+    |     +---------------+           +----------------+
    | frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
    |   (pip)   |    |     |   interface   |           | implementation |
    +-----------+    |     +---------------+           +----------------+
                     |
                     |     |____________________________________________|
                     |      Owned by build backend, updated in lockstep
                     |
        PEP-defined interface boundary
      Changes here require distutils-sig

By moving the PEP-defined interface boundary into Python code, we gain
three key advantages.

First, because there will likely be only a small number of build
frontends (pip, and... maybe a few others?), while there will likely be
a long tail of custom build backends (since these are chosen separately
by each package to match their particular build requirements), the
actual diagrams probably look more like:

    .-= This PEP =-.

    +-----------+          +---------------+           +----------------+
    | frontend  | -spawn-> | child cmdline | -Python+> |    backend     |
    |   (pip)   |          |   interface   |        |  | implementation |
    +-----------+          +---------------+        |  +----------------+
                                                    |
                                                    |  +----------------+
                                                    +> |    backend     |
                                                    |  | implementation |
                                                    |  +----------------+
                                                    :
                                                    :

    .-= Alternative =-.

    +-----------+          +---------------+           +----------------+
    | frontend  | -spawn+> | child cmdline | -Python-> |    backend     |
    |   (pip)   |       |  |   interface   |           | implementation |
    +-----------+       |  +---------------+           +----------------+
                        |
                        |  +---------------+           +----------------+
                        +> | child cmdline | -Python-> |    backend     |
                        |  |   interface   |           | implementation |
                        |  +---------------+           +----------------+
                        :
                        :

That is, this PEP leads to less total code in the overall ecosystem. And
in particular, it reduces the barrier to entry of making a new build
system. For example, this is a complete, working build backend:

    # mypackage_custom_build_backend.py
    import os.path
    import pathlib
    import shutil
    import tarfile

    SDIST_NAME = "mypackage-0.1"
    SDIST_FILENAME = SDIST_NAME + ".tar.gz"
    WHEEL_FILENAME = "mypackage-0.1-py2.py3-none-any.whl"

    #################
    # sdist creation
    #################

    def _exclude_hidden_and_special_files(archive_entry):
        """Tarfile filter to exclude hidden and special files from the archive"""
        if archive_entry.isfile() or archive_entry.isdir():
            if not os.path.basename(archive_entry.name).startswith("."):
                return archive_entry

    def _make_sdist(sdist_dir):
        """Make an sdist and return both the Python object and its filename"""
        sdist_path = pathlib.Path(sdist_dir) / SDIST_FILENAME
        sdist = tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT)
        # Tar up the whole directory, minus hidden and special files
        sdist.add(os.getcwd(), arcname=SDIST_NAME,
                  filter=_exclude_hidden_and_special_files)
        return sdist, SDIST_FILENAME

    def build_sdist(sdist_dir, config_settings):
        """PEP 517 sdist creation hook"""
        sdist, sdist_filename = _make_sdist(sdist_dir)
        return sdist_filename

    #################
    # wheel creation
    #################

    def get_requires_for_build_wheel(config_settings):
        """PEP 517 wheel building dependency definition hook"""
        # As a simple static requirement, this could also just be
        # listed in the project's build system dependencies instead
        return ["wheel"]

    def build_wheel(wheel_directory,
                    metadata_directory=None, config_settings=None):
        """PEP 517 wheel creation hook"""
        from wheel.archive import archive_wheelfile
        path = os.path.join(wheel_directory, WHEEL_FILENAME)
        archive_wheelfile(path, "src/")
        return WHEEL_FILENAME

Of course, this is a terrible build backend: it requires the user to
have manually set up the wheel metadata in src/mypackage-0.1.dist-info/;
when the version number changes it must be manually updated in multiple
places... but it works, and more features could be added incrementally.
Much experience suggests that large successful projects often originate
as quick hacks (e.g., Linux -- "just a hobby, won't be big and
professional"; IPython/Jupyter -- a grad student's $PYTHONSTARTUP file),
so if our goal is to encourage the growth of a vibrant ecosystem of good
build tools, it's important to minimize the barrier to entry.

Second, because Python provides a simpler yet richer structure for
describing interfaces, we remove unnecessary complexity from the
specification -- and specifications are the worst place for complexity,
because changing specifications requires painful consensus-building
across many stakeholders. In the command-line interface approach, we
have to come up with ad hoc ways to map multiple different kinds of
inputs into a single linear command line (e.g. how do we avoid
collisions between user-specified configuration arguments and
PEP-defined arguments? how do we specify optional arguments? when
working with a Python interface these questions have simple, obvious
answers). When spawning and managing subprocesses, there are many fiddly
details that must be gotten right, subtle cross-platform differences,
and some of the most obvious approaches --e.g., using stdout to return
data for the build_requires operation -- can create unexpected pitfalls
(e.g., what happens when computing the build requirements requires
spawning some child processes, and these children occasionally print an
error message to stdout? obviously a careful build backend author can
avoid this problem, but the most obvious way of defining a Python
interface removes this possibility entirely, because the hook return
value is clearly demarcated).

In general, the need to isolate build backends into their own process
means that we can't remove IPC complexity entirely -- but by placing
both sides of the IPC channel under the control of a single project, we
make it much cheaper to fix bugs in the IPC interface than if fixing
bugs requires coordinated agreement and coordinated changes across the
ecosystem.

Third, and most crucially, the Python hook approach gives us much more
powerful options for evolving this specification in the future.

For concreteness, imagine that next year we add a new
build_sdist_from_vcs hook, which provides an alternative to the current
build_sdist hook where the frontend is responsible for passing version
control tracking metadata to backends (including indicating when all on
disk files are tracked), rather than individual backends having to query
that information themselves. In order to manage the transition, we'd
want it to be possible for build frontends to transparently use
build_sdist_from_vcs when available and fall back onto build_sdist
otherwise; and we'd want it to be possible for build backends to define
both methods, for compatibility with both old and new build frontends.

Furthermore, our mechanism should also fulfill two more goals: (a) If
new versions of e.g. pip and flit are both updated to support the new
interface, then this should be sufficient for it to be used; in
particular, it should not be necessary for every project that uses flit
to update its individual pyproject.toml file. (b) We do not want to have
to spawn extra processes just to perform this negotiation, because
process spawns can easily become a bottleneck when deploying large
multi-package stacks on some platforms (Windows).

In the interface described here, all of these goals are easy to achieve.
Because pip controls the code that runs inside the child process, it can
easily write it to do something like:

    command, backend, args = parse_command_line_args(...)
    if command == "build_sdist":
       if hasattr(backend, "build_sdist_from_vcs"):
           backend.build_sdist_from_vcs(...)
       elif hasattr(backend, "build_sdist"):
           backend.build_sdist(...)
       else:
           # error handling

In the alternative where the public interface boundary is placed at the
subprocess call, this is not possible -- either we need to spawn an
extra process just to query what interfaces are supported (as was
included in an earlier draft of PEP 516, an alternative to this), or
else we give up on autonegotiation entirely (as in the current version
of that PEP), meaning that any changes in the interface will require N
individual packages to update their pyproject.toml files before any
change can go live, and that any changes will necessarily be restricted
to new releases.

One specific consequence of this is that in this PEP, we're able to make
the prepare_metadata_for_build_wheel command optional. In our design,
this can be readily handled by build frontends, which can put code in
their subprocess runner like:

    def dump_wheel_metadata(backend, working_dir):
        """Dumps wheel metadata to working directory.

           Returns absolute path to resulting metadata directory
        """
        if hasattr(backend, "prepare_metadata_for_build_wheel"):
            subdir = backend.prepare_metadata_for_build_wheel(working_dir)
        else:
            wheel_fname = backend.build_wheel(working_dir)
            already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
            with open(already_built, "w") as f:
                f.write(wheel_fname)
            subdir = unzip_metadata(os.path.join(working_dir, wheel_fname))
        return os.path.join(working_dir, subdir)

    def ensure_wheel_is_built(backend, output_dir, working_dir, metadata_dir):
        """Ensures built wheel is available in output directory

           Returns absolute path to resulting wheel file
        """
        already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
        if os.path.exists(already_built):
            with open(already_built, "r") as f:
                wheel_fname = f.read().strip()
            working_path = os.path.join(working_dir, wheel_fname)
            final_path = os.path.join(output_dir, wheel_fname)
            os.rename(working_path, final_path)
            os.remove(already_built)
        else:
            wheel_fname = backend.build_wheel(output_dir, metadata_dir=metadata_dir)
        return os.path.join(output_dir, wheel_fname)

and thus expose a totally uniform interface to the rest of the frontend,
with no extra subprocess calls, no duplicated builds, etc. But obviously
this is the kind of code that you only want to write as part of a
private, within-project interface (e.g. the given example requires that
the working directory be shared between the two calls, but not with any
other wheel builds, and that the return value from the metadata helper
function will be passed back in to the wheel building one).

(And, of course, making the metadata command optional is one piece of
lowering the barrier to entry for developing new backends, as discussed
above.)

Other differences

Besides the key command line versus Python hook difference described
above, there are a few other differences in this proposal:

-   Metadata command is optional (as described above).
-   We return metadata as a directory, rather than a single METADATA
    file. This aligns better with the way that in practice wheel
    metadata is distributed across multiple files (e.g. entry points),
    and gives us more options in the future. (For example, instead of
    following the PEP 426 proposal of switching the format of METADATA
    to JSON, we might decide to keep the existing METADATA the way it is
    for backcompat, while adding new extensions as JSON "sidecar" files
    inside the same directory. Or maybe not; the point is it keeps our
    options more open.)
-   We provide a mechanism for passing information between the metadata
    step and the wheel building step. I guess everyone probably will
    agree this is a good idea?
-   We provide more detailed recommendations about the build
    environment, but these aren't normative anyway.

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End: