PEP: 518 Title: Specifying Minimum Build System Requirements for Python
Projects Author: Brett Cannon <brett@python.org>, Nathaniel J. Smith
<njs@pobox.com>, Donald Stufft <donald@stufft.io> BDFL-Delegate: Alyssa
Coghlan Discussions-To: distutils-sig@python.org Status: Final Type:
Standards Track Topic: Packaging Content-Type: text/x-rst Created:
10-May-2016 Post-History: 10-May-2016, 11-May-2016, 13-May-2016
Resolution:
https://mail.python.org/pipermail/distutils-sig/2016-May/028969.html

Abstract

This PEP specifies how Python software packages should specify what
build dependencies they have in order to execute their chosen build
system. As part of this specification, a new configuration file is
introduced for software packages to use to specify their build
dependencies (with the expectation that the same configuration file will
be used for future configuration details).

Rationale

When Python first developed its tooling for building distributions of
software for projects, distutils[1] was the chosen solution. As time
went on, setuptools[2] gained popularity to add some features on top of
distutils. Both used the concept of a setup.py file that project
maintainers executed to build distributions of their software (as well
as users to install said distribution).

Using an executable file to specify build requirements under distutils
isn't an issue as distutils is part of Python's standard library. Having
the build tool as part of Python means that a setup.py has no external
dependency that a project maintainer needs to worry about to build a
distribution of their project. There was no need to specify any
dependency information as the only dependency is Python.

But when a project chooses to use setuptools, the use of an executable
file like setup.py becomes an issue. You can't execute a setup.py file
without knowing its dependencies, but currently there is no standard way
to know what those dependencies are in an automated fashion without
executing the setup.py file where that information is stored. It's a
catch-22 of a file not being runnable without knowing its own contents
which can't be known programmatically unless you run the file.

Setuptools tried to solve this with a setup_requires argument to its
setup() function[3]. This solution has a number of issues, such as:

-   No tooling (besides setuptools itself) can access this information
    without executing the setup.py, but setup.py can't be executed
    without having these items installed.
-   While setuptools itself will install anything listed in this, they
    won't be installed until during the execution of the setup()
    function, which means that the only way to actually use anything
    added here is through increasingly complex machinations that delay
    the import and usage of these modules until later on in the
    execution of the setup() function.
-   This cannot include setuptools itself nor can it include a
    replacement to setuptools, which means that projects such as
    numpy.distutils are largely incapable of utilizing it and projects
    cannot take advantage of newer setuptools features until their users
    naturally upgrade the version of setuptools to a newer one.
-   The items listed in setup_requires get implicitly installed whenever
    you execute the setup.py but one of the common ways that the
    setup.py is executed is via another tool, such as pip, who is
    already managing dependencies. This means that a command like
    pip install spam might end up having both pip and setuptools
    downloading and installing packages and end users needing to
    configure both tools (and for setuptools without being in control of
    the invocation) to change settings like which repository it installs
    from. It also means that users need to be aware of the discovery
    rules for both tools, as one may support different package formats
    or determine the latest version differently.

This has culminated in a situation where use of setup_requires is rare,
where projects tend to either simply copy and paste snippets between
setup.py files or they eschew it all together in favor of simply
documenting elsewhere what they expect the user to have manually
installed prior to attempting to build or install their project.

All of this has led pip[4] to simply assume that setuptools is necessary
when executing a setup.py file. The problem with this, though, is it
doesn't scale if another project began to gain traction in the community
as setuptools has. It also prevents other projects from gaining traction
due to the friction required to use it with a project when pip can't
infer the fact that something other than setuptools is required.

This PEP attempts to rectify the situation by specifying a way to list
the minimal dependencies of the build system of a project in a
declarative fashion in a specific file. This allows a project to list
what build dependencies it has to go from e.g. source checkout to wheel,
while not falling into the catch-22 trap that a setup.py has where
tooling can't infer what a project needs to build itself. Implementing
this PEP will allow projects to specify what build system they depend on
upfront so that tools like pip can make sure that they are installed in
order to run the build system to build the project.

To provide more context and motivation for this PEP, think of the
(rough) steps required to produce a built artifact for a project:

1.  The source checkout of the project.
2.  Installation of the build system.
3.  Execute the build system.

This PEP covers step #2. PEP 517 covers step #3, including how to have
the build system dynamically specify more dependencies that the build
system requires to perform its job. The purpose of this PEP though, is
to specify the minimal set of requirements for the build system to
simply begin execution.

Specification

File Format

The build system dependencies will be stored in a file named
pyproject.toml that is written in the TOML format[5].

This format was chosen as it is human-usable (unlike JSON[6]), it is
flexible enough (unlike configparser[7]), stems from a standard (also
unlike configparser[8]), and it is not overly complex (unlike YAML[9]).
The TOML format is already in use by the Rust community as part of their
Cargo package manager[10] and in private email stated they have been
quite happy with their choice of TOML. A more thorough discussion as to
why various alternatives were not chosen can be read in the Other file
formats section. The authors do realize, though, that choice of
configuration file format is ultimately subjective and a choice had to
be made and the authors prefer TOML for this situation.

Below we list the tables that tools are expected to recognize/respect.
Tables not specified in this PEP are reserved for future use by other
PEPs.

build-system table

The [build-system] table is used to store build-related data. Initially
only one key of the table will be valid and is mandatory for the table:
requires. This key must have a value of a list of strings representing
PEP 508 dependencies required to execute the build system (currently
that means what dependencies are required to execute a setup.py file).

For the vast majority of Python projects that rely upon setuptools, the
pyproject.toml file will be:

    [build-system]
    # Minimum requirements for the build system to execute.
    requires = ["setuptools", "wheel"]  # PEP 508 specifications.

Because the use of setuptools and wheel are so expansive in the
community at the moment, build tools are expected to use the example
configuration file above as their default semantics when a
pyproject.toml file is not present.

Tools should not require the existence of the [build-system] table. A
pyproject.toml file may be used to store configuration details other
than build-related data and thus lack a [build-system] table
legitimately. If the file exists but is lacking the [build-system] table
then the default values as specified above should be used. If the table
is specified but is missing required fields then the tool should
consider it an error.

tool table

The [tool] table is where any tool related to your Python project, not
just build tools, can have users specify configuration data as long as
they use a sub-table within [tool], e.g. the flit tool would store its
configuration in [tool.flit].

We need some mechanism to allocate names within the tool.* namespace, to
make sure that different projects don't attempt to use the same
sub-table and collide. Our rule is that a project can use the subtable
tool.$NAME if, and only if, they own the entry for $NAME in the
Cheeseshop/PyPI.

JSON Schema

To provide a type-specific representation of the resulting data from the
TOML file for illustrative purposes only, the following JSON Schema[11]
would match the data format:

    {
        "$schema": "http://json-schema.org/schema#",

        "type": "object",
        "additionalProperties": false,

        "properties": {
            "build-system": {
                "type": "object",
                "additionalProperties": false,

                "properties": {
                    "requires": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["requires"]
            },

            "tool": {
                "type": "object"
            }
        }
    }

Rejected Ideas

A semantic version key

For future-proofing the structure of the configuration file, a
semantics-version key was initially proposed. Defaulting to 1, the idea
was that if any semantics changes to previously defined keys or tables
occurred which were not backwards-compatible, then the semantics-version
would be incremented to a new number.

In the end, though, it was decided that this was a premature
optimization. The expectation is that changes to what is pre-defined
semantically in the configuration file will be rather conservative. And
in the instances where a backwards-incompatible change would have
occurred, different names can be used for the new semantics to avoid
breaking older tools.

A more nested namespace

An earlier draft of this PEP had a top-level [package] table. The idea
was to impose some scoping for a semantics versioning scheme (see A
semantic version key for why that idea was rejected). With the need for
scoping removed, the point of having a top-level table became
superfluous.

Other table names

Another name proposed for the [build-system] table was [build]. The
alternative name is shorter, but doesn't convey as much of the intention
of what information is stored in the table. After a vote on the
distutils-sig mailing list, the current name won out.

Other file formats

Several other file formats were put forward for consideration, all
rejected for various reasons. Key requirements were that the format be
editable by human beings and have an implementation that can be vendored
easily by projects. This outright excluded certain formats like XML
which are not friendly towards human beings and were never seriously
discussed.

Overview of file formats considered

The key reasons for rejecting the other alternatives considered are
summarised in the following sections, while the full review (including
positive arguments in favour of TOML) can be found at[12].

TOML was ultimately selected as it provided all the features we were
interested in, while avoiding the downsides introduced by the
alternatives.

  Feature                   TOML   YAML   JSON   CFG/INI
  ------------------------- ------ ------ ------ ---------
  Well-defined              yes    yes    yes    
  Real data types           yes    yes    yes    
  Reliable Unicode          yes    yes    yes    
  Reliable comments         yes    yes           
  Easy for humans to edit   yes    ??            ??
  Easy for tools to edit    yes    ??     yes    ??
  In standard library                     yes    yes
  Easy for pip to vendor    yes           n/a    n/a

("??" in the table indicates items where most folks would be inclined to
answer "yes", but there turn out to be a lot of quirks and edge cases
that arise in practice due to either the lack of a clear specification,
or else the underlying file format specification being surprisingly
complicated)

The pytoml TOML parser is ~300 lines of pure Python code, so being
outside the standard library didn't count heavily against it.

Python literals were also discussed as a potential format, but weren't
considered in the file format review (since they're not a common
pre-existing file format).

JSON

The JSON format[13] was initially considered but quickly rejected. While
great as a human-readable, string-based data exchange format, the syntax
does not lend itself to easy editing by a human being (e.g. the syntax
is more verbose than necessary while not allowing for comments).

An example JSON file for the proposed data would be:

    {
        "build": {
            "requires": [
                "setuptools",
                "wheel>=0.27"
            ]
        }
    }

YAML

The YAML format[14] was designed to be a superset of JSON [15] while
being easier to work with by hand. There are three main issues with
YAML.

One is that the specification is large: 86 pages if printed on
letter-sized paper. That leaves the possibility that someone may use a
feature of YAML that works with one parser but not another. It has been
suggested to standardize on a subset, but that basically means creating
a new standard specific to this file which is not tractable long-term.

Two is that YAML itself is not safe by default. The specification allows
for the arbitrary execution of code which is best avoided when dealing
with configuration data. It is of course possible to avoid this behavior
-- for example, PyYAML provides a safe_load operation -- but if any tool
carelessly uses load instead then they open themselves up to arbitrary
code execution. While this PEP is focused on the building of projects
which inherently involves code execution, other configuration data such
as project name and version number may end up in the same file someday
where arbitrary code execution is not desired.

And finally, the most popular Python implementation of YAML is
PyYAML[16] which is a large project of a few thousand lines of code and
an optional C extension module. While in and of itself this isn't
necessarily an issue, this becomes more of a problem for projects like
pip where they would most likely need to vendor PyYAML as a dependency
so as to be fully self-contained (otherwise you end up with your install
tool needing an install tool to work). A proof-of-concept re-working of
PyYAML has been done to see how easy it would be to potentially vendor a
simpler version of the library which shows it is a possibility.

An example YAML file is:

    build:
        requires:
            - setuptools
            - wheel>=0.27

configparser

An INI-style configuration file based on what configparser[17] accepts
was considered. Unfortunately there is no specification of what
configparser accepts, leading to support skew between versions. For
instance, what ConfigParser in Python 2.7 accepts is not the same as
what configparser in Python 3 accepts. While one could standardize on
what Python 3 accepts and simply vendor the backport of the configparser
module, that does mean this PEP would have to codify that the backport
of configparser must be used by all project wishes to consume the
metadata specified by this PEP. This is overly restrictive and could
lead to confusion if someone is not aware of that a specific version of
configparser is expected.

An example INI file is:

    [build]
    requires =
        setuptools
        wheel>=0.27

Python literals

Someone proposed using Python literals as the configuration format. The
file would contain one dict at the top level, with the data all inside
that dict, with sections defined by the keys. All Python programmers
would be used to the format, there would implicitly be no third-party
dependency to read the configuration data, and it can be safe if parsed
by ast.literal_eval()[18]. Python literals can be identical to JSON,
with the added benefit of supporting trailing commas and comments. In
addition, Python's richer data model may be useful for some future
configuration needs (e.g. non-string dict keys, floating point vs.
integer values).

On the other hand, python literals are a Python-specific format, and it
is anticipated that these data may need to be read by packaging tools,
etc. that are not written in Python.

An example Python literal file for the proposed data would be:

    # The build configuration
    {"build": {"requires": ["setuptools",
                            "wheel>=0.27", # note the trailing comma
                            # "numpy>=1.10" # a commented out data line
                            ]
    # and here is an arbitrary comment.
               }
     }

Sticking with setup.cfg

There are two issues with setup.cfg used by setuptools as a general
format. One is that they are .ini files which have issues as mentioned
in the configparser discussion above. The other is that the schema for
that file has never been rigorously defined and thus it's unknown which
format would be safe to use going forward without potentially confusing
setuptools installations.

Other file names

Several other file names were considered and rejected (although this is
very much a bikeshedding topic, and so the decision comes down to mostly
taste).

pysettings.toml

    Most reasonable alternative.

pypa.toml

    While it makes sense to reference the PyPA[19], it is a somewhat
    niche term. It's better to have the file name make sense without
    having domain-specific knowledge.

pybuild.toml

    From the restrictive perspective of this PEP this filename makes
    sense, but if any non-build metadata ever gets added to the file
    then the name ceases to make sense.

pip.toml

    Too tool-specific.

meta.toml

    Too generic; project may want to have its own metadata file.

setup.toml

    While keeping with traditional thanks to setup.py, it does not
    necessarily match what the file may contain in the future (e.g. is
    knowing the name of a project inherently part of its setup?).

pymeta.toml

    Not obvious to newcomers to programming and/or Python.

pypackage.toml & pypackaging.toml

    Name conflation of what a "package" is (project versus namespace).

pydevelop.toml

    The file may contain details not specific to development.

pysource.toml

    Not directly related to source code.

pytools.toml

    Misleading as the file is (currently) aimed at project management.

dstufft.toml

    Too person-specific. ;)

References

Copyright

This document has been placed in the public domain.

[1] distutils
(https://docs.python.org/3/library/distutils.html#module-distutils)

[2] setuptools (https://pypi.python.org/pypi/setuptools)

[3] setuptools: New and Changed setup() Keywords
(http://pythonhosted.org/setuptools/setuptools.html#new-and-changed-setup-keywords)

[4] pip (https://pypi.python.org/pypi/pip)

[5] TOML (https://github.com/toml-lang/toml)

[6] JSON (http://json.org/)

[7] configparser
(https://docs.python.org/3/library/configparser.html#module-configparser)

[8] configparser
(https://docs.python.org/3/library/configparser.html#module-configparser)

[9] YAML (http://yaml.org/)

[10] Cargo, Rust's package manager (http://doc.crates.io/)

[11] JSON Schema (http://json-schema.org/)

[12] Nathaniel J. Smith's file format review
(https://gist.github.com/njsmith/78f68204c5d969f8c8bc645ef77d4a8f)

[13] JSON (http://json.org/)

[14] YAML (http://yaml.org/)

[15] JSON (http://json.org/)

[16] PyYAML (https://pypi.python.org/pypi/PyYAML)

[17] configparser
(https://docs.python.org/3/library/configparser.html#module-configparser)

[18] ast.literal_eval()
(https://docs.python.org/3/library/ast.html#ast.literal_eval)

[19] PyPA (https://www.pypa.io)