PEP 518 – Specifying Minimum Build System Requirements for Python Projects
- Author:
- Brett Cannon <brett at python.org>, Nathaniel J. Smith <njs at pobox.com>, Donald Stufft <donald at stufft.io>
- BDFL-Delegate:
- Alyssa Coghlan
- Discussions-To:
- Distutils-SIG list
- Status:
- Final
- Type:
- Standards Track
- Topic:
- Packaging
- Created:
- 10-May-2016
- Post-History:
- 10-May-2016, 11-May-2016, 13-May-2016
- Resolution:
- Distutils-SIG message
Table of Contents
Abstract
This PEP specifies how Python software packages should specify what build dependencies they have in order to execute their chosen build system. As part of this specification, a new configuration file is introduced for software packages to use to specify their build dependencies (with the expectation that the same configuration file will be used for future configuration details).
Rationale
When Python first developed its tooling for building distributions of
software for projects, distutils [1] was the chosen
solution. As time went on, setuptools [2] gained popularity
to add some features on top of distutils. Both used the concept of a
setup.py
file that project maintainers executed to build
distributions of their software (as well as users to install said
distribution).
Using an executable file to specify build requirements under distutils
isn’t an issue as distutils is part of Python’s standard library.
Having the build tool as part of Python means that a setup.py
has
no external dependency that a project maintainer needs to worry about
to build a distribution of their project. There was no need to specify
any dependency information as the only dependency is Python.
But when a project chooses to use setuptools, the use of an executable
file like setup.py
becomes an issue. You can’t execute a
setup.py
file without knowing its dependencies, but currently
there is no standard way to know what those dependencies are in an
automated fashion without executing the setup.py
file where that
information is stored. It’s a catch-22 of a file not being runnable
without knowing its own contents which can’t be known programmatically
unless you run the file.
Setuptools tried to solve this with a setup_requires
argument to
its setup()
function [3]. This solution has a number
of issues, such as:
- No tooling (besides setuptools itself) can access this information
without executing the
setup.py
, butsetup.py
can’t be executed without having these items installed. - While setuptools itself will install anything listed in this, they
won’t be installed until during the execution of the
setup()
function, which means that the only way to actually use anything added here is through increasingly complex machinations that delay the import and usage of these modules until later on in the execution of thesetup()
function. - This cannot include
setuptools
itself nor can it include a replacement tosetuptools
, which means that projects such asnumpy.distutils
are largely incapable of utilizing it and projects cannot take advantage of newer setuptools features until their users naturally upgrade the version of setuptools to a newer one. - The items listed in
setup_requires
get implicitly installed whenever you execute thesetup.py
but one of the common ways that thesetup.py
is executed is via another tool, such aspip
, who is already managing dependencies. This means that a command likepip install spam
might end up having both pip and setuptools downloading and installing packages and end users needing to configure both tools (and forsetuptools
without being in control of the invocation) to change settings like which repository it installs from. It also means that users need to be aware of the discovery rules for both tools, as one may support different package formats or determine the latest version differently.
This has culminated in a situation where use of setup_requires
is rare, where projects tend to either simply copy and paste snippets
between setup.py
files or they eschew it all together in favor
of simply documenting elsewhere what they expect the user to have
manually installed prior to attempting to build or install their
project.
All of this has led pip [4] to simply assume that setuptools is
necessary when executing a setup.py
file. The problem with this,
though, is it doesn’t scale if another project began to gain traction
in the community as setuptools has. It also prevents other projects
from gaining traction due to the friction required to use it with a
project when pip can’t infer the fact that something other than
setuptools is required.
This PEP attempts to rectify the situation by specifying a way to list
the minimal dependencies of the build system of a project in a
declarative fashion in a specific file. This allows a project to list
what build dependencies it has to go from e.g. source checkout to
wheel, while not falling into the catch-22 trap that a setup.py
has where tooling can’t infer what a project needs to build itself.
Implementing this PEP will allow projects to specify what build system
they depend on upfront so that tools like pip can make sure that they
are installed in order to run the build system to build the project.
To provide more context and motivation for this PEP, think of the (rough) steps required to produce a built artifact for a project:
- The source checkout of the project.
- Installation of the build system.
- Execute the build system.
This PEP covers step #2. PEP 517 covers step #3, including how to have the build system dynamically specify more dependencies that the build system requires to perform its job. The purpose of this PEP though, is to specify the minimal set of requirements for the build system to simply begin execution.
Specification
File Format
The build system dependencies will be stored in a file named
pyproject.toml
that is written in the TOML format [6].
This format was chosen as it is human-usable (unlike JSON [7]), it is flexible enough (unlike configparser [9]), stems from a standard (also unlike configparser [9]), and it is not overly complex (unlike YAML [8]). The TOML format is already in use by the Rust community as part of their Cargo package manager [14] and in private email stated they have been quite happy with their choice of TOML. A more thorough discussion as to why various alternatives were not chosen can be read in the Other file formats section. The authors do realize, though, that choice of configuration file format is ultimately subjective and a choice had to be made and the authors prefer TOML for this situation.
Below we list the tables that tools are expected to recognize/respect. Tables not specified in this PEP are reserved for future use by other PEPs.
build-system table
The [build-system]
table is used to store build-related data.
Initially only one key of the table will be valid and is mandatory
for the table: requires
. This key must have a value of a list
of strings representing PEP 508 dependencies required to execute the
build system (currently that means what dependencies are required to
execute a setup.py
file).
For the vast majority of Python projects that rely upon setuptools,
the pyproject.toml
file will be:
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools", "wheel"] # PEP 508 specifications.
Because the use of setuptools and wheel are so expansive in the
community at the moment, build tools are expected to use the example
configuration file above as their default semantics when a
pyproject.toml
file is not present.
Tools should not require the existence of the [build-system]
table.
A pyproject.toml
file may be used to store configuration details
other than build-related data and thus lack a [build-system]
table
legitimately. If the file exists but is lacking the [build-system]
table then the default values as specified above should be used.
If the table is specified but is missing required fields then the tool
should consider it an error.
tool table
The [tool]
table is where any tool related to your Python
project, not just build tools, can have users specify configuration
data as long as they use a sub-table within [tool]
, e.g. the
flit tool would store its
configuration in [tool.flit]
.
We need some mechanism to allocate names within the tool.*
namespace, to make sure that different projects don’t attempt to use
the same sub-table and collide. Our rule is that a project can use
the subtable tool.$NAME
if, and only if, they own the entry for
$NAME
in the Cheeseshop/PyPI.
JSON Schema
To provide a type-specific representation of the resulting data from the TOML file for illustrative purposes only, the following JSON Schema [15] would match the data format:
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"additionalProperties": false,
"properties": {
"build-system": {
"type": "object",
"additionalProperties": false,
"properties": {
"requires": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": ["requires"]
},
"tool": {
"type": "object"
}
}
}
Rejected Ideas
A semantic version key
For future-proofing the structure of the configuration file, a
semantics-version
key was initially proposed. Defaulting to 1
,
the idea was that if any semantics changes to previously defined keys
or tables occurred which were not backwards-compatible, then the
semantics-version
would be incremented to a new number.
In the end, though, it was decided that this was a premature optimization. The expectation is that changes to what is pre-defined semantically in the configuration file will be rather conservative. And in the instances where a backwards-incompatible change would have occurred, different names can be used for the new semantics to avoid breaking older tools.
A more nested namespace
An earlier draft of this PEP had a top-level [package]
table. The
idea was to impose some scoping for a semantics versioning scheme
(see A semantic version key for why that idea was rejected).
With the need for scoping removed, the point of having a top-level
table became superfluous.
Other table names
Another name proposed for the [build-system]
table was
[build]
. The alternative name is shorter, but doesn’t convey as
much of the intention of what information is stored in the table. After
a vote on the distutils-sig mailing list, the current name won out.
Other file formats
Several other file formats were put forward for consideration, all rejected for various reasons. Key requirements were that the format be editable by human beings and have an implementation that can be vendored easily by projects. This outright excluded certain formats like XML which are not friendly towards human beings and were never seriously discussed.
Overview of file formats considered
The key reasons for rejecting the other alternatives considered are summarised in the following sections, while the full review (including positive arguments in favour of TOML) can be found at [16].
TOML was ultimately selected as it provided all the features we were interested in, while avoiding the downsides introduced by the alternatives.
Feature | TOML | YAML | JSON | CFG/INI |
---|---|---|---|---|
Well-defined | yes | yes | yes | |
Real data types | yes | yes | yes | |
Reliable Unicode | yes | yes | yes | |
Reliable comments | yes | yes | ||
Easy for humans to edit | yes | ?? | ?? | |
Easy for tools to edit | yes | ?? | yes | ?? |
In standard library | yes | yes | ||
Easy for pip to vendor | yes | n/a | n/a |
(“??” in the table indicates items where most folks would be inclined to answer “yes”, but there turn out to be a lot of quirks and edge cases that arise in practice due to either the lack of a clear specification, or else the underlying file format specification being surprisingly complicated)
The pytoml
TOML parser is ~300 lines of pure Python code,
so being outside the standard library didn’t count heavily
against it.
Python literals were also discussed as a potential format, but weren’t considered in the file format review (since they’re not a common pre-existing file format).
JSON
The JSON format [7] was initially considered but quickly rejected. While great as a human-readable, string-based data exchange format, the syntax does not lend itself to easy editing by a human being (e.g. the syntax is more verbose than necessary while not allowing for comments).
An example JSON file for the proposed data would be:
{
"build": {
"requires": [
"setuptools",
"wheel>=0.27"
]
}
}
YAML
The YAML format [8] was designed to be a superset of JSON [7] while being easier to work with by hand. There are three main issues with YAML.
One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.
Two is that YAML itself is not safe by default. The specification
allows for the arbitrary execution of code which is best avoided when
dealing with configuration data. It is of course possible to avoid
this behavior – for example, PyYAML provides a safe_load
operation
– but if any tool carelessly uses load
instead then they open
themselves up to arbitrary code execution. While this PEP is focused on
the building of projects which inherently involves code execution,
other configuration data such as project name and version number may
end up in the same file someday where arbitrary code execution is not
desired.
And finally, the most popular Python implementation of YAML is PyYAML [10] which is a large project of a few thousand lines of code and an optional C extension module. While in and of itself this isn’t necessarily an issue, this becomes more of a problem for projects like pip where they would most likely need to vendor PyYAML as a dependency so as to be fully self-contained (otherwise you end up with your install tool needing an install tool to work). A proof-of-concept re-working of PyYAML has been done to see how easy it would be to potentially vendor a simpler version of the library which shows it is a possibility.
An example YAML file is:
build:
requires:
- setuptools
- wheel>=0.27
configparser
An INI-style configuration file based on what configparser [9] accepts was considered. Unfortunately there is no specification of what configparser accepts, leading to support skew between versions. For instance, what ConfigParser in Python 2.7 accepts is not the same as what configparser in Python 3 accepts. While one could standardize on what Python 3 accepts and simply vendor the backport of the configparser module, that does mean this PEP would have to codify that the backport of configparser must be used by all project wishes to consume the metadata specified by this PEP. This is overly restrictive and could lead to confusion if someone is not aware of that a specific version of configparser is expected.
An example INI file is:
[build]
requires =
setuptools
wheel>=0.27
Python literals
Someone proposed using Python literals as the configuration format.
The file would contain one dict at the top level, with the data all
inside that dict, with sections defined by the keys. All Python
programmers would be used to the format, there would implicitly be no
third-party dependency to read the configuration data, and it can be
safe if parsed by ast.literal_eval()
[13].
Python literals can be identical to JSON, with the added benefit of
supporting trailing commas and comments. In addition, Python’s richer
data model may be useful for some future configuration needs (e.g. non-string
dict keys, floating point vs. integer values).
On the other hand, python literals are a Python-specific format, and it is anticipated that these data may need to be read by packaging tools, etc. that are not written in Python.
An example Python literal file for the proposed data would be:
# The build configuration
{"build": {"requires": ["setuptools",
"wheel>=0.27", # note the trailing comma
# "numpy>=1.10" # a commented out data line
]
# and here is an arbitrary comment.
}
}
Sticking with setup.cfg
There are two issues with setup.cfg
used by setuptools as a general
format. One is that they are .ini
files which have issues as mentioned
in the configparser discussion above. The other is that the schema for
that file has never been rigorously defined and thus it’s unknown which
format would be safe to use going forward without potentially confusing
setuptools installations.
Other file names
Several other file names were considered and rejected (although this is very much a bikeshedding topic, and so the decision comes down to mostly taste).
- pysettings.toml
- Most reasonable alternative.
- pypa.toml
- While it makes sense to reference the PyPA [11], it is a somewhat niche term. It’s better to have the file name make sense without having domain-specific knowledge.
- pybuild.toml
- From the restrictive perspective of this PEP this filename makes sense, but if any non-build metadata ever gets added to the file then the name ceases to make sense.
- pip.toml
- Too tool-specific.
- meta.toml
- Too generic; project may want to have its own metadata file.
- setup.toml
- While keeping with traditional thanks to
setup.py
, it does not necessarily match what the file may contain in the future (e.g. is knowing the name of a project inherently part of its setup?). - pymeta.toml
- Not obvious to newcomers to programming and/or Python.
- pypackage.toml & pypackaging.toml
- Name conflation of what a “package” is (project versus namespace).
- pydevelop.toml
- The file may contain details not specific to development.
- pysource.toml
- Not directly related to source code.
- pytools.toml
- Misleading as the file is (currently) aimed at project management.
- dstufft.toml
- Too person-specific. ;)
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0518.rst
Last modified: 2023-10-11 12:05:51 GMT