PEP 639 – Improving License Clarity with Better Package Metadata
- Author: Philippe Ombredanne <pombredanne at nexb.com>, C.A.M. Gerlach <CAM.Gerlach at Gerlach.CAM>, Karolina Surma <karolina.surma at gazeta.pl>
- PEP-Delegate: Brett Cannon <brett at python.org>
- Discussions-To: Discourse thread
- Status: Provisional
- Type: Standards Track
- Topic: Packaging
- Created: 15-Aug-2019
- Post-History: 15-Aug-2019, 17-Dec-2021, 10-May-2024
- Resolution: Discourse message
Table of Contents
- Provisional Acceptance
- Abstract
- Goals
- Non-Goals
- Motivation
- Rationale
- Terminology
- Specification
- Backwards Compatibility
- Security Implications
- How to Teach This
- Reference Implementation
- Rejected Ideas
- Appendices
- Acknowledgments
- Copyright
Provisional Acceptance
This PEP has been provisionally accepted, with the following required conditions before the PEP is made Final:
- An implementation of the PEP in two build back-ends.
- An implementation of the PEP in PyPI.
Abstract
This PEP defines a specification for how licenses are documented in Python projects.
To achieve that, it:
- Adopts the SPDX license expression syntax as a means of expressing the license for a Python project.
- Defines how to include license files within projects, source distributions and built distributions.
- Specifies the necessary changes to Core Metadata and the corresponding Pyproject Metadata keys.
- Describes the necessary changes to the source distribution (sdist), built distribution (wheel) and installed project standards.
This will make license declaration simpler and less ambiguous for package authors to create, end users to understand, and tools to programmatically process.
The changes will update the Core Metadata specification to version 2.4.
Goals
This PEP’s scope is limited to covering new mechanisms for documenting the license of a distribution package, specifically defining:
- A means of specifying an SPDX license expression.
- A method of including license texts in distribution packages and installed projects.
The changes that this PEP requires have been designed to minimize impact and maximize backward compatibility.
Non-Goals
This PEP doesn’t recommend any particular license to be chosen by any particular package author.
If projects decide not to use the new fields, no additional restrictions are imposed by this PEP when uploading to PyPI.
This PEP also is not about license documentation for individual files, though this is a surveyed topic in an appendix, nor does it intend to cover cases where the source distribution and binary distribution packages don’t have the same licenses.
Motivation
Software must be licensed in order for anyone other than its creator to download, use, share and modify it. Today, there are multiple fields where licenses are documented in Core Metadata, and there are limitations to what can be expressed in each of them. This often leads to confusion both for package authors and end users, including distribution re-packagers.
This has triggered a number of license-related discussions and issues, including on outdated and ambiguous PyPI classifiers, license interoperability with other ecosystems, too many confusing license metadata options, limited support for license files in the Wheel project, and the lack of precise license metadata.
As a result, on average, Python packages tend to have more ambiguous and missing license information than other common ecosystems. This is supported by the statistics page of the ClearlyDefined project, an Open Source Initiative effort to help improve licensing clarity of other FOSS projects, covering all packages from PyPI, Maven, npm and Rubygems.
The current license classifiers could be extended to include the full range of the SPDX identifiers while deprecating the ambiguous classifiers (such as License :: OSI Approved :: BSD License).
However, there are multiple arguments against such an approach:
- It requires a great effort to duplicate the SPDX license list and keep it in sync.
- It is a hard break in backward compatibility, forcing package authors to update to new classifiers immediately when PyPI deprecates the old ones.
- It only covers packages under a single license; it doesn’t address projects that vendor dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging) or were relicensed, adapt code from other projects or contain fonts, images, examples, binaries or other assets under other licenses.
- It requires that both authors and tools understand and implement the PyPI-specific classifier system.
- It does not provide as clear an indicator that a package has adopted the new system, and should be treated accordingly.
Rationale
A survey was conducted to map the existing license metadata definitions in the Python ecosystem and a variety of other packaging systems, Linux distributions, language ecosystems and applications.
The takeaways from the survey have guided the recommendations of this PEP:
- SPDX and SPDX-like syntaxes are the most popular license expressions in many modern package systems.
- Most Free and Open Source Software licenses require package authors to include their full text in a Distribution Package.
Therefore, this PEP introduces two new Core Metadata fields:
- License-Expression that provides an unambiguous way to express the license of a package using SPDX license expressions.
- License-File that offers a standardized way to include the full text of the license(s) with the package when distributed, and allows other tools consuming the Core Metadata to locate a distribution archive’s license files.
Furthermore, this specification builds upon existing practice in the Setuptools and Wheel projects. An up-to-date version of the current draft of this PEP is implemented in the Hatch packaging tool, and an earlier draft of the license files portion is implemented in Setuptools.
Terminology
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
License terms
The license-related terminology draws heavily from the SPDX Project, particularly license identifier and license expression.
- license classifier
- A PyPI Trove classifier (as described in the Core Metadata specification) which begins with License ::.
- license expression
- SPDX expression
- A string with valid SPDX license expression syntax including one or more SPDX license identifier(s), which describes a Project’s license(s) and how they inter-relate. Examples: GPL-3.0-or-later, MIT AND (Apache-2.0 OR BSD-2-clause)
- license identifier
- SPDX identifier
- A valid SPDX short-form license identifier, as described in the Add License-Expression field section of this PEP. This includes all valid SPDX identifiers and the custom LicenseRef-[idstring] strings conforming to the SPDX specification, clause 10.1. Examples: MIT, GPL-3.0-only, LicenseRef-My-Custom-License
- root license directory
- license directory
- The directory under which license files are stored in a project source tree, distribution archive or installed project. Also, the root directory that their paths recorded in the License-File Core Metadata field are relative to. Defined to be the project root directory for a project source tree or source distribution; and a subdirectory named licenses of the directory containing the built metadata (i.e., the .dist-info/licenses directory) for a Built Distribution or installed project.
Specification
The changes necessary to implement this PEP include:
- additions to Core Metadata, as defined in the specification.
- additions to the author-provided project source metadata, as defined in the specification.
- additions to the source distribution (sdist), built distribution (wheel) and installed project specifications.
- guidance for tools handling and converting legacy license metadata to license expressions, to ensure the results are consistent and correct.
Note that the guidance on errors and warnings is for tools’ default behavior; they MAY operate more strictly if users explicitly configure them to do so, such as by a CLI flag or a configuration option.
SPDX license expression syntax
This PEP adopts the SPDX license expression syntax as documented in the SPDX specification, either Version 2.2 or a later compatible version.
A license expression can use the following license identifiers:
- Any SPDX-listed license short-form identifiers that are published in the SPDX License List, version 3.17 or any later compatible version. Note that the SPDX working group never removes any license identifiers; instead, they may choose to mark an identifier as “deprecated”.
- The custom LicenseRef-[idstring] string(s), where [idstring] is a unique string containing letters, numbers, . and/or -, to identify licenses that are not included in the SPDX license list. The custom identifiers must follow the SPDX specification, clause 10.1 of the given specification version.
Examples of valid SPDX expressions:
MIT
BSD-3-Clause
MIT AND (Apache-2.0 OR BSD-2-Clause)
MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
LicenseRef-Special-License OR CC0-1.0 OR Unlicense
LicenseRef-Proprietary
Examples of invalid SPDX expressions:
Use-it-after-midnight
Apache-2.0 OR 2-BSD-Clause
LicenseRef-License with spaces
LicenseRef-License_with_underscores
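The custom identifier rule above can be sketched as a small check. This is only an illustration: the function name is hypothetical, "letters" is assumed to mean ASCII letters, and the check covers only the LicenseRef-[idstring] form, not full expressions or SPDX-listed identifiers.

```python
import re

# The literal prefix "LicenseRef-" followed by one or more letters, digits,
# "." or "-". Restricting letters to ASCII is an assumption of this sketch.
_LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def is_custom_license_id(identifier: str) -> bool:
    """Return True if `identifier` has the LicenseRef-[idstring] form (hypothetical helper)."""
    return _LICENSE_REF.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in (
        "LicenseRef-Proprietary",
        "LicenseRef-License with spaces",
        "LicenseRef-License_with_underscores",
    ):
        print(candidate, is_custom_license_id(candidate))
```

Under these assumptions, the spaces and underscores in the last two invalid examples are what cause them to be rejected.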
Core Metadata
The error and warning guidance in this section applies to build and publishing tools; end-user-facing install tools MAY be less strict than mentioned here when encountering malformed metadata that does not conform to this specification.
As it adds new fields, this PEP updates the Core Metadata version to 2.4.
Add License-Expression field
The License-Expression optional Core Metadata field is specified to contain a text string that is a valid SPDX license expression, as defined above.
Build and publishing tools SHOULD check that the License-Expression field contains a valid SPDX expression, including the validity of the particular license identifiers (as defined above). Tools MAY halt execution and raise an error when an invalid expression is found. If tools choose to validate the SPDX expression, they also SHOULD store a case-normalized version of the License-Expression field using the reference case for each SPDX license identifier and uppercase for the AND, OR and WITH keywords. Tools SHOULD report a warning and publishing tools MAY raise an error if one or more license identifiers have been marked as deprecated in the SPDX License List.
For all newly-uploaded distribution archives that include a License-Expression field, the Python Package Index (PyPI) MUST validate that they contain a valid, case-normalized license expression with valid identifiers (as defined above) and MUST reject uploads that do not. Custom license identifiers which conform to the SPDX specification are considered valid. PyPI MAY reject an upload for using a deprecated license identifier, so long as it was deprecated as of the above-mentioned SPDX License List version.
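As a rough illustration of the case normalization described in this section, the sketch below uppercases the AND, OR and WITH keywords and restores the reference case of identifiers. The lookup table is a hypothetical, tiny excerpt of the SPDX License List, the function name is invented for this example, and the code does not check the expression grammar or handle license exceptions; a real tool would rely on the full list or an existing library.

```python
import re

# Hypothetical excerpt of the SPDX License List, keyed by lowercase identifier.
# A real tool would load the complete list instead.
REFERENCE_CASE = {
    "mit": "MIT",
    "apache-2.0": "Apache-2.0",
    "bsd-2-clause": "BSD-2-Clause",
    "gpl-2.0-or-later": "GPL-2.0-or-later",
}
KEYWORDS = {"and", "or", "with"}


def normalize_case(expression: str) -> str:
    """Uppercase keywords and restore the reference case of known identifiers."""
    # Tokens are single parentheses or runs of non-space, non-parenthesis text.
    tokens = re.findall(r"[()]|[^\s()]+", expression)
    out = []
    for token in tokens:
        lowered = token.lower()
        if lowered in KEYWORDS:
            out.append(lowered.upper())
        elif token in "()":
            out.append(token)
        elif lowered in REFERENCE_CASE:
            out.append(REFERENCE_CASE[lowered])
        elif lowered.startswith("licenseref-"):
            # Custom identifiers are kept as written (an assumption of this sketch).
            out.append(token)
        else:
            raise ValueError(f"unknown license identifier: {token!r}")
    # Join with single spaces, then tighten the spacing inside parentheses.
    return " ".join(out).replace("( ", "(").replace(" )", ")")


if __name__ == "__main__":
    print(normalize_case("mit and (apache-2.0 or bsd-2-clause)"))
```

Raising on unknown identifiers mirrors the "MAY halt execution" option; a tool that prefers warnings could collect the unknown tokens instead.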
Add License-File field
License-File is an optional Core Metadata field. Each instance contains the string representation of the path of a license-related file. The path is located within the project source tree, relative to the project root directory. It is a multi-use field that may appear zero or more times and each instance lists the path to one such file. Files specified under this field could include license text, author/attribution information, or other legal notices that need to be distributed with the package.
As specified by this PEP, its value is also that file’s path relative to the root license directory in both installed projects and the standardized Distribution Package types.
If a License-File is listed in a Source Distribution or Built Distribution’s Core Metadata:
- That file MUST be included in the distribution archive at the specified path relative to the root license directory.
- That file MUST be installed with the project at that same relative path.
- The specified relative path MUST be consistent between project source trees, source distributions (sdists), built distributions (Wheels) and installed projects.
- Inside the root license directory, packaging tools MUST reproduce the directory structure under which the source license files are located relative to the project root.
- Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.
- License file content MUST be UTF-8 encoded text.
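The path and encoding rules in the list above could be checked along these lines. Both function names are hypothetical, and rejecting a leading slash is an inference from the requirement that paths be relative, not a rule quoted from the list.

```python
def check_license_file_path(path: str) -> None:
    """Raise ValueError if `path` breaks the License-File path rules (hypothetical helper)."""
    if "\\" in path:
        raise ValueError(f"{path!r}: path delimiters must be forward slashes")
    if path.startswith("/"):
        # Assumption: a leading slash would make the path absolute, not relative.
        raise ValueError(f"{path!r}: path must be relative")
    if ".." in path.split("/"):
        raise ValueError(f"{path!r}: parent directory indicators are not allowed")


def check_license_file_content(data: bytes) -> str:
    """Decode license file content, raising ValueError if it is not UTF-8 text."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("license file content must be UTF-8 encoded text") from exc


if __name__ == "__main__":
    check_license_file_path("licenses/LICENSE.MIT")  # passes silently
    print(check_license_file_content("MIT License".encode("utf-8")))
```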
Build tools MAY and publishing tools SHOULD produce an informative warning if a built distribution’s metadata contains no License-File entries, and publishing tools MAY but build tools MUST NOT raise an error.
For all newly-uploaded distribution archives that include one or more License-File fields in their Core Metadata and declare a Metadata-Version of 2.4 or higher, PyPI SHOULD validate that all specified files are present in that distribution archive, and MUST reject uploads that do not validate.
Deprecate License field
The legacy unstructured-text License Core Metadata field is deprecated and replaced by the new License-Expression field. The fields are mutually exclusive. Tools which generate Core Metadata MUST NOT create both these fields. Tools which read Core Metadata, when dealing with both these fields present at the same time, MUST read the value of License-Expression and MUST disregard the value of the License field.
If only the License field is present, tools MAY issue a warning informing users it is deprecated and recommending License-Expression instead.
For all newly-uploaded distribution archives that include a License-Expression field, the Python Package Index (PyPI) MUST reject any that specify both License and License-Expression fields.
The License field may be removed from a new version of the specification in a future PEP.
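A reading tool's precedence rule for these two fields might look like the following sketch. It assumes the Core Metadata text is parsed with the standard library's email parser, and the function name and return shape are made up for this example.

```python
from email import message_from_string
from typing import Optional, Tuple


def declared_license(metadata_text: str) -> Optional[Tuple[str, str]]:
    """Return (field name, value) for the license field a reader should honor.

    License-Expression wins whenever it is present; the legacy License field
    is only consulted when License-Expression is absent.
    """
    message = message_from_string(metadata_text)
    expression = message.get("License-Expression")
    if expression is not None:
        return ("License-Expression", expression)
    legacy = message.get("License")
    if legacy is not None:
        # A tool MAY warn here that the License field is deprecated.
        return ("License", legacy)
    return None


if __name__ == "__main__":
    sample = (
        "Metadata-Version: 2.4\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "License: free-form legacy text\n"
        "License-Expression: MIT\n"
    )
    print(declared_license(sample))
```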
Deprecate license classifiers
Using license classifiers in the Classifier Core Metadata field (described in the Core Metadata specification) is deprecated and replaced by the more precise License-Expression field.
If the License-Expression field is present, build tools MAY raise an error if one or more license classifiers are included in a Classifier field, and MUST NOT add such classifiers themselves.
Otherwise, if this field contains a license classifier, tools MAY issue a warning informing users such classifiers are deprecated, and recommending License-Expression instead.
For compatibility with existing publishing and installation processes, the presence of license classifiers SHOULD NOT raise an error unless License-Expression is also provided.
New license classifiers MUST NOT be added to PyPI; users needing them SHOULD use the License-Expression field instead.
License classifiers may be removed from a new version of the specification in a future PEP.
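The classifier rules above might translate into a build-tool check like the one below. The function name, the `strict` switch and the message wording are all hypothetical; the sketch only illustrates when a warning or an error is permitted.

```python
from typing import List, Optional


def check_license_classifiers(
    expression: Optional[str],
    classifiers: List[str],
    strict: bool = False,
) -> List[str]:
    """Return warnings about license classifiers; raise if `strict` and both are used."""
    found = [c for c in classifiers if c.startswith("License ::")]
    if not found:
        return []
    if expression is not None:
        message = (
            "license classifiers used together with License-Expression: "
            + ", ".join(found)
        )
        if strict:
            # Build tools MAY raise an error in this situation.
            raise ValueError(message)
        return [message]
    # No License-Expression: classifiers alone only justify a deprecation warning.
    return ["license classifiers are deprecated; consider License-Expression instead"]


if __name__ == "__main__":
    print(check_license_classifiers(None, ["License :: OSI Approved :: BSD License"]))
```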
Project source metadata
This PEP specifies changes to the project’s source metadata under a [project] table in the pyproject.toml file.
Add string value to license key
The license key in the [project] table is defined to contain a top-level string value, which is a valid SPDX license expression as defined in this PEP. Its value maps to the License-Expression field in the core metadata.
Build tools SHOULD validate and perform case normalization of the expression as described in the Add License-Expression field section, outputting an error or warning as specified.
Examples:
[project]
license = "MIT"
[project]
license = "MIT AND (Apache-2.0 OR BSD-2-clause)"
[project]
license = "MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)"
[project]
license = "LicenseRef-Proprietary"
Add license-files key
A new license-files key is added to the [project] table for specifying paths in the project source tree relative to pyproject.toml to file(s) containing licenses and other legal notices to be distributed with the package. It corresponds to the License-File fields in the Core Metadata.
Its value is an array of strings which MUST contain valid glob patterns, as specified below:
- Alphanumeric characters, underscores (_), hyphens (-) and dots (.) MUST be matched verbatim.
- Special glob characters: *, ?, ** and character ranges: [] containing only the verbatim matched characters MUST be supported. Within [...], the hyphen indicates a locale-agnostic range (e.g. a-z, order based on Unicode code points). Hyphens at the start or end are matched literally.
- Path delimiters MUST be the forward slash character (/). Patterns are relative to the directory containing pyproject.toml, therefore the leading slash character MUST NOT be used.
- Parent directory indicators (..) MUST NOT be used.
Any characters or character sequences not covered by this specification are invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD reject invalid values with an error.
Tools MUST assume that license file content is valid UTF-8 encoded text, and SHOULD validate this and raise an error if it is not.
Literal paths (e.g. LICENSE) are treated as valid globs, which means they can also be defined.
Build tools:
- MUST treat each value as a glob pattern, and MUST raise an error if the pattern contains invalid glob syntax.
- MUST include all files matched by a listed pattern in all distribution archives.
- MUST list each matched file path under a License-File field in the Core Metadata.
- MUST raise an error if any individual user-specified pattern does not match at least one file.
If the license-files key is present and is set to a value of an empty array, then tools MUST NOT include any license files and MUST NOT raise an error.
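The build-tool behavior described above (expand each pattern, fail on patterns that match nothing, accept an empty array) can be sketched as follows. The function name is hypothetical, and pathlib's glob is used as a stand-in matcher; its semantics may differ from the pattern rules of this specification in edge cases such as ** handling or case sensitivity.

```python
from pathlib import Path
from typing import List


def collect_license_files(project_root: Path, patterns: List[str]) -> List[str]:
    """Expand license-files patterns into sorted, '/'-separated relative paths."""
    matched = set()
    for pattern in patterns:
        # Only regular files count as matches; directories are skipped.
        hits = [p for p in project_root.glob(pattern) if p.is_file()]
        if not hits:
            raise ValueError(f"license-files pattern {pattern!r} matched no files")
        matched.update(p.relative_to(project_root).as_posix() for p in hits)
    # An empty `patterns` list falls through: no files and no error.
    return sorted(matched)


if __name__ == "__main__":
    # Each returned path would become one License-File entry in the Core Metadata.
    try:
        for path in collect_license_files(Path.cwd(), ["LICEN[CS]E*"]):
            print(f"License-File: {path}")
    except ValueError as exc:
        print(exc)
```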
Examples of valid license files declaration:
[project]
license-files = ["LICEN[CS]E*", "AUTHORS*"]
[project]
license-files = ["licenses/LICENSE.MIT", "licenses/LICENSE.CC0"]
[project]
license-files = ["LICENSE.txt", "licenses/*"]
[project]
license-files = []
Examples of invalid license files declaration:
[project]
license-files = ["..\LICENSE.MIT"]
Reason: .. must not be used. \ is an invalid path delimiter, / must be used.
[project]
license-files = ["LICEN{CSE*"]
Reason: “LICEN{CSE*” is not a valid glob.
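A syntax check for license-files patterns, covering the valid and invalid examples above, might be sketched like this. The function name is hypothetical, "alphanumeric" is narrowed to ASCII letters and digits as a simplifying assumption, and the check looks only at pattern syntax, not at whether any file matches.

```python
import re

# One or more of: a verbatim character, a glob special character, a path
# delimiter, or a [...] range containing only verbatim characters.
# Limiting alphanumerics to ASCII is an assumption of this sketch.
_PATTERN = re.compile(r"(?:[A-Za-z0-9_.\-*?/]|\[[A-Za-z0-9_.\-]+\])+")


def is_valid_license_glob(pattern: str) -> bool:
    """Return True if `pattern` uses only the glob syntax allowed for license-files."""
    if _PATTERN.fullmatch(pattern) is None:
        return False  # e.g. a backslash, "{", or an unterminated "["
    if pattern.startswith("/"):
        return False  # patterns are relative to the pyproject.toml directory
    return ".." not in pattern.split("/")  # no parent directory indicators


if __name__ == "__main__":
    for candidate in ("LICEN[CS]E*", "licenses/*", "..\\LICENSE.MIT", "LICEN{CSE*"):
        print(candidate, is_valid_license_glob(candidate))
```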
Deprecate license key table subkeys
Table values for the license key in the [project] table, including the text and file table subkeys, are now deprecated. If the new license-files key is present, build tools MUST raise an error if the license key is defined and has a value other than a single top-level string.
If the new license-files key is not present and the text subkey is present in a license table, tools SHOULD issue a warning informing users it is deprecated and recommending a license expression as a top-level string key instead.
Likewise, if the new license-files key is not present and the file subkey is present in the license table, tools SHOULD issue a warning informing users it is deprecated and recommending the license-files key instead.
If the specified license file is present in the source tree, build tools SHOULD use it to fill the License-File field in the core metadata, and MUST include the specified file as if it were specified in the license-files key. If the file does not exist at the specified path, tools MUST raise an informative error as previously specified.
Table values for the license key MAY be removed from a new version of the specification in a future PEP.
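The interaction between the license and license-files keys could be checked as in the sketch below. It works on an already-parsed [project] table, i.e. a plain dict such as a TOML parser would produce; the function name and message wording are hypothetical.

```python
from typing import Any, Dict, List


def check_license_keys(project: Dict[str, Any]) -> List[str]:
    """Return deprecation warnings for the [project] table; raise on invalid combinations."""
    warnings: List[str] = []
    license_value = project.get("license")
    if "license-files" in project:
        # With license-files present, license may only be a top-level string.
        if license_value is not None and not isinstance(license_value, str):
            raise ValueError(
                "license must be a top-level string when license-files is present"
            )
        return warnings
    if isinstance(license_value, dict):
        if "text" in license_value:
            warnings.append(
                "license.text is deprecated; use a license expression string instead"
            )
        if "file" in license_value:
            warnings.append("license.file is deprecated; use license-files instead")
    return warnings


if __name__ == "__main__":
    print(check_license_keys({"license": {"file": "LICENSE.txt"}}))
```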
License files in project formats
A few additions will be made to the existing specifications.
- Project source trees
- Per the Project source metadata section, the Declaring Project Metadata specification will be updated to reflect that license file paths MUST be relative to the project root directory; i.e. the directory containing the pyproject.toml (or equivalently, other legacy project configuration, e.g. setup.py, setup.cfg, etc).
- Source distributions (sdists)
- The sdist specification will be updated to reflect that if the Metadata-Version is 2.4 or greater, the sdist MUST contain any license files specified by the License-File field in the PKG-INFO at their respective paths relative to the root directory of the sdist (containing the pyproject.toml and the PKG-INFO Core Metadata).
- Built distributions (wheels)
- The Wheel specification will be updated to reflect that if the Metadata-Version is 2.4 or greater and one or more License-File fields are specified, the .dist-info directory MUST contain a licenses subdirectory, which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the licenses directory.
- Installed projects
- The Recording Installed Projects specification will be updated to reflect that if the Metadata-Version is 2.4 or greater and one or more License-File fields are specified, the .dist-info directory MUST contain a licenses subdirectory which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the licenses directory, and that any files in this directory MUST be copied from wheels by install tools.
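To make the wheel and installed-project layout concrete, the sketch below computes where a License-File entry lands inside a .dist-info directory. The function name and the example directory name are hypothetical.

```python
from pathlib import PurePosixPath


def dist_info_license_path(dist_info_dir: str, license_file: str) -> str:
    """Return the archive path of a License-File entry under .dist-info/licenses."""
    # The License-File value is kept intact, so the source directory structure
    # is reproduced below the licenses subdirectory.
    return str(PurePosixPath(dist_info_dir) / "licenses" / license_file)


if __name__ == "__main__":
    for entry in ("LICENSE.txt", "licenses/LICENSE.MIT"):
        print(dist_info_license_path("demo-1.0.dist-info", entry))
```

Note that a source file stored under a directory that is itself named licenses ends up at a path with two licenses components, since the source path is appended unchanged.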
Converting legacy metadata
Tools MUST NOT use the contents of the license.text [project] key (or equivalent tool-specific format), license classifiers or the value of the Core Metadata License field to fill the top-level string value of the license key or the Core Metadata License-Expression field without informing the user and requiring unambiguous, affirmative user action to select and confirm the desired license expression value before proceeding.
Tool authors who need to automatically convert license classifiers to SPDX identifiers can use the recommendation prepared by the PEP authors.
Backwards Compatibility
Adding a new License-Expression Core Metadata field and a top-level string value for the license key in the pyproject.toml [project] table unambiguously means support for the specification in this PEP. This avoids the risk of new tooling misinterpreting a license expression as a free-form license description or vice versa.
The legacy deprecated Core Metadata License field, license key table subkeys (text and file) in the pyproject.toml [project] table and license classifiers retain backwards compatibility. A removal is left to a future PEP and a new version of the Core Metadata specification.
Specification of the new License-File Core Metadata field and adding the files in the distribution is designed to be largely backwards-compatible with the existing use of that field in many packaging tools. The new license-files key in the [project] table of pyproject.toml will only have an effect once users and tools adopt it.
This PEP specifies that license files should be placed in a dedicated licenses subdirectory of the .dist-info directory. This is new and ensures that wheels following this PEP will have differently-located licenses relative to those produced via the previous installer-specific behavior. This is further supported by a new metadata version.
This also resolves current issues where license files are accidentally replaced if they have the same names in different places, making wheels undistributable without noticing. It also prevents conflicts with other metadata files in the same directory.
The additions will be made to the source distribution (sdist), built distribution (wheel) and installed project specifications. They document behaviors allowed under their current specifications, and gate them behind the new metadata version.
This PEP proposes PyPI implement validation of the new License-Expression and License-File fields, which has no effect on new and existing packages uploaded unless they explicitly opt in to using these new fields and fail to follow the specification correctly. Therefore, this does not have a backward compatibility impact, and guarantees forward compatibility by ensuring all distributions uploaded to PyPI with the new fields conform to the specification.
Security Implications
This PEP has no foreseen security implications: the License-Expression field is a plain string and the License-File fields are file paths. Neither introduces any known new security concerns.
How to Teach This
A majority of packages use a single license which makes the case simple: a single license identifier is a valid license expression.
Users of packaging tools will learn the valid license expression of their package through the messages issued by the tools when they detect invalid ones, or when the deprecated License field or license classifiers are used.
If an invalid License-Expression is used, the users will not be able to publish their package to PyPI and an error message will help them understand they need to use SPDX identifiers. It will be possible to generate a distribution with incorrect license metadata, but not to publish one on PyPI or any other index server that enforces License-Expression validity. For authors using the now-deprecated License field or license classifiers, packaging tools may warn them and inform them of the replacement, License-Expression.
Tools may also help with the conversion and suggest a license expression in many common cases:
- The appendix Mapping License Classifiers to SPDX Identifiers provides tool authors with recommendations on how to suggest a license expression produced from legacy classifiers.
- Tools may be able to suggest how to update an existing License value in project source metadata and convert that to a license expression, as also specified in this PEP.
Reference Implementation
Tools will need to support parsing and validating license expressions in the License-Expression field if they decide to implement this part of the specification. It’s up to the tools whether they prefer to implement the validation on their side (e.g. hatch) or use one of the available Python libraries (e.g. license-expression). This PEP does not mandate using any specific library and leaves it to the tool authors to choose the best implementation for their projects.
Rejected Ideas
Many alternative ideas were proposed and, after careful consideration, rejected. The exhaustive list, including the rationale for each rejection, can be found on a separate page.
Appendices
A list of auxiliary documents is provided:
Acknowledgments
- Alyssa Coghlan
- Kevin P. Fleming
- Pradyun Gedam
- Oleg Grenrus
- Dustin Ingram
- Chris Jerdonek
- Cyril Roelandt
- Luis Villa
- Seth M. Larson
- Ofek Lev
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0639.rst
Last modified: 2024-08-29 18:30:14 GMT