PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials
- Author:
- Seth Larson <seth at python.org>
- Sponsor:
- Brett Cannon <brett at python.org>
- PEP-Delegate:
- Brett Cannon <brett at python.org>
- Discussions-To:
- Discourse thread
- Status:
- Draft
- Type:
- Standards Track
- Topic:
- Packaging
- Created:
- 02-Jan-2025
- Post-History:
- 05-Nov-2024, 06-Jan-2025
Abstract
Almost all Python packages today are accurately measurable by software composition analysis (SCA) tools. For projects that are not accurately measurable, there is no existing mechanism to annotate a Python package with composition data to improve measurability.
Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. SBOMs are used as inputs for SCA tools, such as scanners for vulnerabilities and licenses, and have been gaining traction in global software regulations and frameworks.
This PEP proposes using SBOM documents included in Python packages as a means to improve automated software measurability for Python packages.
The changes will update the Core Metadata specification to version 2.5.
Motivation
Measurability and Phantom Dependencies
Python packages are particularly affected by the “phantom dependency” problem, where software components that aren’t written in Python are included in Python packages for many reasons, such as ease of installation and compatibility with standards:
- Python serves scientific, data, web, and machine-learning use-cases which use compiled or non-Python languages like Rust, C, C++, Fortran, JavaScript, and others.
- The Python wheel format is preferred by users due to the ease-of-installation. No code is executed during the installation step, only extracting the archive.
- The Python wheel format requires bundling shared compiled libraries without a method to encode metadata about these libraries.
- Packages related to Python packaging sometimes need to solve the “bootstrapping” problem, so include pure Python projects inside their source code.
These software components can’t be described using Python package metadata and thus are likely to be missed by software composition analysis (SCA) software which can mean vulnerable software components aren’t reported accurately.
For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build. None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document is included annotating all the included shared libraries then SCA tools can identify the included software reliably.
Regulations
SBOMs are required by recent software security regulations, like the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. One goal is to minimize the demands on open source project maintainers by enabling open source users that need SBOMs to self-serve using existing tooling.
Another goal is to enable contributions from users who need SBOMs to annotate projects they depend on with SBOM information. Today there is no mechanism to propagate the results of those contributions for a Python package so there is no incentive for users to contribute this type of work.
Rationale
Using SBOM standards instead of Core Metadata fields
Attempting to add every field offered by SBOM standards into Python package Core Metadata would result in an explosion of new Core Metadata fields, including the need to keep up-to-date as SBOM standards continue to evolve to suit new needs in that space.
Instead, this proposal delegates SBOM-specific metadata to SBOM documents that are included in Python packages and adds a new Core Metadata field for discoverability of included SBOM documents.
This standard also doesn’t aim to replace Core Metadata with SBOMs, instead focusing on the SBOM information being supplemental to Core Metadata. Included SBOMs only contain information about dependencies included in the package archive or information about the top-level software in the package that can’t be encoded into Core Metadata but is relevant for the SBOM use-case (“software identifiers”, “purpose”, “support level”, etc).
Zero-or-more opaque SBOM documents
Rather than requiring at most one included SBOM document per Python package, this PEP proposes that one or more SBOM documents may be included in a Python package. This means that code attempting to annotate a Python package with SBOM data may do so without being concerned about corrupting data already contained within other SBOM documents.
Additionally, this PEP treats SBOM document data opaquely instead relying on final end-users of the SBOM data to process the contained SBOM data. This choice acknowledges that SBOM standards are an active area of development where there is not yet (and may never be) a single definitive SBOM standard and that SBOM standards can continue to evolve independent of Python packaging standards. Already tools that consume SBOM documents support a multitude of SBOM standards to handle this reality.
These decisions mean this PEP is capable of supporting any SBOM standard and does not favor one over the other, instead deferring the decision to producing projects and tools and consuming user tooling.
Specification
The changes necessary to implement this PEP include:
- Additions to Core Metadata, as defined in the Core Metadata specification.
- Additions to the author-provided project source metadata, as defined in the pyproject.toml specification.
- Additions to the source distribution (sdist), built distribution (wheel), and installed project specifications.
In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages.
Terminology
This section describes terminology used later in the document:
- root SBOM directory
- The directory under which SBOM files are stored in a
project source tree, distribution archive
or installed project.
Also, the root directory that their paths
recorded in the Sbom-File
Core Metadata field are relative to.
Defined to be the project root directory
for a project source tree or
source distribution;
and a subdirectory named
sboms
of the directory containing the built metadata— i.e., the.dist-info/sboms
directory— for a Built Distribution or installed project.
Core Metadata
Add Sbom-File
field
The Sbom-File
is a new optional Core Metadata field. Each instance contains a
string representation of the path to an SBOM document. The path is specified
relative to the root SBOM directory for all project types. It is a
multi-use field that MAY appear zero or more times and each instance lists the
path to one such file. Files specified under this field are SBOM documents
that are distributed with the package.
As specified by this PEP, its value is also that file’s path relative to the root SBOM directory in both installed projects and the standardized Distribution Package types.
If an Sbom-File
is listed in a
Source Distribution or
Built Distribution’s Core Metadata:
- That file MUST be included in the distribution archive at the specified path relative to the root SBOM directory.
- Installers MUST install the file with the project at that same relative path.
- Inside the root SBOM directory, packaging tools MUST reproduce the directory structure under which the source files are located relative to the project root.
- Path delimiters MUST be the forward slash character (
/
), and parent directory indicators (..
) MUST NOT be used.
For all newly-uploaded distribution archives that include one or more
Sbom-File
fields in their Core Metadata and declare a Metadata-Version
of 2.5
or higher, PyPI and other indices SHOULD validate that all files
specified with Sbom-File
are present in the distribution archives.
Project source metadata
This PEP specifies changes to the project’s source metadata under a
[project]
table in the pyproject.toml
file.
Add sbom-files
key
A new optional sbom-files
key is added to the [project]
table for
specifying paths in the project source tree relative to pyproject.toml
to
file(s) containing SBOMs to be distributed with the package. This key
corresponds to the Sbom-File
fields in the Core Metadata.
Its value is an array of strings which MUST contain valid glob patterns, as specified below:
- Alphanumeric characters, underscores (
_
), hyphens (-
) and dots (.
) MUST be matched verbatim. - Special glob characters:
*
,?
,**
and character ranges:[]
containing only the verbatim matched characters MUST be supported. Within[...]
, the hyphen indicates a locale-agnostic range (e.g. a-z, order based on Unicode code points). Hyphens at the start or end are matched literally. - Path delimiters MUST be the forward slash character (
/
). Patterns are relative to the directory containingpyproject.toml
, therefore the leading slash character MUST NOT be used. - Parent directory indicators (
..
) MUST NOT be used.
Any characters or character sequences not covered by this specification are invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD reject invalid values with an error.
Literal paths (e.g. bom.cdx.json
) are treated as valid globs which means
they can also be defined.
Build tools:
- MUST treat each value as a glob pattern, and MUST raise an error if the pattern contains invalid glob syntax.
- MUST include all files matched by a listed pattern in all distribution archives.
- MUST list each matched file path under an
Sbom-File
field in the Core Metadata. - MUST raise an error if any individual user-specified pattern does not match at least one file.
If the sbom-files
key is present and is set to a value of an empty array,
then tools MUST NOT include any SBOM files and MUST NOT raise an error.
Examples of valid SBOM files declarations:
[project]
sbom-files = ["bom.json"]
[project]
sbom-files = ["sboms/openssl.cdx.json", "sboms/openssl.spdx.json"]
[project]
sbom-files = ["sboms/*"]
[project]
sbom-files = []
Examples of invalid SBOM files declarations:
[project]
sbom-files = ["..\bom.json"]
Reason: ..
must not be used. \\
is an invalid path delimiter, /
must be used.
[project]
sbom-files = ["bom{.json*"]
Reason: bom{.json
is not a valid glob.
SBOM files in project formats
A few additions will be made to the existing specifications.
- Project source trees
- Per Project source metadata section, the
Declaring Project Metadata specification
will be updated to reflect that SBOM file paths MUST be relative to the
project root directory; i.e. the directory containing the
pyproject.toml
(or equivalently, other legacy project configuration, e.g.setup.py
,setup.cfg
, etc). - Source distributions (sdists)
- The sdist specification will be updated to reflect that if the
Metadata-Version
is2.5
or greater, the sdist MUST contain any SBOM files specified by theSbom-File
field in thePKG-INFO
at their respective paths relative to the sdist (containing thepyproject.toml
and thePKG-INFO
Core Metadata). - Built distributions (wheels)
- The wheel specification will be updated to reflect that if the
Metadata-Version
is2.5
or greater and one or moreSbom-File
fields are specified, the.dist-info
directory MUST contain ansboms
subdirectory, which MUST contain the files listed in theSbom-File
fields in theMETADATA
file at their respective paths relative to thesboms
directory. - Installed projects
- The Recording Installed Projects specification will be updated to reflect that
if the
Metadata-Version
is2.5
or greater and one or moreSbom-File
fields is specified, the.dist-info
directory MUST contain ansboms
subdirectory which MUST contain the files listed in theSbom-File
fields in theMETADATA
file at their respective paths relative to thesboms
directory, and that any files in this directory MUST be copied from wheels by install tools.
SBOM data interoperability
This PEP treats data contained within SBOM documents as opaque, recognizing that SBOM standards are an active area of development. However, there are some considerations for SBOM data producers that when followed will improve the interoperability and usability of SBOM data made available in Python packages:
- SBOM documents SHOULD use a widely-accepted SBOM standard, such as CycloneDX or SPDX.
- SBOM documents SHOULD use UTF-8-encoded JSON (RFC 8259) when available for the SBOM standard in use.
- SBOM documents SHOULD include all required fields for the SBOM standard in use.
- SBOM documents SHOULD include a “time of creation” and “creating tool” field for the SBOM standard in use. This information is important for users attempting to reconstruct different stages for a Python package being built.
- The primary component described by the SBOM document SHOULD be the top-level software within the Python package (for example, “pkg:pypi/pillow” for the Pillow package).
- All non-primary components SHOULD have one or more paths in the relationship graph showing the relationship between components. If this information isn’t included, SCA tools might exclude components outside of the relationship graph.
- All software components SHOULD have a name, version, and one or more software identifiers (PURL, CPE, download URL).
PyPI and other indices MAY validate the contents of SBOM documents specified by this PEP, but MUST NOT validate or reject data for unknown SBOM standards, versions, or fields.
Backwards Compatibility
There are no backwards compatibility concerns for this PEP.
The changes to Python package Core Metadata and pyproject.toml
are
only additive, this PEP doesn’t change the behavior of any existing fields.
Tools which are processing Python packages can use the Sbom-File
core
metadata field to clearly delineate between packages which include SBOM
documents that implement this PEP (and thus have more requirements) and
packages which include SBOM documents before this PEP was authored.
Security Implications
SBOM documents are only as useful as the information encoded in them. If an SBOM document contains incorrect information then this can result in incorrect downstream analysis by SCA tools. For this reason, it’s important for tools including SBOM data into Python packages to be confident in the information they are recording. SBOMs are capable of recording “known unknowns” in addition to known data. This practice is recommended when not certain about the data being recorded to allow for further analysis by users.
Because SBOM documents can encode information about the original system where a Python package is built (for example, the operating system name and version, less commonly the names of paths). This information has the potential to “leak” through the Python package to installers via SBOMs. If this information is sensitive, then that could represent a security risk.
How to Teach This
Most typical users of Python and Python packages won’t need to know the details of this standard. The details of this standard are most important to either maintainers of Python packages and developers of SCA tools such as SBOM generation tools and vulnerability scanners.
What do Python package maintainers need to know?
Python package metadata can already describe the top-level software included in
a package archive, but what if a package archive contains other software
components beyond the top-level software? For example, the Python wheel for
“Pillow” contains a handful of other software libraries bundled inside, like
libjpeg
, libpng
, libwebp
, and so on. This scenario is where this PEP
is most useful, for adding metadata about bundled software to a Python package.
Some build tools may be able to automatically annotate bundled dependencies. Typically tools can automatically annotate bundled dependencies when those dependencies come from a “packaging ecosystem” (such as PyPI, Linux distros, Crates.io, NPM, etc).
For packages which cannot be automatically annotated and if the package author
wishes to provide an SBOM the approach will be to generate or author SBOM files
and then include those files using pyproject.toml
:
[project] ... sbom-files = [ "sboms/bom.cdx.json" ]
For projects manually specifying an SBOM document the challenge will be keeping the document up-to-date. The CPython project has some customized tooling for this task, but it can likely be generalized into a tool reusable by other projects.
What do users of SBOM documents need to know?
Many users of this PEP won’t know of its existence, instead their software composition analysis tools, SBOM tools, or vulnerability scanners will simply begin giving more comprehensive information after an upgrade. For users that are interested in the sources of this new information, the “tool” field of SBOM metadata already provides linkages to the projects generating their SBOMs.
For users who need SBOM documents describing their open source dependencies the first step should always be “create them yourself”. Using the benchmarks above a list of tools that are known to be accurate for Python packages can be documented and recommended to users. For projects which require additional manual SBOM annotation: tips for contributing this data and tools for maintaining the data can be recommended.
Reference Implementation
Auditwheel fork which generates CycloneDX SBOM documents to include in wheels describing bundled shared library files. These SBOM documents worked as expected for the Syft and Grype SBOM and vulnerability scanners.
Rejected Ideas
Why not require a single SBOM standard?
Most discussion and development around SBOMs today focuses on two SBOM standards: CycloneDX and SPDX. There is no clear “winner” between these two standards, both standards are frequently used by projects and software ecosystems.
Because both standards are frequently used, tools for consuming and processing SBOM documents commonly need to support both standards. This means that this PEP is not constrained to select a single SBOM standard by its consumers and thus can allow tools creating SBOM documents for inclusion in Python packages to choose which SBOM standard works best for their use-case.
Rejected Ideas
Selecting a single SBOM standard
There is no universally accepted SBOM standard and this area is still rapidly evolving (for example, SPDX released a new major version of their standard in April 2024). To avoid locking the Python ecosystem into a specific standard ahead of when a clear winner emerges this PEP treats SBOM documents as opaque and only makes recommendations to promote compatibility with downstream consumers of SBOM document data.
None of the decisions in this PEP restrict a future PEP to select a single SBOM standard. Tools that use SBOM data today already need to support multiple formats to handle this situation, so a future standard that updates to require only one standard would have no effect on downstream SBOM tools.
Open Issues
Conditional project source SBOM files
How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional?
References
- Visualizing the Python package SBOM data flow. This is a graphic that shows how this PEP fits into the bigger picture of Python packaging’s SBOM data story.
- Adding SBOMs to Python wheels with auditwheel. This was some early results from a fork of auditwheel to add SBOM data to a wheel and then use an SBOM generation tool Syft to detect the SBOM in the installed package.
Acknowledgements
Thanks to Karolina Surma for authoring and leading PEP 639 to acceptance. This PEP copies the specification from PEP 639 for specifying files in project source metadata, Core Metadata, and project formats is based on.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0770.rst
Last modified: 2025-01-30 22:38:30 GMT