PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials
- Author: Seth Larson <seth at python.org>
- Sponsor: Brett Cannon <brett at python.org>
- PEP-Delegate: Brett Cannon <brett at python.org>
- Discussions-To: Discourse thread
- Status: Draft
- Type: Standards Track
- Topic: Packaging
- Created: 02-Jan-2025
- Post-History: 05-Nov-2024, 06-Jan-2025
Abstract
Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. SBOMs are used as inputs for software composition analysis (SCA) tools, such as scanners for vulnerabilities and licenses, and have been gaining traction in global software regulations and frameworks.
This PEP proposes using SBOM documents included in Python packages as a means to improve software measurability for Python packages.
The changes will update the Core Metadata specification to version 2.5.
Motivation
Measurability and Phantom Dependencies
Python packages are particularly affected by the “phantom dependency” problem, where software components that aren’t written in Python are included in Python packages for many reasons, such as ease of installation and compatibility with standards:
- Python serves scientific, data, web, and machine-learning use-cases which use compiled or non-Python languages like Rust, C, C++, Fortran, JavaScript, and others.
- The Python wheel format is preferred by users due to its ease of installation. No code is executed during the installation step; the archive is only extracted.
- The Python wheel format requires bundling shared compiled libraries without a method to encode metadata about these libraries.
- Packages related to Python packaging sometimes need to solve the “bootstrapping” problem, and so include pure Python projects inside their source code.
These software components can’t be described using Python package metadata and thus are likely to be missed by software composition analysis (SCA) software, which can mean vulnerable software components aren’t reported accurately.
For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build. None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document is included annotating all the included shared libraries then SCA tools can identify the included software reliably.
Regulations
SBOMs are required by recent software security regulations, like the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. One goal is to minimize the demands on open source project maintainers by enabling open source users that need SBOMs to self-serve using existing tooling.
Another goal is to enable contributions from users who need SBOMs to annotate projects they depend on with SBOM information. Today there is no mechanism to propagate the results of those contributions for a Python package so there is no incentive for users to contribute this type of work.
Rationale
Using SBOM standards instead of Core Metadata fields
Attempting to add every field offered by SBOM standards into Python package Core Metadata would result in an explosion of new Core Metadata fields, including the need to keep up-to-date as SBOM standards continue to evolve to suit new needs in that space.
Instead, this proposal delegates SBOM-specific metadata to SBOM documents that are included in Python packages and adds a new Core Metadata field for discoverability of included SBOM documents.
This standard also doesn’t aim to replace Core Metadata with SBOMs, instead focusing on the SBOM information being supplemental to Core Metadata. Included SBOMs only contain information about dependencies included in the package archive or information about the top-level software in the package that can’t be encoded into Core Metadata but is relevant for the SBOM use-case (“software identifiers”, “purpose”, “support level”, etc).
Zero-or-more SBOM documents per package
Rather than requiring at most one included SBOM document per Python package, this PEP proposes that zero or more SBOM documents may be included in a Python package. This means that code attempting to annotate a Python package with SBOM data may do so without being concerned about corrupting data already contained within other SBOM documents.
This decision also means this PEP is capable of supporting multiple SBOM standards without favoring one, instead deferring support to SBOM-consuming tools. It also means that the merging of SBOM documents never needs to be handled by Python tools, and is instead deferred to SBOM-consuming tools, which are better placed to handle cross-standard conversions.
Specification
The changes necessary to implement this PEP include:
- Additions to Core Metadata, as defined in the Core Metadata specification.
- Additions to the author-provided project source metadata, as defined in the pyproject.toml specification.
- Additions to the source distribution (sdist), built distribution (wheel), and installed project specifications.
In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages.
Core Metadata
Add Sbom-File field
Sbom-File is an optional Core Metadata field. Each instance contains a string representation of the path of an SBOM document. The path is located within the project source tree, relative to the project root directory. It is a multi-use field that MAY appear zero or more times, and each instance lists the path to one such file. Files specified under this field are SBOM documents that are distributed with the package.
As specified by this PEP, its value is also that file’s path relative to the root SBOM directory in both installed projects and the standardized Distribution Package types.
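As an illustration of how a consuming tool might read this multi-use field, the following is a minimal sketch that assumes Core Metadata is parsed as RFC 822-style headers with the standard library's email parser. The function name `read_sbom_files` and the example metadata are hypothetical and not part of this PEP.

```python
from email.parser import HeaderParser


def read_sbom_files(metadata_text: str) -> list[str]:
    """Return the value of every Sbom-File field, in order of appearance."""
    message = HeaderParser().parsestr(metadata_text)
    # get_all() returns the fallback (an empty list here) when the field is absent,
    # which matches "MAY appear zero or more times".
    return message.get_all("Sbom-File", [])


# Hypothetical Core Metadata for a package that ships two SBOM documents.
EXAMPLE_METADATA = """\
Metadata-Version: 2.5
Name: example
Version: 1.0
Sbom-File: bom.cdx.json
Sbom-File: sboms/openssl.cdx.json
"""

sbom_paths = read_sbom_files(EXAMPLE_METADATA)
print(sbom_paths)
```

A package whose Core Metadata has no Sbom-File field yields an empty list, so callers do not need a separate "field missing" code path.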
If an Sbom-File is listed in a Source Distribution or Built Distribution’s Core Metadata:
- That file MUST be included in the distribution archive at the specified path relative to the root SBOM directory.
- That file MUST be installed with the project at that same relative path.
- Inside the root SBOM directory, packaging tools MUST reproduce the directory structure under which the source files are located relative to the project root. The root SBOM directory is specified in a later section.
- Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.
- SBOM document contents MUST be UTF-8 encoded JSON according to RFC 8259.
- SBOM document contents MUST use an SBOM standard, and for better interoperability SHOULD be a well-known SBOM standard such as CycloneDX or SPDX.
- The “primary” component being described in included SBOM documents MUST be the Python package. This is achieved in CycloneDX using the metadata.component field and in SPDX using the DESCRIBES relationship.
- SBOM documents MUST include metadata for the timestamp when the SBOM document was created. This information helps consuming tools understand the order in which multiple SBOM documents were created, so they can untangle conflicts between the various stages of building the Python package.
- SBOM documents SHOULD include metadata describing the tool creating the SBOM document. This information helps users find which tool needs to be fixed in the case of defects.
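The document-level requirements above could be checked roughly as follows. This is a sketch under stated assumptions, not a normative validator: it recognizes only CycloneDX JSON (through its `bomFormat`, `metadata.timestamp`, `metadata.component` and `metadata.tools` fields) and SPDX 2.x JSON (through `spdxVersion`, `creationInfo.created`, `creationInfo.creators`, and either `documentDescribes` or a `DESCRIBES` relationship). It does not verify that the primary component is actually the Python package. The function name `check_sbom_document` is hypothetical.

```python
import json


def check_sbom_document(raw: bytes) -> list[str]:
    """Return a list of detected problems; an empty list means none were found."""
    try:
        document = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return [f"not UTF-8 encoded JSON: {exc}"]
    if not isinstance(document, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    if document.get("bomFormat") == "CycloneDX":
        metadata = document.get("metadata") or {}
        if "timestamp" not in metadata:
            problems.append("MUST: metadata.timestamp is missing")
        if "component" not in metadata:
            problems.append("MUST: metadata.component (primary component) is missing")
        if "tools" not in metadata:
            problems.append("SHOULD: metadata.tools is missing")
    elif "spdxVersion" in document:
        creation = document.get("creationInfo") or {}
        if "created" not in creation:
            problems.append("MUST: creationInfo.created is missing")
        describes = [
            rel for rel in document.get("relationships", [])
            if rel.get("relationshipType") == "DESCRIBES"
        ]
        if not describes and not document.get("documentDescribes"):
            problems.append("MUST: no DESCRIBES relationship found")
        if not any(c.startswith("Tool:") for c in creation.get("creators", [])):
            problems.append("SHOULD: no 'Tool:' entry in creationInfo.creators")
    else:
        # Other SBOM standards are permitted; this sketch simply cannot inspect them.
        problems.append("unrecognized SBOM standard; content checks skipped")
    return problems


example = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "metadata": {
        "timestamp": "2025-01-02T00:00:00Z",
        "component": {"type": "library", "name": "example", "version": "1.0"},
    },
}).encode("utf-8")
print(check_sbom_document(example))
```

Reporting SHOULD-level findings alongside MUST-level ones, rather than raising on the first failure, lets a caller decide which findings are fatal.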
For all newly-uploaded distribution archives that include one or more Sbom-File fields in their Core Metadata and declare a Metadata-Version of 2.5 or higher, PyPI SHOULD validate that all specified files are present in the distribution archives, are valid UTF-8 encoded JSON, and, for well-known SBOM standards, provide the minimum fields required by those standards and this PEP.
Project source metadata
This PEP specifies changes to the project’s source metadata under a [project] table in the pyproject.toml file.
Add sbom-files key
A new sbom-files key is added to the [project] table for specifying paths in the project source tree relative to pyproject.toml to file(s) containing SBOMs to be distributed with the package. This key corresponds to the Sbom-File fields in the Core Metadata.
Its value is an array of strings which MUST contain valid glob patterns, as specified below:
- Alphanumeric characters, underscores (_), hyphens (-) and dots (.) MUST be matched verbatim.
- Special glob characters: *, ?, ** and character ranges: [] containing only the verbatim matched characters MUST be supported. Within [...], the hyphen indicates a locale-agnostic range (e.g. a-z, order based on Unicode code points). Hyphens at the start or end are matched literally.
- Path delimiters MUST be the forward slash character (/). Patterns are relative to the directory containing pyproject.toml, therefore the leading slash character MUST NOT be used.
- Parent directory indicators (..) MUST NOT be used.
Any characters or character sequences not covered by this specification are invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD reject invalid values with an error.
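The pattern rules above can be approximated with a single regular expression plus two path checks. The sketch below is one possible reading of the rules, not a reference implementation: it assumes that “alphanumeric characters” corresponds to Python's `\w` class (which also covers the underscore), and it does not check that `**` occupies a whole path segment. The function name `validate_sbom_glob` is hypothetical.

```python
import re

# Characters that are matched verbatim: alphanumerics, "_", "-" and ".".
_VERBATIM = r"[\w.\-]"
# A pattern is a sequence of verbatim characters, "*", "?", "/" path delimiters,
# or bracketed ranges that contain only verbatim characters.
_PATTERN = re.compile(rf"(?:{_VERBATIM}|[*?/]|\[{_VERBATIM}+\])+")


def validate_sbom_glob(pattern: str) -> None:
    """Raise ValueError if the pattern uses syntax not covered by the rules."""
    if not pattern or _PATTERN.fullmatch(pattern) is None:
        raise ValueError(f"invalid glob pattern: {pattern!r}")
    if pattern.startswith("/"):
        raise ValueError(f"pattern must not start with a slash: {pattern!r}")
    if ".." in pattern.split("/"):
        raise ValueError(f"pattern must not contain '..': {pattern!r}")


for candidate in ["bom.json", "sboms/*", "..\\bom.json", "bom{.json*"]:
    try:
        validate_sbom_glob(candidate)
        print(candidate, "-> accepted")
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```

Rejecting anything outside an explicit allow-list, rather than trying to enumerate bad characters, follows the rule that characters not covered by the specification are invalid.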
Tools MUST assume that SBOM file content is valid UTF-8 encoded JSON, and SHOULD validate this and raise an error for invalid formats and encodings.
Literal paths (e.g. bom.cdx.json) are treated as valid globs, which means they can also be defined.
Build tools:
- MUST treat each value as a glob pattern, and MUST raise an error if the pattern contains invalid glob syntax.
- MUST include all files matched by a listed pattern in all distribution archives.
- MUST list each matched file path under an Sbom-File field in the Core Metadata.
- MUST raise an error if any individual user-specified pattern does not match at least one file.
If the sbom-files key is present and is set to a value of an empty array, then tools MUST NOT include any SBOM files and MUST NOT raise an error.
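The build tool behavior described above might look like the following sketch. It assumes that `pathlib.Path.glob` is an acceptable stand-in for the glob semantics of this PEP (the two may differ in edge cases) and that pattern syntax has already been validated. The function name `expand_sbom_files` and the temporary project layout are hypothetical.

```python
import tempfile
from pathlib import Path


def expand_sbom_files(project_root: Path, patterns: list[str]) -> list[str]:
    """Expand sbom-files patterns into the paths to record as Sbom-File fields."""
    matched: set[str] = set()
    for pattern in patterns:
        files = [path for path in project_root.glob(pattern) if path.is_file()]
        if not files:
            # Every individual pattern has to match at least one file.
            raise ValueError(f"sbom-files pattern {pattern!r} did not match any file")
        # Record paths relative to the project root, using "/" as the delimiter.
        matched.update(path.relative_to(project_root).as_posix() for path in files)
    return sorted(matched)


# Demonstration against a throwaway project tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sboms").mkdir()
    (root / "sboms" / "openssl.cdx.json").write_text("{}", encoding="utf-8")
    (root / "bom.json").write_text("{}", encoding="utf-8")

    sbom_file_fields = expand_sbom_files(root, ["bom.json", "sboms/*"])
    no_sbom_files = expand_sbom_files(root, [])  # empty array: no files, no error

print(sbom_file_fields)
print(no_sbom_files)
```

An empty array never enters the loop, so it produces no Sbom-File fields and raises no error, while a pattern that matches nothing fails the build.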
Examples of valid SBOM files declarations:
[project]
sbom-files = ["bom.json"]
[project]
sbom-files = ["sboms/openssl.cdx.json", "licenses/openssl.spdx.json"]
[project]
sbom-files = ["sboms/*"]
[project]
sbom-files = []
Examples of invalid SBOM files declarations:
[project]
sbom-files = ["..\bom.json"]
Reason: .. must not be used. \\ is an invalid path delimiter, / must be used.
[project]
sbom-files = ["bom{.json*"]
Reason: bom{.json is not a valid glob.
SBOM files in project formats
A few additions will be made to the existing specifications.
- Project source trees: Per the Project source metadata section, the Declaring Project Metadata specification will be updated to reflect that SBOM file paths MUST be relative to the project root directory; i.e. the directory containing the pyproject.toml (or equivalently, other legacy project configuration, e.g. setup.py, setup.cfg, etc).
- Source distributions (sdists): The sdist specification will be updated to reflect that if the Metadata-Version is 2.5 or greater, the sdist MUST contain any SBOM files specified by the Sbom-File field in the PKG-INFO at their respective paths relative to the sdist (containing the pyproject.toml and the PKG-INFO Core Metadata).
- Built distributions (wheels): The wheel specification will be updated to reflect that if the Metadata-Version is 2.5 or greater and one or more Sbom-File fields are specified, the .dist-info directory MUST contain an sboms subdirectory, which MUST contain the files listed in the Sbom-File fields in the METADATA file at their respective paths relative to the sboms directory.
- Installed projects: The Recording Installed Projects specification will be updated to reflect that if the Metadata-Version is 2.5 or greater and one or more Sbom-File fields are specified, the .dist-info directory MUST contain an sboms subdirectory which MUST contain the files listed in the Sbom-File fields in the METADATA file at their respective paths relative to the sboms directory, and that any files in this directory MUST be copied from wheels by install tools.
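To make the wheel layout concrete, the following sketch checks that every Sbom-File entry in a wheel's METADATA file exists under the .dist-info/sboms directory. It is an illustration only: the function name `missing_sbom_files` and the in-memory example wheel are hypothetical, and the archive built here is not a complete, installable wheel.

```python
import io
import zipfile
from email.parser import HeaderParser


def missing_sbom_files(wheel: zipfile.ZipFile) -> list[str]:
    """Return Sbom-File entries that are absent from <name>.dist-info/sboms/."""
    names = set(wheel.namelist())
    metadata_paths = [
        name for name in names
        if name.endswith(".dist-info/METADATA") and name.count("/") == 1
    ]
    if len(metadata_paths) != 1:
        raise ValueError("expected exactly one top-level .dist-info/METADATA file")
    dist_info = metadata_paths[0].rsplit("/", 1)[0]
    metadata = HeaderParser().parsestr(wheel.read(metadata_paths[0]).decode("utf-8"))
    return [
        path for path in metadata.get_all("Sbom-File", [])
        if f"{dist_info}/sboms/{path}" not in names
    ]


# Build a tiny in-memory archive that declares two SBOM files but ships only one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example-1.0.dist-info/METADATA",
        "Metadata-Version: 2.5\nName: example\nVersion: 1.0\n"
        "Sbom-File: bom.cdx.json\nSbom-File: sboms/openssl.cdx.json\n",
    )
    archive.writestr("example-1.0.dist-info/sboms/bom.cdx.json", "{}")

with zipfile.ZipFile(buffer) as archive:
    missing = missing_sbom_files(archive)
print(missing)
```

Note that a source path such as sboms/openssl.cdx.json keeps its directory structure inside the root SBOM directory, so it would be expected at example-1.0.dist-info/sboms/sboms/openssl.cdx.json.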
Backwards Compatibility
There are no backwards compatibility concerns for this PEP.
The changes to Python package Core Metadata and pyproject.toml are only additive; this PEP doesn’t change the behavior of any existing fields.
Tools which are processing Python packages can use the Sbom-File Core Metadata field to clearly delineate between packages which include SBOM documents that implement this PEP (and thus have more requirements) and packages which included SBOM documents before this PEP was authored.
Security Implications
SBOM documents are only as useful as the information encoded in them. If an SBOM document contains incorrect information then this can result in incorrect downstream analysis by SCA tools. For this reason, it’s important for tools including SBOM data into Python packages to be confident in the information they are recording. SBOMs are capable of recording “known unknowns” in addition to known data. This practice is recommended when not certain about the data being recorded to allow for further analysis by users.
SBOM documents can encode information about the original system where a Python package is built (for example, the operating system name and version, or less commonly the names of paths). This information has the potential to “leak” through the Python package to installers via SBOMs. If this information is sensitive, then that could represent a security risk.
How to Teach This
Most typical users of Python and Python packages won’t need to know the details of this standard. The details are most important to maintainers of Python packages and to developers of SCA tools such as SBOM generation tools and vulnerability scanners.
Most Python packages don’t contain code from other software components and thus are already measurable by SCA tools without the need for this standard or additional SBOM documents. Pure-Python packages make up about 90% of popular packages on PyPI.
For projects that do contain other software components, documentation will be added to the Python Packaging User Guide for how to specify and maintain SBOM documents for Python packages in source code.
A follow-up informational PEP will be authored to describe how to transform Python packaging metadata, including the mechanism described in this PEP, into an SBOM document describing Python packages.
Reference Implementation
Auditwheel fork which generates CycloneDX SBOM documents to include in wheels describing bundled shared library files. These SBOM documents worked as expected for the Syft and Grype SBOM and vulnerability scanners.
Rejected Ideas
Why not require a single SBOM standard?
Most discussion and development around SBOMs today focuses on two SBOM standards: CycloneDX and SPDX. There is no clear “winner” between these two standards; both are frequently used by projects and software ecosystems.
Because both standards are frequently used, tools for consuming and processing SBOM documents commonly need to support both standards. This means that this PEP is not constrained to select a single SBOM standard by its consumers and thus can allow tools creating SBOM documents for inclusion in Python packages to choose which SBOM standard works best for their use-case.
Open Issues
Conditional project source SBOM files
How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional?
References
- Visualizing the Python package SBOM data flow. This is a graphic that shows how this PEP fits into the bigger picture of Python packaging’s SBOM data story.
- Adding SBOMs to Python wheels with auditwheel. These were some early results from a fork of auditwheel that adds SBOM data to a wheel and then uses the SBOM generation tool Syft to detect the SBOM in the installed package.
Acknowledgements
Thanks to Karolina Surma for authoring and leading PEP 639 to acceptance. The specification in this PEP for specifying files in project source metadata, Core Metadata, and project formats is based on PEP 639.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0770.rst
Last modified: 2025-01-06 22:43:56 GMT