
Python Enhancement Proposals

PEP 770 – Improving measurability of Python packages with Software Bill-of-Materials

Author:
Seth Larson <seth at python.org>
Sponsor:
Brett Cannon <brett at python.org>
PEP-Delegate:
Brett Cannon <brett at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Topic:
Packaging
Created:
02-Jan-2025
Post-History:
05-Nov-2024, 06-Jan-2025


Abstract

Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. SBOMs are used as inputs for software composition analysis (SCA) tools, such as scanners for vulnerabilities and licenses, and have been gaining traction in global software regulations and frameworks.

This PEP proposes using SBOM documents included in Python packages as a means to improve software measurability for Python packages.

The changes will update the Core Metadata specification to version 2.5.

Motivation

Measurability and Phantom Dependencies

Python packages are particularly affected by the “phantom dependency” problem, where software components that aren’t written in Python are included in Python packages for many reasons, such as ease of installation and compatibility with standards:

  • Python serves scientific, data, web, and machine-learning use-cases which use compiled or non-Python languages like Rust, C, C++, Fortran, JavaScript, and others.
  • The Python wheel format is preferred by users due to its ease of installation: no code is executed during installation, which only extracts the archive.
  • The Python wheel format requires bundling shared compiled libraries without a method to encode metadata about these libraries.
  • Packages related to Python packaging sometimes need to solve the “bootstrapping” problem, and so include pure Python projects inside their source code.

These software components can’t be described using Python package metadata and thus are likely to be missed by software composition analysis (SCA) software, which can mean that vulnerable software components aren’t reported accurately.

For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build. None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document annotating all the bundled shared libraries is included, then SCA tools can identify the included software reliably.

Regulations

SBOMs are required by recent software security regulations, like the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. One goal is to minimize the demands on open source project maintainers by enabling open source users that need SBOMs to self-serve using existing tooling.

Another goal is to enable contributions from users who need SBOMs to annotate projects they depend on with SBOM information. Today there is no mechanism to propagate the results of those contributions for a Python package, so there is no incentive for users to contribute this type of work.

Rationale

Using SBOM standards instead of Core Metadata fields

Attempting to add every field offered by SBOM standards into Python package Core Metadata would result in an explosion of new Core Metadata fields, which would then need to be kept up to date as SBOM standards continue to evolve to suit new needs in that space.

Instead, this proposal delegates SBOM-specific metadata to SBOM documents that are included in Python packages and adds a new Core Metadata field for discoverability of included SBOM documents.

This standard also doesn’t aim to replace Core Metadata with SBOMs; instead, the SBOM information is supplemental to Core Metadata. Included SBOMs only contain information about dependencies included in the package archive or information about the top-level software in the package that can’t be encoded into Core Metadata but is relevant for the SBOM use-case (“software identifiers”, “purpose”, “support level”, etc.).

Zero-or-more SBOM documents per package

Rather than requiring at most one included SBOM document per Python package, this PEP proposes that zero or more SBOM documents may be included in a Python package. This means that code attempting to annotate a Python package with SBOM data may do so without being concerned about corrupting data already contained within other SBOM documents.

This decision also means this PEP can support multiple SBOM standards without favoring one, deferring support to SBOM consuming tools. Likewise, Python tools never need to merge SBOM documents; merging is deferred to SBOM consuming tools, which are better placed to handle cross-standard conversions.

Specification

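The changes necessary to implement this PEP include:

  • additions to Core Metadata,
  • additions to the author-provided project source metadata in pyproject.toml, and
  • additions to the source distribution (sdist), built distribution (wheel), and installed project specifications.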

In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages.

Core Metadata

Add Sbom-File field

The Sbom-File is an optional Core Metadata field. Each instance contains a string representation of the path of an SBOM document. The path is located within the project source tree, relative to the project root directory. It is a multi-use field that MAY appear zero or more times and each instance lists the path to one such file. Files specified under this field are SBOM documents that are distributed with the package.

As specified by this PEP, its value is also that file’s path relative to the root SBOM directory in both installed projects and the standardized Distribution Package types.

If an Sbom-File is listed in a Source Distribution or Built Distribution’s Core Metadata:

  • That file MUST be included in the distribution archive at the specified path relative to the root SBOM directory.
  • That file MUST be installed with the project at that same relative path.
  • Inside the root SBOM directory, packaging tools MUST reproduce the directory structure under which the source files are located relative to the project root. The root SBOM directory is specified in a later section.
  • Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.
  • SBOM document contents MUST be UTF-8 encoded JSON according to RFC 8259.
  • SBOM document contents MUST use an SBOM standard, and for better interoperability SHOULD be a well-known SBOM standard such as CycloneDX or SPDX.
  • The “primary” component being described in included SBOM documents MUST be the Python package. This is achieved in CycloneDX using the metadata.component field and in SPDX using the DESCRIBES relationship.
  • SBOM documents MUST include metadata for the timestamp when the SBOM document was created. This information helps consuming tools understand the order in which multiple SBOM documents were created, to untangle conflicts between the various stages of building the Python package.
  • SBOM documents SHOULD include metadata describing the tool creating the SBOM document. This information helps users find which tool needs to be fixed in the case of defects.
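As an illustration of the requirements above, the following Python sketch builds a minimal CycloneDX document whose primary component (metadata.component) is the Python package, and which records a creation timestamp and the generating tool. The function name, the package, bundled library, and tool names used with it are hypothetical, and targeting CycloneDX 1.5 with exactly these fields is an assumption; real tools should follow the CycloneDX specification itself.

```python
import json
from datetime import datetime, timezone


def minimal_cyclonedx_sbom(
    package_name: str,
    package_version: str,
    bundled: list[tuple[str, str]],
    tool_name: str,
) -> str:
    """Return a minimal CycloneDX 1.5 JSON document as a str (hypothetical helper)."""
    # Package URLs of type "pypi" use the lowercased name with "_" replaced by "-".
    purl_name = package_name.lower().replace("_", "-")
    document = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {
            # Required by this PEP: when the SBOM document was created.
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            # Recommended by this PEP: which tool created the document.
            "tools": {"components": [{"type": "application", "name": tool_name}]},
            # Required by this PEP: the primary component is the Python package.
            "component": {
                "type": "library",
                "name": package_name,
                "version": package_version,
                "purl": f"pkg:pypi/{purl_name}@{package_version}",
            },
        },
        # Software bundled into the package archive, e.g. shared libraries.
        "components": [
            {"type": "library", "name": name, "version": version}
            for name, version in bundled
        ],
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

Writing the returned string to disk with an explicit encoding="utf-8" (for example via pathlib.Path.write_text) keeps the file within the UTF-8 JSON requirement.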

For all newly-uploaded distribution archives that include one or more Sbom-File fields in their Core Metadata and declare a Metadata-Version of 2.5 or higher, PyPI SHOULD validate that all specified files are present in the distribution archives, are valid UTF-8 encoded JSON, and, for well-known SBOM standards, provide the minimum fields required by those standards and by this PEP.
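A rough Python sketch of these checks, applied to a wheel, is shown below. It assumes the wheel layout described later in this PEP (SBOM files under the .dist-info/sboms/ directory), only checks the CycloneDX fields this PEP itself requires (metadata.timestamp and metadata.component), and is not PyPI's actual implementation; the name check_wheel_sboms is hypothetical.

```python
import json
import zipfile
from email.parser import HeaderParser


def check_wheel_sboms(wheel_path: str) -> list[str]:
    """Return problems with a wheel's included SBOMs; an empty list means none found."""
    problems: list[str] = []
    with zipfile.ZipFile(wheel_path) as wheel:
        names = set(wheel.namelist())
        # METADATA sits directly inside the top-level .dist-info directory.
        candidates = [
            n for n in names if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            return ["expected exactly one .dist-info/METADATA file"]
        dist_info = candidates[0].rsplit("/", 1)[0]
        metadata = HeaderParser().parsestr(wheel.read(candidates[0]).decode("utf-8"))
        version = tuple(int(p) for p in metadata.get("Metadata-Version", "0").split("."))
        if version < (2, 5):
            return problems  # Sbom-File validation only applies from 2.5 onwards
        for rel_path in metadata.get_all("Sbom-File") or []:
            member = f"{dist_info}/sboms/{rel_path}"
            if member not in names:
                problems.append(f"{rel_path}: missing from archive")
                continue
            try:
                document = json.loads(wheel.read(member).decode("utf-8"))
            except ValueError as exc:  # covers UnicodeDecodeError and JSONDecodeError
                problems.append(f"{rel_path}: not UTF-8 encoded JSON ({exc})")
                continue
            if isinstance(document, dict) and document.get("bomFormat") == "CycloneDX":
                bom_metadata = document.get("metadata")
                if not isinstance(bom_metadata, dict):
                    bom_metadata = {}
                for field in ("timestamp", "component"):
                    if field not in bom_metadata:
                        problems.append(f"{rel_path}: CycloneDX metadata.{field} is missing")
    return problems
```

A similar check for SPDX documents would look for the creation timestamp and the DESCRIBES relationship instead; that part is left out of this sketch.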

Project source metadata

This PEP specifies changes to the project’s source metadata under a [project] table in the pyproject.toml file.

Add sbom-files key

A new sbom-files key is added to the [project] table for specifying paths in the project source tree relative to pyproject.toml to file(s) containing SBOMs to be distributed with the package. This key corresponds to the Sbom-File fields in the Core Metadata.

Its value is an array of strings which MUST contain valid glob patterns, as specified below:

  • Alphanumeric characters, underscores (_), hyphens (-) and dots (.) MUST be matched verbatim.
  • Special glob characters: *, ?, ** and character ranges: [] containing only the verbatim matched characters MUST be supported. Within [...], the hyphen indicates a locale-agnostic range (e.g. a-z, order based on Unicode code points). Hyphens at the start or end are matched literally.
  • Path delimiters MUST be the forward slash character (/). Patterns are relative to the directory containing pyproject.toml, therefore the leading slash character MUST NOT be used.
  • Parent directory indicators (..) MUST NOT be used.

Any characters or character sequences not covered by this specification are invalid. Projects MUST NOT use such values. Tools consuming this field SHOULD reject invalid values with an error.

Tools MUST assume that SBOM file content is valid UTF-8 encoded JSON, and SHOULD validate this and raise an error for invalid formats and encodings.

Literal paths (e.g. bom.cdx.json) are valid globs, so they can also be used as values.
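The pattern rules above can be checked before any file matching happens. The following Python sketch is one possible reading of those rules, not a reference implementation: it treats “alphanumeric” as ASCII letters and digits (an assumption), and validate_sbom_glob is a hypothetical name rather than an API of any existing packaging library.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits.
VERBATIM = set(string.ascii_letters + string.digits + "_-.")
WILDCARDS = set("*?")  # "**" is simply two "*" characters


def validate_sbom_glob(pattern: str) -> None:
    """Raise ValueError if pattern breaks the sbom-files glob rules (a sketch)."""
    if not pattern:
        raise ValueError("empty pattern")
    if pattern.startswith("/"):
        raise ValueError(f"{pattern!r}: a leading '/' is not allowed")
    if ".." in pattern.split("/"):
        raise ValueError(f"{pattern!r}: parent directory indicators are not allowed")
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError(f"{pattern!r}: unclosed '['")
            body = pattern[i + 1:end]
            # Ranges may only contain verbatim-matched characters (including "-").
            if not body or any(c not in VERBATIM for c in body):
                raise ValueError(f"{pattern!r}: invalid character range [{body}]")
            i = end + 1
            continue
        if ch not in VERBATIM and ch not in WILDCARDS and ch != "/":
            raise ValueError(f"{pattern!r}: invalid character {ch!r}")
        i += 1
```

Rejecting anything not explicitly allowed, rather than listing forbidden characters, mirrors the rule that characters not covered by the specification are invalid.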

Build tools:

  • MUST treat each value as a glob pattern, and MUST raise an error if the pattern contains invalid glob syntax.
  • MUST include all files matched by a listed pattern in all distribution archives.
  • MUST list each matched file path under an Sbom-File field in the Core Metadata.
  • MUST raise an error if any individual user-specified pattern does not match at least one file.
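The build-tool requirements above can be sketched in Python with pathlib. This sketch assumes glob syntax has already been validated, and it relies on pathlib.Path.glob, whose matching semantics (for example, how a trailing ** is handled in older Python versions) may differ slightly from the rules in this PEP. The helper names resolve_sbom_files and sbom_file_headers are hypothetical.

```python
from pathlib import Path


def resolve_sbom_files(project_root: Path, patterns: list[str]) -> list[str]:
    """Expand sbom-files patterns into Sbom-File values (a sketch)."""
    matched: set[str] = set()
    for pattern in patterns:
        # Patterns are relative to the directory containing pyproject.toml.
        hits = [p for p in project_root.glob(pattern) if p.is_file()]
        if not hits:
            # Every user-specified pattern must match at least one file.
            raise ValueError(f"sbom-files pattern {pattern!r} matched no files")
        # Core Metadata paths always use "/" as the path delimiter.
        matched.update(p.relative_to(project_root).as_posix() for p in hits)
    # An empty patterns list yields no files and no error.
    return sorted(matched)


def sbom_file_headers(paths: list[str]) -> str:
    """Render one Sbom-File Core Metadata field per matched file."""
    return "".join(f"Sbom-File: {path}\n" for path in paths)
```

Collecting matches in a set avoids listing the same file twice when several patterns match it.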

If the sbom-files key is present and is set to a value of an empty array, then tools MUST NOT include any SBOM files and MUST NOT raise an error.

Examples of valid SBOM files declarations:

[project]
sbom-files = ["bom.json"]

[project]
sbom-files = ["sboms/openssl.cdx.json", "licenses/openssl.spdx.json"]

[project]
sbom-files = ["sboms/*"]

[project]
sbom-files = []

Examples of invalid SBOM files declarations:

[project]
sbom-files = ['..\bom.json']

Reason: .. must not be used, and \ is an invalid path delimiter; / must be used.

[project]
sbom-files = ["bom{.json*"]

Reason: bom{.json* is not a valid glob; { is not a supported character.

SBOM files in project formats

A few additions will be made to the existing specifications.

Project source trees
Per the Project source metadata section, the Declaring Project Metadata specification will be updated to reflect that SBOM file paths MUST be relative to the project root directory, i.e. the directory containing the pyproject.toml (or equivalently, other legacy project configuration, e.g. setup.py, setup.cfg, etc.).
Source distributions (sdists)
The sdist specification will be updated to reflect that if the Metadata-Version is 2.5 or greater, the sdist MUST contain any SBOM files specified by the Sbom-File field in the PKG-INFO at their respective paths relative to the root of the sdist (the directory containing the pyproject.toml and the PKG-INFO Core Metadata).
Built distributions (wheels)
The wheel specification will be updated to reflect that if the Metadata-Version is 2.5 or greater and one or more Sbom-File fields are specified, the .dist-info directory MUST contain an sboms subdirectory, which MUST contain the files listed in the Sbom-File fields in the METADATA file at their respective paths relative to the sboms directory.
Installed projects
The Recording Installed Projects specification will be updated to reflect that if the Metadata-Version is 2.5 or greater and one or more Sbom-File fields are specified, the .dist-info directory MUST contain an sboms subdirectory which MUST contain the files listed in the Sbom-File fields in the METADATA file at their respective paths relative to the sboms directory, and that any files in this directory MUST be copied from wheels by install tools.
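For installed projects laid out as described above, an SBOM consuming tool could locate the included documents through the standard library's importlib.metadata. Distribution.metadata, Distribution.read_text, and Distribution.at are existing stdlib APIs; the helper load_sbom_documents is hypothetical, and the sketch assumes every SBOM file is JSON, as this PEP requires.

```python
import json
from importlib.metadata import Distribution


def load_sbom_documents(dist: Distribution) -> dict[str, object]:
    """Return parsed SBOM documents keyed by their Sbom-File value (a sketch)."""
    documents: dict[str, object] = {}
    for rel_path in dist.metadata.get_all("Sbom-File") or []:
        # Files live at the same relative path under .dist-info/sboms/.
        text = dist.read_text(f"sboms/{rel_path}")
        if text is None:
            raise FileNotFoundError(
                f"{rel_path} is listed in METADATA but missing from .dist-info/sboms/"
            )
        documents[rel_path] = json.loads(text)
    return documents
```

For an installed package, importlib.metadata.distribution("name") returns the Distribution to pass in.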

Backwards Compatibility

There are no backwards compatibility concerns for this PEP.

The changes to Python package Core Metadata and pyproject.toml are only additive; this PEP doesn’t change the behavior of any existing fields.

Tools which are processing Python packages can use the Sbom-File Core Metadata field to clearly distinguish between packages which include SBOM documents that implement this PEP (and thus meet its additional requirements) and packages which included SBOM documents before this PEP was authored.

Security Implications

SBOM documents are only as useful as the information encoded in them. If an SBOM document contains incorrect information, then this can result in incorrect downstream analysis by SCA tools. For this reason, it’s important for tools that include SBOM data in Python packages to be confident in the information they are recording. SBOMs are capable of recording “known unknowns” in addition to known data. Recording known unknowns is recommended when a tool is not certain about the data being recorded, so that users can perform further analysis.

SBOM documents can encode information about the original system where a Python package is built (for example, the operating system name and version, and less commonly the names of paths). This information has the potential to “leak” through the Python package to installers via SBOMs. If this information is sensitive, then that could represent a security risk.

How to Teach This

Most typical users of Python and Python packages won’t need to know the details of this standard. The details of this standard are most important to maintainers of Python packages and to developers of SCA tools such as SBOM generation tools and vulnerability scanners.

Most Python packages don’t contain code from other software components and thus are already measurable by SCA tools without the need for this standard or additional SBOM documents. Pure-Python packages make up about 90% of popular packages on PyPI.

For projects that do contain other software components, documentation will be added to the Python Packaging User Guide describing how to specify and maintain SBOM documents for Python packages in source code.

A follow-up informational PEP will be authored to describe how to transform Python packaging metadata, including the mechanism described in this PEP, into an SBOM document describing Python packages.

Reference Implementation

A fork of auditwheel that generates CycloneDX SBOM documents describing bundled shared library files for inclusion in wheels. These SBOM documents worked as expected with the Syft and Grype SBOM and vulnerability scanners.

Rejected Ideas

Why not require a single SBOM standard?

Most discussion and development around SBOMs today focuses on two SBOM standards: CycloneDX and SPDX. There is no clear “winner” between these two standards; both are frequently used by projects and software ecosystems.

Because both standards are frequently used, tools for consuming and processing SBOM documents commonly need to support both. This means that consumers do not constrain this PEP to a single SBOM standard, so it can allow tools creating SBOM documents for inclusion in Python packages to choose the SBOM standard that works best for their use-case.

Open Issues

Conditional project source SBOM files

How can a project specify an SBOM file that is conditional? Under what circumstances would an SBOM document be conditional?


Acknowledgements

Thanks to Karolina Surma for authoring and leading PEP 639 to acceptance. This PEP’s specification for specifying files in project source metadata, Core Metadata, and project formats is based on the one in PEP 639.


Source: https://github.com/python/peps/blob/main/peps/pep-0770.rst

Last modified: 2025-01-06 22:43:56 GMT