Python Enhancement Proposals

PEP 751 – A file format to record Python dependencies for installation reproducibility

Author:
Brett Cannon <brett at python.org>
Status:
Draft
Type:
Standards Track
Topic:
Packaging
Created:
24-Jul-2024
Post-History:
25-Jul-2024 30-Oct-2024
Replaces:
665

Abstract

This PEP proposes a new file format for dependency specification to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.

Motivation

Currently, no standard exists to create an immutable record, such as a lock file, which specifies what direct and indirect dependencies should be installed into a virtual environment.

Considering there are at least five well-known solutions to this problem in the community (pip freeze, pip-tools, uv, Poetry, and PDM), there seems to be an appetite for lock files in general.

Those tools also vary in what locking scenarios they support. For instance, pip freeze and pip-tools only generate lock files for the current environment while PDM and Poetry try to lock for any environment to some degree. There are also concerns around the lack of secure defaults in the face of supply chain attacks (e.g., always including hashes for files).

The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g., Dependabot only supports select tools, as do cloud providers that can install dependencies on your behalf). It also impacts portability between tools, which causes vendor lock-in. The lack of compatibility and interoperability fractures tooling around lock files: both users and tools have to choose a lock file format upfront, which makes it costly to use or switch to other formats. Rallying around a single format removes that cost/barrier.

Note

Much of the motivation from PEP 665 also applies to this PEP.

Rationale

The format is designed so that a locker which produces the lock file and an installer which consumes the lock file can be separate tools. This allows, for example, cloud hosting providers to use their own installer that’s optimized for their system, independent of what locker the user used to create their lock file.

The file format is designed to be human-readable. This is so that the contents of the file can be audited by a human to make sure no undesired dependencies end up being included in the lock file.

The file format is also designed to not require a resolver at install time. This greatly simplifies installers and thus reasoning about what would be installed when consuming a lock file. It should also lead to faster installs which are much more frequent than creating a lock file.

Finally, the lock file is meant to be flexible enough to meet the various needs tools have for choosing what to install. That means the lock file records the dependency graph of what _may_ be installed. This allows tools to enter the graph at any point and still have reproducible results from that root of the graph. Flexibility also means supporting different installation scenarios within the same lock file (e.g., with or without test dependencies).

Specification

File Name

A lock file MUST be named pylock.toml. The use of the .toml file extension is to make syntax highlighting in editors easier and to reinforce the fact that the file format is meant to be human-readable.

The lock file SHOULD be located in the directory appropriate for the scope of the lock file. Locking against a single pyproject.toml, for instance, would place the pylock.toml in the same directory. If the lock file covered multiple projects in a monorepo, then the expectation is the pylock.toml file would be in the directory that held all the projects being locked.

File Format

The format of the file is TOML.

All keys listed below are required unless otherwise noted. If two keys are mutually exclusive to one another, then one of the keys is required while the other is disallowed.

To minimize noise in diffs, keys in tables – including the top-level table – SHOULD be emitted by lockers in the order they are listed in this PEP when applicable, unless another sort order is specified. If the keys are not explicitly specified in this PEP, then the keys SHOULD be sorted by lexicographic order.

For the same reason, lockers SHOULD sort arrays in lexicographic order unless otherwise specified.

version

  • String
  • The version of the lock file format.
  • This PEP specifies the initial version – and only valid value until future updates to the standard change it – as "1.0".
  • If an installer supports the major version but not the minor version, a tool SHOULD warn when an unknown key is seen.
  • If an installer doesn’t support a major version, it MUST raise an error.
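
The major/minor rules above could be sketched as follows. This is only an illustration, not part of the specification: `check_lock_version`, `UnsupportedLockVersion`, `SUPPORTED`, and `KNOWN_TOP_LEVEL_KEYS` are hypothetical names, and the sketch only inspects top-level keys of an already-parsed lock file.

```python
import warnings

# Assumption: this hypothetical installer implements format version 1.0.
SUPPORTED = (1, 0)
KNOWN_TOP_LEVEL_KEYS = {
    "version", "hash-algorithm", "locker", "groups", "packages", "tool",
}


class UnsupportedLockVersion(Exception):
    """Raised when the lock file's major version is not supported."""


def check_lock_version(lock: dict) -> list:
    """Apply the version rules and return any unknown top-level keys."""
    major_text, _, minor_text = lock["version"].partition(".")
    major, minor = int(major_text), int(minor_text or "0")
    if major != SUPPORTED[0]:
        # An unsupported major version MUST be an error.
        raise UnsupportedLockVersion(
            f"unsupported lock file version {lock['version']!r}"
        )
    unknown = sorted(set(lock) - KNOWN_TOP_LEVEL_KEYS)
    if minor > SUPPORTED[1]:
        # Same major version but a newer minor version: warn on unknown keys.
        for key in unknown:
            warnings.warn(
                f"unknown key {key!r} in lock file version {lock['version']!r}",
                stacklevel=2,
            )
    return unknown
```

A real installer would presumably apply the same check to nested tables as well.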

hash-algorithm

  • String
  • The name of the hash algorithm used for calculating all hash values.
  • Only a single hash algorithm is used for the entire file to allow hash values to be written in inline tables for readability and compactness purposes by only listing a single hash value instead of multiple values based on multiple hash algorithms.
  • Specifying a single hash algorithm guarantees that an algorithm that the user prefers is used consistently throughout the file without having to audit each file hash value separately.
  • Allows for updating the entire file to a new hash algorithm without running the risk of accidentally leaving an old hash value in the file.
  • The hashes dictionary of the files dictionary of the Project Details dictionary in JSON-based Simple API for Python Package Indexes specifies which values are valid and provides guidelines on which hash algorithms to use.
  • Failure to validate any hash values for any file that is to be installed MUST raise an error.
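
As a rough sketch of the last rule, an installer might validate a downloaded file against the file-wide algorithm like this. `verify_file_hash` and `HashMismatchError` are hypothetical names, and the sketch assumes the `hash-algorithm` value is a name that Python's `hashlib.new()` accepts and that hash values are hex-encoded, as in the example lock file.

```python
import hashlib
import hmac


class HashMismatchError(Exception):
    """Raised when a file does not match its recorded hash."""


def verify_file_hash(data: bytes, algorithm: str, expected_hex: str) -> None:
    """Raise HashMismatchError unless ``data`` hashes to ``expected_hex``."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise HashMismatchError(
            f"expected {algorithm} hash {expected_hex}, got {actual}"
        )
```

Because the algorithm is recorded once for the whole file, the same `algorithm` argument would be passed for every sdist, wheel, and source tree file.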

[locker]

  • Table
  • Record of the tool that generated the lock file.
  • Enough details SHOULD be provided that the lock file can be reproduced from the details in this table (provided the same I/O data is available; e.g., Dependabot could do so if only files from a repository are necessary to run the command).
locker.name
  • String
  • The name of the tool used to create the lock file.
  • If the locker is a Python project, its normalized name SHOULD be used.
locker.version
  • String
  • The version of the tool used.
locker.run
  • Optional
  • Inline table
  • Records the command used to create the lock file.
locker.run.module
  • Optional
  • String
  • The module name used for running the locker (i.e. what would be passed to python -m).
  • Lockers MUST specify this key if the locker can be executed via python -m.
locker.run.args
  • Optional
  • Array of strings
  • If the locker has a CLI, the arguments to pass to the locker.
  • All paths MUST be relative to the lock file so that another tool could use the lock file’s location as the current working directory.
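
To illustrate how another tool might use this table, the following sketch rebuilds the command line for a locker that records `locker.run.module`. `locker_command` is a hypothetical helper. The PEP does not say how to invoke a locker that lacks a module entry, so the sketch returns `None` in that case rather than guessing.

```python
import sys
from typing import Optional


def locker_command(locker: dict) -> Optional[list]:
    """Return an argv list that re-runs the locker, or None if unknown."""
    run = locker.get("run")
    if run is None or "module" not in run:
        return None
    # Equivalent to: python -m <module> <args...>
    return [sys.executable, "-m", run["module"], *run.get("args", [])]
```

Since all paths in `args` are relative to the lock file, a caller would run the returned command with the lock file's directory as the current working directory, for example via `subprocess.run(cmd, cwd=lock_path.parent)`.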

[[groups]]

  • Array of tables
  • A named subset of packages as found in [[packages]].
  • Act as roots into the dependency graph.
  • Installers MUST allow the user to select one or more groups by name to install all relevant packages together.
  • Installers SHOULD let the user skip specifying a name if there is only one entry in the array.
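
The two installer rules above might be implemented along these lines. `select_groups` is a hypothetical helper operating on an already-parsed lock file; the error messages are illustrative only.

```python
from typing import Optional


def select_groups(lock: dict, names: Optional[list] = None) -> list:
    """Return the [[groups]] tables the user asked for."""
    groups = lock["groups"]
    if not names:
        # With a single group, the user does not need to name it.
        if len(groups) == 1:
            return list(groups)
        raise ValueError("a group name is required: the lock file has several groups")
    by_name = {group["name"]: group for group in groups}
    missing = [name for name in names if name not in by_name]
    if missing:
        raise ValueError(f"groups not found in lock file: {missing!r}")
    return [by_name[name] for name in names]
```
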
groups.name
  • String
  • The name of the group.
groups.project
  • Mutually-exclusive with requirements
  • String
  • The normalized name of a package to act as the starting point into the dependency graph.
  • Analogous to locking to the [project] table in pyproject.toml.
  • Installers MUST let a user specify any optional features/extras that the package provides.
  • Lockers MUST NOT allow for ambiguity by specifying multiple package versions of the same package under the same group name when a package is listed in any project key.
groups.requirements
  • Mutually-exclusive with project
  • Array of tables
  • Represents the installation requirements for this group.
  • Analogous to a key in [dependency-groups] in pyproject.toml.
  • Lockers MUST make sure that resolving any requirement for any environment does not lead to ambiguity by having multiple values in [[packages]] match the same requirement.
  • Values in the array SHOULD be written as inline tables, sorted lexicographically by name, then by feature with the lack of that key sorting first.
groups.requirements.name
  • String
  • Normalized name of the package.
groups.requirements.extras
  • Optional
  • Array of strings
  • The names of the extras specified for the requirement (i.e. what comes between [...]).
groups.requirements.version
groups.requirements.marker

[[packages]]

  • Array of tables
  • The array contains all data on the nodes of the dependency graph.
  • Lockers SHOULD record packages in order by name lexicographically, version by its Python version specifiers ordering, and then by groups following Python’s sort order for lists of strings (i.e. item by item, then by length as a tiebreaker).
packages.name
packages.version
  • String
  • The version of the package.
packages.groups
  • Array of strings
  • Associates this table with the groups.name entries of the same names.
packages.index-url
packages.direct
  • Optional (defaults to false)
  • Boolean
  • Represents whether the installation is via a direct URL reference.
packages.requires-python
  • String
  • Holds the version specifiers for Python version compatibility for the package and version.
  • The value MUST match what’s provided by the package version, if available, via Requires-Python.
[[packages.dependencies]]
  • Array of tables
  • A record of the dependency requirements of the package and version.
  • The values MUST semantically match what’s provided by the package version via Requires-Dist (multiple use) for all dependencies referenced in the lock file (i.e. all base dependencies plus all dependencies for extras referenced in the lock file); lock files MAY list all dependencies for unused extras if desired.
  • Values in the array SHOULD be written as inline tables, sorted lexicographically by name, then by feature with the lack of that key sorting first.
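
Read literally, the suggested ordering (by name, then by feature, with entries lacking a feature first) corresponds to a sort key like the one below. `dependency_sort_key` is a hypothetical name, and this follows the wording of the bullet above rather than any particular tool's output.

```python
def dependency_sort_key(dep: dict) -> tuple:
    """Sort by name, then by feature, with a missing feature sorting first."""
    feature = dep.get("feature")
    # False sorts before True, so entries without a feature come first.
    return (dep["name"], feature is not None, feature or "")


deps = [
    {"name": "pytest", "feature": "tests"},
    {"name": "attrs"},
    {"name": "pytest"},
    {"name": "pytest", "feature": "cov"},
]
ordered = sorted(deps, key=dependency_sort_key)
```
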
packages.dependencies.name

See groups.requirements.name.

packages.dependencies.extras

See groups.requirements.extras.

packages.dependencies.version

See groups.requirements.version.

packages.dependencies.marker

See groups.requirements.marker.

packages.dependencies.feature
packages.editable
  • Optional (defaults to false)
  • Boolean
  • Specifies whether the package should be installed in editable mode.
[packages.source-tree]
  • Optional
  • Table
  • For recording where to find the source tree for the package version.
  • Lockers SHOULD write this table inline.
  • Support for source trees by installers is optional.
  • If support is provided by an installer it SHOULD be opt-in.
  • If multiple source trees are provided, installers MUST prefer either the vcs option or a file for security/reproducibility due to their commit or hash, respectively.
packages.source-tree.vcs
  • Optional
  • String
  • If specifying a VCS, the type of version control system used.
  • The valid values are specified by the registered VCSs of the direct URL data structure.
packages.source-tree.path
  • Required if url is not set
  • String
  • A path to the source tree, which may be absolute or relative.
  • If the path is relative it MUST be relative to the lock file.
  • The path may be to a directory, a file archive, or a VCS checkout if vcs is specified.
packages.source-tree.url
  • Required if path is not set
  • String
  • A URL to a file archive containing the source tree, or a VCS checkout if vcs is specified.
packages.source-tree.commit
  • Required if vcs is set
  • String
  • The commit ID for the repository which represents the package and version.
  • The value MUST be immutable for the VCS for security purposes (e.g. no Git tags).
packages.source-tree.size
  • Optional
  • Integer
  • The size in bytes for the source tree if it is a file.
  • Installers MUST verify the file size matches this value.
packages.source-tree.hash
  • Required if url or path points to a file
  • String
  • The hash value of the file contents using the hash algorithm specified by hash-algorithm.
  • Installers MUST verify the hash matches the file.
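
The conditional requirements among the source-tree keys can be summarized in a small checker. This is a sketch under stated assumptions: `source_tree_problems` is a hypothetical helper, it only checks which keys are present, and it infers that an entry with `size` refers to a file (and so needs `hash`). It cannot tell on its own whether a bare `path` or `url` points to a file, nor whether a commit ID is immutable.

```python
def source_tree_problems(tree: dict) -> list:
    """Return human-readable problems found in a [packages.source-tree] table."""
    problems = []
    if "path" not in tree and "url" not in tree:
        problems.append("one of 'path' or 'url' is required")
    if "vcs" in tree and "commit" not in tree:
        problems.append("'commit' is required when 'vcs' is set")
    # Assumption: 'size' is only recorded for files, and files require a hash.
    if "size" in tree and "hash" not in tree:
        problems.append("'hash' is required when the source tree is a file")
    return problems
```
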
[packages.sdist]
  • Optional
  • Table
  • The location of a source distribution as specified by Source distribution format.
  • Lockers SHOULD write the table inline.
  • Support for source distributions by installers is optional.
  • If support is provided by an installer it SHOULD be opt-in.
packages.sdist.url
  • Optional; mutually-exclusive with path
  • String
  • The URL to the file.
packages.sdist.path
  • Optional; mutually-exclusive with url
  • String
  • A path to the file, which may be absolute or relative.
  • If the path is relative it MUST be relative to the lock file.
packages.sdist.upload-time
  • Optional and only applicable when url is specified
  • Offset date time
  • The upload date and time of the file as specified by a valid ISO 8601 date/time string for the .files[]."upload-time" field in the JSON version of Simple repository API.
packages.sdist.size
  • Optional
  • Integer
  • The size of the file in bytes.
  • Installers MUST verify the file size matches this value.
packages.sdist.hash
  • String
  • The hash value of the file contents using the hash algorithm specified by hash-algorithm.
  • Installers MUST verify the hash matches the file.
[[packages.wheels]]
  • Optional
  • Array of tables
  • For recording the wheel files as specified by Binary distribution format for the package version.
  • Lockers SHOULD write the table inline.
  • Lockers SHOULD sort the array values lexicographically by tag.
packages.wheels.tags
  • Array of strings
  • The uncompressed tag portion of the wheel file: Python, ABI, and platform.
  • Lockers MUST make sure the tag values are unique within the packages.wheels array.
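
For illustration, the uncompressed tags can be derived from a wheel file name by expanding the dot-separated parts of its Python, ABI, and platform tags. `uncompressed_tags` is a hypothetical helper that assumes a well-formed wheel file name; a locker could equally obtain the tags from an existing wheel-parsing library.

```python
from itertools import product


def uncompressed_tags(wheel_filename: str) -> list:
    """Expand a wheel file name's compressed tag set into sorted tag strings."""
    stem = wheel_filename.removesuffix(".whl")
    # The last three dash-separated fields are the Python, ABI, and platform tags.
    python, abi, platform = stem.split("-")[-3:]
    return sorted(
        "-".join(parts)
        for parts in product(python.split("."), abi.split("."), platform.split("."))
    )
```

Applied to the numpy manylinux wheel in the example lock file, this yields the two tags recorded in its `tags` array.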
packages.wheels.build
  • Optional
  • String
  • The build tag for the wheel file (if appropriate).
packages.wheels.url

See packages.sdist.url.

packages.wheels.path

See packages.sdist.path.

packages.wheels.upload-time

See packages.sdist.upload-time.

packages.wheels.size

See packages.sdist.size.

packages.wheels.hash

See packages.sdist.hash.

[packages.tool]
  • Optional
  • Table
  • Similar usage as that of the [tool] table from the pyproject.toml specification, but at the package version level instead of at the lock file level (which is also available via [tool]).
  • Useful for scoping package version/release details (e.g., recording signing identities to then use to verify package integrity separately from where the package is hosted, prototyping future extensions to this file format, etc.).

[tool]

Examples

version = '1.0'
hash-algorithm = 'sha256'

[locker]
name = 'mousebender'
version = 'pep'
run = { module = 'mousebender', args = ['lock', '--platform', 'cpython3.12-manylinux2014-x64', '--platform', 'cpython3.12-windows-x64', 'cattrs', 'numpy'] }

[[groups]]
name = 'Default'
requirements = [
  { name = 'cattrs' },
  { name = 'numpy' },
]

[[packages]]
name = 'attrs'
version = '24.2.0'
groups = ['Default']
index-url = 'https://pypi.org/simple/attrs'
direct = false
requires-python = '>=3.7'
dependencies = [
  { name = 'importlib-metadata', marker = 'python_version < "3.8"' },
  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'benchmark' },
  { name = 'hypothesis', feature = 'benchmark' },
  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'benchmark' },
  { name = 'pympler', feature = 'benchmark' },
  { name = 'pytest-codspeed', feature = 'benchmark' },
  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'benchmark' },
  { name = 'pytest-xdist', extras = ['psutil'], feature = 'benchmark' },
  { name = 'pytest', version = '>=4.3.0', feature = 'benchmark' },
  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'cov' },
  { name = 'coverage', extras = ['toml'], version = '>=5.3', feature = 'cov' },
  { name = 'hypothesis', feature = 'cov' },
  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'cov' },
  { name = 'pympler', feature = 'cov' },
  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'cov' },
  { name = 'pytest-xdist', extras = ['psutil'], feature = 'cov' },
  { name = 'pytest', version = '>=4.3.0', feature = 'cov' },
  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'dev' },
  { name = 'hypothesis', feature = 'dev' },
  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'dev' },
  { name = 'pre-commit', feature = 'dev' },
  { name = 'pympler', feature = 'dev' },
  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'dev' },
  { name = 'pytest-xdist', extras = ['psutil'], feature = 'dev' },
  { name = 'pytest', version = '>=4.3.0', feature = 'dev' },
  { name = 'cogapp', feature = 'docs' },
  { name = 'furo', feature = 'docs' },
  { name = 'myst-parser', feature = 'docs' },
  { name = 'sphinx', feature = 'docs' },
  { name = 'sphinx-notfound-page', feature = 'docs' },
  { name = 'sphinxcontrib-towncrier', feature = 'docs' },
  { name = 'towncrier', version = '<24.7', feature = 'docs' },
  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'tests' },
  { name = 'hypothesis', feature = 'tests' },
  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests' },
  { name = 'pympler', feature = 'tests' },
  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests' },
  { name = 'pytest-xdist', extras = ['psutil'], feature = 'tests' },
  { name = 'pytest', version = '>=4.3.0', feature = 'tests' },
  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests-mypy' },
  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests-mypy' }
]
editable = false
wheels = [
  { tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/6a/21/5b6702a7f963e95456c0de2d495f67bf5fd62840ac655dc451586d23d39a/attrs-24.2.0-py3-none-any.whl', hash = '81921eb96de3191c8258c199618104dd27ac608d9366f5e35d011eae1867ede2', upload-time = 2024-08-06T14:37:36.958006+00:00, size = 63001 }
]

[[packages]]
name = 'cattrs'
version = '24.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/cattrs'
direct = false
requires-python = '>=3.8'
dependencies = [
  { name = 'attrs', version = '>=23.1.0' },
  { name = 'exceptiongroup', version = '>=1.1.1', marker = 'python_version < "3.11"' },
  { name = 'typing-extensions', version = '!=4.6.3,>=4.1.0', marker = 'python_version < "3.11"' },
  { name = 'pymongo', version = '>=4.4.0', feature = 'bson' },
  { name = 'cbor2', version = '>=5.4.6', feature = 'cbor2' },
  { name = 'msgpack', version = '>=1.0.5', feature = 'msgpack' },
  { name = 'msgspec', version = '>=0.18.5', marker = 'implementation_name == "cpython"', feature = 'msgspec' },
  { name = 'orjson', version = '>=3.9.2', marker = 'implementation_name == "cpython"', feature = 'orjson' },
  { name = 'pyyaml', version = '>=6.0', feature = 'pyyaml' },
  { name = 'tomlkit', version = '>=0.11.8', feature = 'tomlkit' },
  { name = 'ujson', version = '>=5.7.0', feature = 'ujson' }
]
editable = false
wheels = [
  { tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/c8/d5/867e75361fc45f6de75fe277dd085627a9db5ebb511a87f27dc1396b5351/cattrs-24.1.2-py3-none-any.whl', hash = '67c7495b760168d931a10233f979b28dc04daf853b30752246f4f8471c6d68d0', upload-time = 2024-09-22T14:58:34.812643+00:00, size = 66446 }
]

[[packages]]
name = 'numpy'
version = '2.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/numpy'
direct = false
requires-python = '>=3.10'
dependencies = []
editable = false
wheels = [
  { tags = ['cp312-cp312-manylinux2014_x86_64', 'cp312-cp312-manylinux_2_17_x86_64'], url = 'https://files.pythonhosted.org/packages/9b/b4/e3c7e6fab0f77fff6194afa173d1f2342073d91b1d3b4b30b17c3fb4407a/numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', hash = '6d95f286b8244b3649b477ac066c6906fbb2905f8ac19b170e2175d3d799f4df', upload-time = 2024-10-05T18:36:20.729642+00:00, size = 16041825 },
  { tags = ['cp312-cp312-win_amd64'], url = 'https://files.pythonhosted.org/packages/4c/79/73735a6a5dad6059c085f240a4e74c9270feccd2bc66e4d31b5ca01d329c/numpy-2.1.2-cp312-cp312-win_amd64.whl', hash = '456e3b11cb79ac9946c822a56346ec80275eaf2950314b249b512896c0d2505e', upload-time = 2024-10-05T18:37:38.159022+00:00, size = 12568254 }
]

Expectations for Lockers

  • Lockers MUST make sure that entering the dependency graph via a specific group will not lead to ambiguity for installers as to which value in [[packages]] to install for any environment (this can be controlled for via packages.version and packages.groups).
  • Lockers SHOULD try to make all logically related groups resolve together (i.e. no ambiguity if grouped together).
  • If a groups.project would have extras that cause ambiguity or installation failure due to conflicts between the extras, the locker MAY create separate groups.requirements entries instead, otherwise the locker MUST raise an error.
  • Lockers MAY try to lock for multiple environments in a single lock file.
  • Lockers MAY try to update a lock file containing [tool] and [packages.tool] for other tools than themselves.
  • Lockers MAY want to provide a way to let users provide the information necessary to lock for other environments, e.g., supporting a JSON file format which specifies wheel tags and marker values.
{
    "marker-values": {"<marker>": "<value>"},
    "wheel-tags": ["<tag>"]
}
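
As a sketch of how a tool might consume such a file, the following reads the two keys and then picks a wheel for the described environment. `load_environment` and `pick_wheel` are hypothetical helpers, and treating the order of "wheel-tags" as a priority order (most preferred first) is an assumption of this sketch, not something the PEP specifies.

```python
import json
from typing import Optional


def load_environment(text: str) -> tuple:
    """Parse an environment description into (marker values, wheel tags)."""
    data = json.loads(text)
    markers = data.get("marker-values", {})
    tags = data.get("wheel-tags", [])
    if not all(isinstance(value, str) for value in markers.values()):
        raise ValueError("marker values must be strings")
    if not all(isinstance(tag, str) for tag in tags):
        raise ValueError("wheel tags must be strings")
    return markers, tags


def pick_wheel(wheels: list, wheel_tags: list) -> Optional[dict]:
    """Return the first wheel matching the most preferred tag, if any."""
    for tag in wheel_tags:  # Assumption: tags are listed in priority order.
        for wheel in wheels:
            if tag in wheel["tags"]:
                return wheel
    return None
```
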

Expectations for Installers

  • Installers MAY support installation of non-binary files (i.e. source trees and source distributions), but are not required to.
  • Installers MUST provide a way to avoid non-binary file installation for reproducibility and security purposes.
  • Installers SHOULD make it opt-in to use non-binary file installation to facilitate a secure-by-default approach.
  • If a traversal of the graph leads to any ambiguity as to what package version to install (i.e. more than one package version qualifies), an error MUST be raised.
  • Installers MUST only consider package versions included in any selected groups (i.e. installers cannot consider packages outside of the groups selected to install from).
  • Installers MUST error out if a package version lacks a way to install into the chosen environment.
  • Installers MUST support installing into an empty environment.

Pseudo-Code

import operator

import packaging.markers
import packaging.specifiers
import packaging.tags


class UnsatisfiableError(Exception):
    """Raised when a requirement cannot be satisfied."""


class AmbiguityError(Exception):
    """Raised when a requirement has multiple solutions."""


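def install_packages(lock_file_contents):
    # Hard-coded out of laziness: GROUP_NAME is assumed to be a module-level
    # constant naming the group to install, with no extras requested.
    packages = choose_packages(lock_file_contents, (GROUP_NAME, frozenset()))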

    for package in packages:
        tags = list(packaging.tags.sys_tags())
        for tag in tags:  # Prioritize by tag order.
            tag_str = str(tag)
            for wheel in package["wheels"]:
                if tag_str in wheel["tags"]:
                    break
            else:
                continue
            break
        else:
            raise UnsatisfiableError(
                f"No wheel for {package['name']} {package['version']}"
            )
        print(f"Installing {package['name']} {package['version']} ({tag_str})")


def choose_packages(lock_file_data, *selected_groups):
    """Select the package versions that should be installed based on the requested groups.

    'selected_groups' is a sequence of two-item tuples, representing a group name and
    optionally any requested extras if the group is a project.
    """
    group_names = frozenset(operator.itemgetter(0)(group) for group in selected_groups)
    available_packages = {}  # The packages in the selected groups.
    for pkg in lock_file_data["packages"]:
        if frozenset(pkg["groups"]) & group_names:
            available_packages.setdefault(pkg["name"], []).append(pkg)
    selected_packages = {}  # The package versions that have been selected.
    handled_extras = {}  # The extras that have been handled.
    requirements = []  # A stack of requirements to satisfy.

    # First, get our starting list of requirements.
    for group in selected_groups:
        requirements.extend(gather_requirements(lock_file_data, group))

    # Next, go through the requirements and try to find a **single** package version
    # that satisfies each requirement.
    while requirements:
        req = requirements.pop()
        # Ignore requirements whose markers disqualify it.
        if not applies_to_env(req):
            continue
        name = req["name"]
        if pkg := selected_packages.get(name):
            # Safety check that the cross-section of groups doesn't cause issues.
            # It somewhat assumes the locker didn't mess up such that there would be
            # ambiguity by what package version was initially selected.
            if not version_satisfies(req, pkg):
                raise UnsatisfiableError(
                    f"requirement {req!r} not satisfied by "
                    f"{selected_packages[req['name']]!r}"
                )
            if "extras" not in req:
                continue
            needed_extras = req["extras"]
            # Skip the requirement once every extra it asks for has been handled.
            extras = handled_extras.setdefault(name, set())
            if not set(needed_extras).difference(extras):
                continue
            # This isn't optimal as we may tread over the same extras multiple times,
            # but eventually the maximum set of extras for the package will be handled
            # and thus the above guard will short-circuit adding any more requirements.
            extras.update(needed_extras)
        else:
            # Raises UnsatisfiableError or AmbiguityError if no suitable, single package
            # version is found (including when no package of that name is available).
            pkg = compatible_package_version(req, available_packages.get(name, []))
            selected_packages[name] = pkg
        requirements.extend(dependencies(pkg, req))

    return selected_packages.values()


def gather_requirements(locked_file_data, group):
    """Return a collection of all requirements for a group."""
    # Hard-coded to support `groups.requirements` out of laziness.
    group_name, _extras = group
    for group in locked_file_data["groups"]:
        if group["name"] == group_name:
            return group["requirements"]
    else:
        raise ValueError(f"Group {group_name!r} not found in lock file")


def applies_to_env(requirement):
    """Check if the requirement applies to the current environment."""
    try:
        markers = requirement["marker"]
    except KeyError:
        return True
    else:
        return packaging.markers.Marker(markers).evaluate()


def version_satisfies(requirement, package):
    """Check if the package version satisfies the requirement."""
    try:
        raw_specifier = requirement["version"]
    except KeyError:
        return True
    else:
        specifier = packaging.specifiers.SpecifierSet(raw_specifier)
        return specifier.contains(package["version"], prereleases=True)


def compatible_package_version(requirement, available_packages):
    """Return the package version that satisfies the requirement.

    If no package version can satisfy the requirement, raise UnsatisfiableError. If
    multiple package versions can satisfy the requirement, raise AmbiguityError.
    """
    possible_packages = [
        pkg for pkg in available_packages if version_satisfies(requirement, pkg)
    ]
    if not possible_packages:
        raise UnsatisfiableError(f"No package version satisfies {requirement!r}")
    elif len(possible_packages) > 1:
        raise AmbiguityError(f"Multiple package versions satisfy {requirement!r}")
    return possible_packages[0]


def dependencies(package, requirement):
    """Return the dependencies of the package.

    The extras from the requirement will extend the base requirements as needed.
    """
    applicable_deps = []
    extras = frozenset(requirement.get("extras", []))
    for dep in package["dependencies"]:
        if "feature" not in dep or dep["feature"] in extras:
            applicable_deps.append(dep)
    return applicable_deps

Backwards Compatibility

Because there is no preexisting lock file format, there are no explicit backwards-compatibility concerns in terms of Python packaging standards.

As for packaging tools themselves, that will be a per-tool decision. For tools that don’t document their lock file format, they could choose to simply start using the format internally and then transition to saving their lock files with a name supported by this PEP. For tools with a preexisting, documented format, they could provide an option to choose which format to emit.

Security Implications

The hope is that by standardizing on a lock file format that starts from a security-first posture it will help make overall packaging installation safer. However, this PEP does not solve all potential security concerns.

One potential concern is tampering with a lock file. If a lock file is not kept in source control and properly audited, a bad actor could change the file in nefarious ways (e.g. point to a malware version of a package). Tampering could also occur in transit to e.g. a cloud provider who will perform an installation on the user’s behalf. Both could be mitigated by signing the lock file either within the file in a [tool] entry or via a side channel external to the lock file itself.

This PEP does not do anything to prevent a user from installing an incorrect package. While including many details to help in auditing a package’s inclusion, there isn’t any mechanism to stop e.g. name confusion attacks via typosquatting. Lockers may be able to provide some UX to help with this (e.g. by providing download counts for a package).

How to Teach This

Users should be informed that when they ask to install some package, that package may have its own dependencies, those dependencies may have dependencies, and so on. Without writing down what gets installed as part of installing the package they requested, things could change from underneath them (e.g., package versions). Changes to the underlying dependencies can lead to accidental breakage of their code. Lock files help deal with that by providing a way to write down what was (and should be) installed.

Having what to install written down also helps in collaborating with others. By agreeing to a lock file’s contents, everyone ends up with the same packages installed. This helps make sure no one relies on e.g. an API that’s only available in a certain version that not everyone working on the project has installed.

Lock files also help with security by making sure you always get the same files installed and not a malicious one that someone may have slipped in. It also lets one be more deliberate in upgrading their dependencies and thus making sure the change is on purpose and not one slipped in by a bad actor.

Reference Implementation

A proof-of-concept implementing most of this PEP for wheels can be found at https://github.com/brettcannon/mousebender/tree/pep .

Rejected Ideas

A flat set of packages to install

An earlier version of this PEP proposed to use a flat set of package versions instead of a graph. The idea was that each package version could be evaluated in isolation as to whether it applied to an environment for installation. The hope was that this would lend itself to easier auditing, as one wouldn’t have to worry about how a package version fit into the graph when looking at e.g. a diff for a lock file.

Unfortunately this was deemed not as flexible as using a graph. For instance, recording the graph assists in dependency analysis for tools like GitHub. A graph also makes it easier to follow how you ended up with dependencies within your lock file from any point in the graph. It also balances the implementation costs between lockers and installers a bit more: some complexity is lifted off of lockers in exchange for only a minor increase in complexity for installers, which use standard graph-traversal algorithms instead of a linear walk.

And if the dependency graph is already being recorded for the above benefits, then recording that same data in a flattened manner is redundant and makes lock files larger and potentially more unruly.
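As a rough illustration of the "following how you ended up with dependencies" point, the sketch below walks a small graph breadth-first to explain why a package is present. The mapping of package names to dependency names and the `explain` helper are assumptions made for this example; they are not the schema defined by this PEP.

```python
from collections import deque

# Hypothetical dependency graph: package name -> names it depends on.
# This shape is an assumption for illustration, not the PEP's schema.
GRAPH = {
    "app": ["requests"],
    "requests": ["urllib3", "idna"],
    "urllib3": [],
    "idna": [],
}


def explain(graph, root, target):
    """Return one shortest dependency path from root to target, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in graph.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


print(" -> ".join(explain(GRAPH, "app", "idna")))  # app -> requests -> idna
```

With a flat set of packages, the same question could not be answered from the lock file alone, since the edges between packages would not be recorded.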

Specifying a new core metadata version that requires consistent metadata across files

At one point, to handle the issue of metadata varying between files, which requires examining every released file for a package and version to get accurate locking results, the idea was floated to introduce a new core metadata version which would require the metadata of all wheel files to be the same for a single version of a package. Ultimately, though, it was deemed unnecessary as this PEP will put pressure on people to make files consistent for performance reasons or to make indexes provide all the metadata separate from the wheel files themselves. As well, there’s no easy enforcement mechanism, and so community expectation would work as well as a new metadata version.

Have the installer do dependency resolution

In order to support a format more akin to how Poetry worked when this PEP was drafted, it was suggested that lockers effectively record the packages and their versions which may be necessary to make an install work in any possible scenario, and then the installer resolves what to install. But that complicates auditing a lock file by requiring much more mental effort to know what packages may be installed in any given scenario. Also, one of the Poetry developers suggested that markers as represented in the package locking approach of this PEP may be sufficient to cover the needs of Poetry. Not having installers do a resolution also simplifies their implementation, centralizing complexity in lockers.

Requiring specific hash algorithm support

It was proposed to require a baseline hash algorithm for the files. This was rejected as no other Python packaging specification requires specific hash algorithm support. As well, the minimum hash algorithm suggested may eventually become an outdated/unsafe suggestion, requiring further updates. In order to promote using the best algorithm at all times, no baseline is provided to avoid simply defaulting to the baseline in tools without considering the security ramifications of that hash algorithm.
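Since no baseline algorithm is mandated, an installer has to cope with whichever algorithm a lock file names. A minimal sketch of that check, assuming the algorithm name and expected digest have already been read from the lock file (the `verify` helper and its parameters are hypothetical), could look like this:

```python
import hashlib


def verify(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Check data against a digest using whichever algorithm is named."""
    # Assumption: an installer refuses algorithms it cannot compute
    # rather than silently skipping verification.
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return hashlib.new(algorithm, data).hexdigest() == expected_hex


payload = b"example wheel bytes"
digest = hashlib.sha256(payload).hexdigest()
print(verify(payload, "sha256", digest))      # True
print(verify(b"tampered", "sha256", digest))  # False
```

Whether a given algorithm is acceptable from a security standpoint is a separate policy decision that this sketch does not make.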

Require a URL or file path for files

Originally references to files were required, e.g., packages.sdist.url or packages.sdist.path. But at least one use-case surfaced during discussions about this PEP where statically specifying the location of files would be problematic. And in earlier discussions the idea of the location being a hint wasn’t preferred. Hence the PEP now makes the data optional, but considers the locations accurate if specified.

File naming

Using *.pylock.toml as the file name

It was proposed to put the pylock constant part of the file name after the identifier for the purpose of the lock file. It was decided not to do this so that lock files would sort together in directory listings rather than being sorted by their purpose, which could spread them out in a directory.

Using *.pylock as the file name

Not using .toml as the file extension and instead making it .pylock itself was proposed. This was decided against so that code editors would know how to provide syntax highlighting to a lock file without having special knowledge about the file extension.

Not having a naming convention for the file

Having no requirements or guidance for a lock file’s name was considered, but ultimately rejected. By having a standardized naming convention it makes it easy to identify a lock file for both a human and a code editor. This helps facilitate discovery when e.g. a tool wants to know all of the lock files that are available.

File format

Use JSON over TOML

Since having a format that is machine-writable was a goal of this PEP, it was suggested to use JSON. But it was deemed less human-readable than TOML while not improving on the machine-writable aspect enough to warrant the change.

Use YAML over TOML

Some argued that YAML met the machine-writable/human-readable requirement in a better way than TOML. But as that’s subjective and pyproject.toml already existed as the human-writable file used by Python packaging standards it was deemed more important to keep using TOML.

Other keys

Multiple hashes per file

An initial version of this PEP proposed supporting multiple hashes per file. The idea was to allow one to choose which hashing algorithm they wanted to go with when installing. But upon reflection it seemed like an unnecessary complication as there was no guarantee the hashes provided would satisfy the user’s needs. As well, if the single hash algorithm used in the lock file wasn’t sufficient, rehashing the files involved as a way to migrate to a different algorithm didn’t seem insurmountable.

Hashing the contents of the lock file itself

Hashing the bytes of the file and storing the hash value within the file itself was proposed at some point. This was removed to make merging changes to the lock file easier, as each merge would otherwise have to recalculate the hash value to avoid a merge conflict.

Hashing the semantic contents of the file was also proposed, but it would lead to the same merge conflict issue.

Regardless of which contents were hashed, either approach could have the hash value stored outside of the file if such a hash was desired.
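One way to store such a hash outside of the file is a sidecar file next to the lock file. The sketch below assumes SHA-256 and a `.sha256` suffix; both choices, along with the helper names, are hypothetical and not part of this PEP.

```python
import hashlib
import tempfile
from pathlib import Path


def sidecar_for(lock_path: Path) -> Path:
    """Hypothetical convention: digest lives in '<lock file name>.sha256'."""
    return lock_path.with_name(lock_path.name + ".sha256")


def write_sidecar(lock_path: Path) -> Path:
    """Record the digest of the lock file's bytes in the sidecar file."""
    digest = hashlib.sha256(lock_path.read_bytes()).hexdigest()
    sidecar = sidecar_for(lock_path)
    sidecar.write_text(digest + "\n", encoding="utf-8")
    return sidecar


def matches_sidecar(lock_path: Path) -> bool:
    """Return True if the lock file still matches its recorded digest."""
    recorded = sidecar_for(lock_path).read_text(encoding="utf-8").strip()
    return hashlib.sha256(lock_path.read_bytes()).hexdigest() == recorded


with tempfile.TemporaryDirectory() as tmp:
    lock = Path(tmp) / "pylock.toml"
    lock.write_text('example = "placeholder"\n', encoding="utf-8")
    write_sidecar(lock)
    print(matches_sidecar(lock))  # True
    lock.write_text('example = "edited"\n', encoding="utf-8")
    print(matches_sidecar(lock))  # False
```

Because the digest is not inside the lock file, merging changes to the lock file does not itself conflict on a hash line; the sidecar can be regenerated after the merge.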

Recording the creation date of the lock file

To know how potentially stale the lock file was, an earlier proposal suggested recording the creation date of the lock file. But for the same merge conflict reasons as storing the hash of the file contents, this idea was dropped.

Recording the package indexes used

Recording what package indexes were used by the locker to decide what to lock for was considered. In the end, though, it was rejected as it was deemed unnecessary bookkeeping.

Locking build requirements for sdists

An earlier version of this PEP tried to lock the build requirements for sdists under a packages.build-requires key. Unfortunately it confused enough people about how it was expected to operate and there were enough edge case issues to decide it wasn’t worth trying to do in this PEP upfront. Instead, a future PEP could propose a solution.

Open Issues

Specify requires-python at the file level?

The lock file formats from PDM, Poetry, and uv all specify requires-python at the top level for the absolute minimum Python version needed for the lock file. This can be inferred, though, by examining all packages.requires-python values. The global value might also not be accurate for all platforms depending on how environment markers influence what package versions are installed and what their Python version requirements are.
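To make the inference concrete, here is a deliberately simplified sketch. It assumes every value has the form `>=X.Y` and that every listed package gets installed, in which case the file-level minimum is the highest of the individual lower bounds. The `minimum_python` helper is hypothetical; a real tool would likely use a full specifier parser (such as the `packaging` project's) and would also have to account for environment markers, which is exactly why a single global value may be inaccurate.

```python
def minimum_python(requires_python_values):
    """Return the highest lower bound as a tuple, or None if there are none.

    Simplifying assumption: each value looks like '>=3.9'. Anything else
    is rejected rather than guessed at.
    """
    lower_bounds = []
    for value in requires_python_values:
        value = value.strip()
        if not value.startswith(">="):
            raise ValueError(f"unsupported specifier in this sketch: {value!r}")
        version = tuple(int(part) for part in value[2:].strip().split("."))
        lower_bounds.append(version)
    return max(lower_bounds, default=None)


print(minimum_python([">=3.8", ">=3.10", ">=3.9"]))  # (3, 10)
```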

Don’t pre-parse data?

This PEP currently takes the viewpoint that if a piece of data is going to be parsed by installers every time they run, then pre-parsing as much as possible so the TOML parser can help is a good thing. The thinking is TOML parsers have a higher chance of being optimized, and so letting them do more parsing leads to a faster outcome. It should also increase readability by breaking apart data upfront more.

But in the case of doing this to wheel file names, some might consider it too much. The question becomes whether separating out all the parts of a wheel file name hinders readability because people are already used to reading the file names, or whether clearly separating the parts makes installers faster and easier to write without hindering readability.

This all equally applies to requirement specifiers.
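For a sense of what installers would otherwise do at run time, the sketch below splits a wheel file name into its parts following the wheel file name convention (`{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`). The dictionary keys and the `split_wheel_name` helper are chosen for this example only and are not keys defined by this PEP.

```python
def split_wheel_name(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and Python tag.
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of parts: {filename!r}")
    # Hypothetical key names, used only for this illustration.
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,
        "abi": abi,
        "platform": platform,
    }


print(split_wheel_name("requests-2.32.3-py3-none-any.whl"))
```

Compressed tag sets such as `py2.py3` are left as single strings here; expanding them would be a further parsing step.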

Deferred Ideas

Per-file locking

An earlier version of this PEP supported two approaches to locking: per-file and per-package. The idea for the former approach to locking was that if you were locking for an a-priori set of environments you could lock to just the files necessary to install into those environments. The thinking was that by only listing a subset of files, auditing would be easier.

Unfortunately there was disagreement on how best to express upfront what the supported environment requirements would be. Since what this PEP currently proposes still prevents accidental success of installation into unsupported environments, this idea has been deferred until such time as someone can come up with a representation that makes sense.

Allowing for multiple lock files

Before the introduction of [[groups]], this PEP proposed supporting multiple lock files that would match the regular expression r"pylock\.(.+)\.toml" if a name for the lock file is desired or if multiple lock files exist. But since [[groups]] subsumes a lot of the need to support multiple lock files, this specific feature can be postponed until such time that a need is shown to support multiple lock files.
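The regular expression from that earlier proposal can be exercised as follows. Matching against the whole file name with `fullmatch` is an assumption of this sketch, as is the `lock_file_name` helper.

```python
import re

# The pattern quoted above from the earlier version of this PEP.
PYLOCK_NAME = re.compile(r"pylock\.(.+)\.toml")


def lock_file_name(filename):
    """Return the name embedded in a named lock file, or None."""
    match = PYLOCK_NAME.fullmatch(filename)
    return match.group(1) if match else None


print(lock_file_name("pylock.dev.toml"))  # dev
print(lock_file_name("pylock.toml"))      # None
print(lock_file_name("dev.pylock.toml"))  # None
```

Note that the pattern only covers named lock files; an unnamed `pylock.toml` does not match it and would need to be handled separately.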

Acknowledgements

Thanks to everyone who participated in the discussions on discuss.python.org. Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for providing feedback on a draft version of this PEP.


Source: https://github.com/python/peps/blob/main/peps/pep-0751.rst

Last modified: 2024-11-05 19:18:47 GMT