PEP: 751
Title: A file format to list Python dependencies for installation reproducibility
Author: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/59173
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 24-Jul-2024
Post-History: 25-Jul-2024
Replaces: 665

Abstract

This PEP proposes a new file format for dependency specification to
enable reproducible installation in a Python environment. The format is
designed to be human-readable and machine-generated. Installers
consuming the file should be able to evaluate each package in question
in isolation, with no need for dependency resolution at install-time.

Motivation

Currently, no standard exists to:

-   Specify what top-level dependencies should be installed into a
    Python environment.
-   Create an immutable record, such as a lock file, of which
    dependencies were installed.

Considering there are at least five well-known solutions to this problem
in the community (pip freeze, pip-tools, uv, Poetry, and PDM), there
seems to be an appetite for lock files in general.

Those tools also vary in what locking scenarios they support. For
instance, pip freeze and pip-tools only generate lock files for the
current environment while PDM and Poetry try to lock for any environment
to some degree. None of them directly supports locking to specific
files to install, which can be important for some workflows. There are
also concerns around the lack of secure defaults in the face of supply
chain attacks (e.g., always including hashes for files). Finally, not all the
formats are easy to audit to determine what would be installed into an
environment ahead of time.

The lack of a standard also has some drawbacks. For instance, any
tooling that wants to work with lock files must choose which format to
support, potentially leaving users unsupported (e.g., Dependabot only
supporting select tools, same for cloud providers who can do dependency
installations on your behalf, etc.). It also impacts portability between
tools, which causes vendor lock-in. Without compatibility and
interoperability, tooling around lock files is fractured: both users
and tools have to choose what lock file format to use upfront, which
makes it costly to use or switch to other formats. Rallying around a
single format removes that cost/barrier.

Note

Much of the motivation from PEP 665 also applies to this PEP.

Rationale

The format is designed so that a locker which produces the lock file and
an installer which consumes the lock file can be separate tools. This
allows for situations such as cloud hosting providers to use their own
installer that's optimized for their system which is independent of what
locker the user used to create their lock file.

The file format is designed to be human-readable. This is so that the
contents of the file can be audited by a human to make sure no undesired
dependencies end up being included in the lock file. It is also designed
to facilitate easy understanding of what would be installed from the
lock file without necessitating running a tool, once again to help with
auditing. Finally, the format is designed so that viewing a diff of the
file is easy by centralizing relevant details.

The file format is also designed to not require a resolver at install
time. Being able to analyze dependencies in isolation from one another
when listed in a lock file provides a few benefits. First, it supports
auditing by making it easy to figure out if a certain dependency would
be installed for a certain environment without needing to reference
other parts of the file contextually. It should also lead to faster
installs which are much more frequent than creating a lock file.
Finally, the tools mentioned in the Motivation section either
already implement this approach of evaluating dependencies in isolation
or have suggested they could (in Poetry's case).

Locking Scenarios

The lock file format is designed to support two locking scenarios. The
format should also be flexible enough that adding support for other
locking scenarios is possible via a separate PEP.

Per-file Locking

Per-file locking operates under the premise that one wants to install
exactly the same files in any matching environment. As such, the lock
file specifies what files to install. There can be multiple environments
specified in a single file, each with their own set of files to install.
By specifying the exact files to install, installers avoid performing
any resolution to decide what to install.

The motivation for this approach to locking is for those who have
controlled environments that they work with. For instance, if you have
specific, controlled development and production environments then you
can use per-file locking to make sure the same files are installed in
both environments for everyone. This is similar to what pip freeze and
pip-tools support, but with more strictness of the exact files as well
as incorporating support to specify the locked files for multiple
environments in the same file.

Per-file locking should be used when the installation attempt should
fail outright if there is no explicitly pre-approved set of installation
artifacts for the target platform. For example: locking the deployment
dependencies for a managed web service.

Package Locking

Package locking lists the packages and their versions that may apply to
any environment being installed for. The list of packages and their
versions are evaluated individually and independently from any other
packages and versions listed in the file. This allows installation to be
linear -- read each package and version and make an isolated decision as
to whether it should be installed. This avoids requiring the installer
to perform a resolution (i.e. determine what to install based on what
else is to be installed).

The motivation of this approach comes from PDM lock files. By listing
the potential packages and versions that may be installed, what's
installed is controlled in a way that's easy to reason about. This also
allows for not specifying the exact environments that would be supported
by the lock file so there's more flexibility for what environments are
compatible with the lock file. This approach supports scenarios like
open-source projects that want to lock what people should use to build
the documentation without knowing upfront what environments their
contributors are working from.

As already mentioned, this approach is supported by PDM. Poetry has
shown some interest.

Per-package locking should be used when the exact set of potential
target platforms is not known when generating the lock file, as it
allows installation tools to choose the most appropriate artifacts for
each platform from the pre-approved set. For example: locking the
development dependencies for an open source project.

Specification

File Name

A lock file MUST be named pylock.toml or match the regular expression
r"pylock\.(.+)\.toml" if a name for the lock file is desired or if
multiple lock files exist. The use of the .toml file extension is to
make syntax highlighting in editors easier and to reinforce the fact
that the file format is meant to be human-readable. The prefix and
suffix of a named file MUST be lowercase for easy detection and
stripping off to find the name, e.g.:

    if filename.startswith("pylock.") and filename.endswith(".toml"):
        name = filename.removeprefix("pylock.").removesuffix(".toml")
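
As a complementary sketch, the same check can be written with the regular expression given above while treating the unnamed pylock.toml as its own case. The function name lock_file_name and its return shape are assumptions made for illustration, not part of this PEP.

```python
import re
from typing import Optional, Tuple

# Pattern for named lock files, taken from the text above.
_NAMED_LOCK_FILE = re.compile(r"pylock\.(.+)\.toml")


def lock_file_name(filename: str) -> Tuple[bool, Optional[str]]:
    """Return (is_lock_file, name); name is None for the unnamed pylock.toml."""
    if filename == "pylock.toml":
        return True, None
    # fullmatch is case-sensitive, so the prefix and suffix must be lowercase.
    match = _NAMED_LOCK_FILE.fullmatch(filename)
    if match is None:
        return False, None
    return True, match.group(1)
```

Under these assumptions, "pylock.dev.toml" yields the name "dev", while "Pylock.dev.toml" is rejected because its prefix is not lowercase.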

This PEP has no opinion as to the location of lock files (i.e. in the
root or the subdirectory of a project).

File Format

The format of the file is TOML.

All keys listed below are required unless otherwise noted. If two keys
are mutually exclusive to one another, then one of the keys is required
while the other is disallowed.

Keys in tables -- including the top-level table -- SHOULD be emitted by
lockers in the order they are listed in this PEP when applicable unless
another sort order is specified to minimize noise in diffs. If the keys
are not explicitly specified in this PEP, then the keys SHOULD be sorted
by lexicographic order.

As well, lockers SHOULD sort arrays in lexicographic order unless
otherwise specified for the same reason.

version

-   String
-   The version of the lock file format.
-   This PEP specifies the initial version -- and only valid value until
    future updates to the standard change it -- as "1.0".
-   If an installer supports the major version but not the minor
    version, a tool SHOULD warn when an unknown key is seen.
-   If an installer doesn't support a major version, it MUST raise an
    error.
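
One possible way for an installer to apply these version rules is sketched below. The names check_version, UnsupportedLockFileVersion, and known_keys are hypothetical, and the sketch only inspects keys of the top-level table.

```python
import warnings

# Assumption: this hypothetical installer only understands version "1.0".
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0


class UnsupportedLockFileVersion(ValueError):
    """Raised when the major version of the lock file is not supported."""


def check_version(lock_data: dict, known_keys: set) -> None:
    """Raise on an unsupported major version; warn on unknown keys otherwise."""
    version = lock_data["version"]
    major_text, _, minor_text = version.partition(".")
    major, minor = int(major_text), int(minor_text or "0")
    if major != SUPPORTED_MAJOR:
        raise UnsupportedLockFileVersion(f"unsupported lock file version {version!r}")
    if minor > SUPPORTED_MINOR:
        # Major version is supported but the minor version is newer:
        # warn about each top-level key this installer does not recognize.
        for key in sorted(set(lock_data) - known_keys):
            warnings.warn(f"unknown key {key!r} in lock file version {version}")
```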

hash-algorithm

-   String
-   The name of the hash algorithm used for calculating all hash values.
-   Only a single hash algorithm is used for the entire file to allow
    the [[packages.files]] table to be written inline for readability
    and compactness purposes by only listing a single hash value instead
    of multiple values based on multiple hash algorithms.
-   Specifying a single hash algorithm guarantees that an algorithm that
    the user prefers is used consistently throughout the file without
    having to audit each file hash value separately.
-   Allows for updating the entire file to a new hash algorithm without
    running the risk of accidentally leaving an old hash value in the
    file.
-   The JSON-based Simple Repository API, specifically the hashes
    dictionary of the files dictionary of the Project Details
    dictionary, specifies what values are valid and gives guidelines on
    what hash algorithms to use.
-   Failure to validate any hash values for any file that is to be
    installed MUST raise an error.
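
A minimal sketch of the validation step, assuming the file contents are already in memory and that the hash-algorithm value is a name accepted by Python's hashlib module. The names verify_file_hash and HashMismatchError are illustrative only.

```python
import hashlib


class HashMismatchError(Exception):
    """Raised when a file's contents do not match the recorded hash value."""


def verify_file_hash(contents: bytes, hash_algorithm: str, expected: str) -> None:
    """Compare the digest of contents against the value from packages.files.hash."""
    # hashlib.new() raises ValueError if the algorithm name is not available.
    digest = hashlib.new(hash_algorithm, contents).hexdigest()
    if digest != expected.lower():
        raise HashMismatchError(
            f"expected {hash_algorithm} digest {expected}, got {digest}"
        )
```

Because a single algorithm applies to the whole file, an installer following this sketch would pass the same hash_algorithm value for every file it verifies.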

dependencies

-   Array of strings
-   A listing of the dependency specifiers that act as the input to the
    lock file, representing the direct, top-level dependencies to be
    installed.

[[file-locks]]

-   Array of tables
-   Mutually exclusive with [package-lock].
-   The array's existence implies the use of the per-file locking
    approach.
-   An environment that meets all of the specified criteria in the table
    will be considered compatible with the environment that was locked
    for.
-   Lockers MUST NOT generate multiple [[file-locks]] tables which would
    be considered compatible for the same environment.
-   In instances where there would be a conflict but the lock is still
    desired, either separate lock files can be written or per-package
    locking can be used.
-   Entries in the array SHOULD be sorted by file-locks.name
    lexicographically.

file-locks.name

-   String
-   A unique name within the array for the environment this table
    represents.

[file-locks.marker-values]

-   Optional
-   Table of strings
-   The keys represent the names of environment markers and the values
    are the values for those markers.
-   Compatibility is defined by the environment's values matching what
    is in the table.

file-locks.wheel-tags

-   Optional
-   Array of strings
-   An unordered array of wheel tags for which all tags must be
    supported by the environment.
-   The array MAY not be exhaustive to allow for a smaller array as well
    as to help prevent multiple [[file-locks]] tables being compatible
    with the same environment by having one array being a strict subset
    of another file-locks.wheel-tags entry in the same file's
    [[file-locks]] tables.
-   Lockers MUST NOT include compressed tag sets or duplicate tags for
    consistency across lockers and to simplify checking for
    compatibility.
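
The compatibility rules for a [[file-locks]] table can be sketched as follows. This is an illustration under assumptions: the environment's marker values and its set of supported wheel tags are supplied by the caller, and the function names is_compatible and find_file_lock are hypothetical.

```python
from typing import List


def is_compatible(file_lock: dict, marker_values: dict, supported_tags: set) -> bool:
    """Check one [[file-locks]] table against the current environment."""
    locked_markers = file_lock.get("marker-values", {})
    # Every marker value recorded in the table must match the environment.
    markers_match = all(
        marker_values.get(name) == value for name, value in locked_markers.items()
    )
    # Every wheel tag listed in the table must be supported by the environment.
    tags_supported = set(file_lock.get("wheel-tags", [])) <= set(supported_tags)
    return markers_match and tags_supported


def find_file_lock(file_locks: List[dict], marker_values: dict, supported_tags: set) -> dict:
    """Return the single compatible table, raising if there are none or several."""
    matches = [
        entry for entry in file_locks
        if is_compatible(entry, marker_values, supported_tags)
    ]
    if not matches:
        raise LookupError("no compatible [[file-locks]] table found")
    if len(matches) > 1:
        raise LookupError("multiple [[file-locks]] tables are compatible")
    return matches[0]
```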

[package-lock]

-   Table
-   Mutually exclusive with [[file-locks]].
-   Signifies the use of the package locking approach.

package-lock.requires-python

-   String
-   Holds the version specifiers for Python version compatibility for
    the overall package locking.
-   Provides at-a-glance information to know if the lock file may apply
    to a version of Python instead of having to scan the entire file to
    compile the same information.

[[packages]]

-   Array of tables
-   The array contains all data on the locked package versions.
-   Lockers SHOULD record packages in order by packages.name
    lexicographically, packages.version by the sort order for version
    specifiers, and packages.markers lexicographically.
-   Lockers SHOULD record keys in the same order as written in this PEP
    to minimize changes when updating.
-   Entries are designed so that relevant details as to why a package is
    included are in one place to make diff reading easier.

packages.name

-   String
-   The normalized name of the package.
-   Part of what's required to uniquely identify this entry.

packages.version

-   String
-   The version of the package.
-   Part of what's required to uniquely identify this entry.

packages.multiple-entries

-   Optional (defaults to false)
-   Boolean
-   If package locking is used via [package-lock], then multiple
    entries for the same package MUST be mutually exclusive via
    packages.marker (this is not required for per-file locking as the
    packages.*.lock entries imply mutual exclusivity).
-   Aids in auditing by knowing that there are multiple entries for the
    same package that may need to be considered.

packages.description

-   Optional
-   String
-   The package's Summary from its core metadata.
-   Useful to help understand why a package was included in the file
    based on its purpose.

packages.index-url

-   Optional (although mutually exclusive with packages.files.index-url)
-   String
-   Stores the project index URL from the Simple Repository API.
-   Useful for generating Packaging URLs (aka PURLs).
-   When possible, lockers SHOULD include this or
    packages.files.index-url to assist with generating software bill of
    materials (aka SBOMs).

packages.marker

-   Optional
-   String
-   The environment markers expression which specifies whether this
    package and version applies to the environment.
-   Only applicable via [package-lock] and the package locking scenario.
-   The lack of this key means this package and version is required to
    be installed.

packages.requires-python

-   Optional
-   String
-   Holds the version specifiers for Python version compatibility for
    the package and version.
-   Useful for documenting why this package and version was included in
    the file.
-   Also helps document why the version restriction in
    package-lock.requires-python was chosen.
-   It should not provide useful information for installers as it would
    be captured by package-lock.requires-python and isn't relevant when
    [[file-locks]] is used.

packages.dependents

-   Optional
-   Array of strings
-   A record of the packages that depend on this package and version.
-   Useful for analyzing why a package happens to be listed in the file
    for auditing purposes.
-   This does not provide information which influences installers.

packages.dependencies

-   Optional
-   Array of strings
-   A record of the dependencies of the package and version.
-   Useful in analyzing why a package happens to be listed in the file
    for auditing purposes.
-   This does not provide information which influences the installer as
    [[file-locks]] specifies the exact files to use and [package-lock]
    applicability is determined by packages.marker.

packages.direct

-   Optional (defaults to false)
-   Boolean
-   Represents whether the installation is via a direct URL reference.

[[packages.files]]

-   Must be specified if [packages.vcs] and [packages.directory] are not
    (although may be specified simultaneously with the other options).
-   Array of tables
-   Tables can be written inline.
-   Represents the files to potentially install for the package and
    version.
-   Entries in [[packages.files]] SHOULD be lexicographically sorted by
    packages.files.name key to minimize changes in diffs.

packages.files.name

-   String
-   The file name.
-   Necessary for installers to decide what to install when using
    package locking.

packages.files.lock

-   Required when [[file-locks]] is used (does not apply under
    per-package locking)
-   Array of strings
-   An array of file-locks.name values which signify that the file is to
    be installed when the corresponding [[file-locks]] table applies to
    the environment.
-   There MUST only be a single file with any one file-locks.name entry
    per package, regardless of version.

packages.files.index-url

-   Optional (although mutually exclusive with packages.index-url)
-   String
-   The value has the same meaning as packages.index-url.
-   This key is available per-file to support PEP 708 when some files
    override what's provided by another Simple Repository API index.

packages.files.url

-   Optional (and mutually exclusive with packages.files.path)
-   String
-   URL where the file was found when the lock file was generated.
-   Useful for documenting where the file was originally found and
    potentially where to look for the file if it is not already
    downloaded/available.
-   Installers MUST NOT assume the URL will always work, but installers
    MAY use the URL if it happens to work.

packages.files.path

-   Optional (and mutually exclusive with packages.files.url)
-   String
-   File system path to where the file was found when the lock file was
    generated.
-   Path may be relative to the lock file's location or absolute.
-   Installers MUST NOT assume the path will always work, but installers
    MAY use the path if it happens to work.

packages.files.hash

-   String
-   The hash value of the file contents using the hash algorithm
    specified by hash-algorithm.
-   Used by installers to verify the file contents match what the locker
    worked with.

[packages.vcs]

-   Must be specified if [[packages.files]] and [packages.directory] are
    not (although may be specified simultaneously with the other
    options).
-   Table representing the version control system containing the package
    and version.

packages.vcs.type

-   String
-   The type of version control system used.
-   The valid values are specified by the registered VCSs of the direct
    URL data structure.

packages.vcs.url

-   Mutually exclusive with packages.vcs.path
-   String
-   The URL of where the repository was located when the lock file was
    generated.

packages.vcs.path

-   Mutually exclusive with packages.vcs.url
-   String
-   The file system path where the repository was located when the lock
    file was generated.
-   The path may be relative to the lock file or absolute.

packages.vcs.commit

-   String
-   The commit ID for the repository which represents the package and
    version.
-   The value MUST be immutable for the VCS for security purposes (e.g.
    no Git tags).

packages.vcs.lock

-   Required when [[file-locks]] is used
-   Array of strings
-   An array of file-locks.name values which signify that the repository
    at the specified commit is to be installed when the corresponding
    [[file-locks]] table applies to the environment.
-   A name in the array may only appear if no file listed in
    packages.files.lock contains the name for the same package,
    regardless of version.

[packages.directory]

-   Must be specified if [[packages.files]] and [packages.vcs] are not
    and per-package locking is in use.
-   Table representing a source tree found on the local file system.

packages.directory.path

-   String
-   A local directory where a source tree for the package and version
    exists.
-   The path MUST use forward slashes as the path separator.
-   If the path is relative it is relative to the location of the lock
    file.

packages.directory.editable

-   Boolean
-   Optional (defaults to false)
-   Flag representing whether the source tree should be installed as an
    editable install.

[packages.tool]

-   Optional
-   Table
-   Similar usage as that of the [tool] table from the pyproject.toml
    specification, but at the package version level instead of at the
    lock file level (which is also available via [tool]).
-   Useful for scoping package version/release details (e.g., recording
    signing identities to then use to verify package integrity
    separately from where the package is hosted, prototyping future
    extensions to this file format, etc.).

[tool]

-   Optional
-   Table
-   Same usage as that of the equivalent [tool] table from the
    pyproject.toml specification.

Examples

Per-file locking

    version = '1.0'
    hash-algorithm = 'sha256'
    dependencies = ['cattrs', 'numpy']

    [[file-locks]]
    name = 'CPython 3.12 on manylinux 2.17 x86-64'
    marker-values = {}
    wheel-tags = ['cp312-cp312-manylinux_2_17_x86_64', 'py3-none-any']

    [[file-locks]]
    name = 'CPython 3.12 on Windows x64'
    marker-values = {}
    wheel-tags = ['cp312-cp312-win_amd64', 'py3-none-any']

    [[packages]]
    name = 'attrs'
    version = '23.2.0'
    multiple-entries = false
    description = 'Classes Without Boilerplate'
    requires-python = '>=3.7'
    dependents = ['cattrs']
    dependencies = []
    direct = false
    files = [
        {name = 'attrs-23.2.0-py3-none-any.whl', lock = ['CPython 3.12 on manylinux 2.17 x86-64', 'CPython 3.12 on Windows x64'], url = 'https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl', hash = '99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1'}
    ]

    [[packages]]
    name = 'cattrs'
    version = '23.2.3'
    multiple-entries = false
    description = 'Composable complex class support for attrs and dataclasses.'
    requires-python = '>=3.8'
    dependents = []
    dependencies = ['attrs']
    direct = false
    files = [
        {name = 'cattrs-23.2.3-py3-none-any.whl', lock = ['CPython 3.12 on manylinux 2.17 x86-64', 'CPython 3.12 on Windows x64'], url = 'https://files.pythonhosted.org/packages/b3/0d/cd4a4071c7f38385dc5ba91286723b4d1090b87815db48216212c6c6c30e/cattrs-23.2.3-py3-none-any.whl', hash = '0341994d94971052e9ee70662542699a3162ea1e0c62f7ce1b4a57f563685108'}
    ]

    [[packages]]
    name = 'numpy'
    version = '2.0.1'
    multiple-entries = false
    description = 'Fundamental package for array computing in Python'
    requires-python = '>=3.9'
    dependents = []
    dependencies = []
    direct = false
    files = [
        {name = 'numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', lock = ['CPython 3.12 on manylinux 2.17 x86-64'], url = 'https://files.pythonhosted.org/packages/2c/f3/61eeef119beb37decb58e7cb29940f19a1464b8608f2cab8a8616aba75fd/numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', hash = '6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a'},
        {name = 'numpy-2.0.1-cp312-cp312-win_amd64.whl', lock = ['CPython 3.12 on Windows x64'], url = 'https://files.pythonhosted.org/packages/b5/59/f6ad30785a6578ad85ed9c2785f271b39c3e5b6412c66e810d2c60934c9f/numpy-2.0.1-cp312-cp312-win_amd64.whl', hash = 'bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171'}
    ]

Per-package locking

Some values for packages.files.url were left out to make creating this
example easier, as it was done by hand.

    version = '1.0'
    hash-algorithm = 'sha256'
    dependencies = ['cattrs', 'numpy']

    [package-lock]
    requires-python = ">=3.9"


    [[packages]]
    name = 'attrs'
    version = '23.2.0'
    multiple-entries = false
    description = 'Classes Without Boilerplate'
    requires-python = '>=3.7'
    dependents = ['cattrs']
    dependencies = []
    direct = false
    files = [
        {name = 'attrs-23.2.0-py3-none-any.whl', url = 'https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl', hash = '99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1'}
    ]

    [[packages]]
    name = 'cattrs'
    version = '23.2.3'
    multiple-entries = false
    description = 'Composable complex class support for attrs and dataclasses.'
    requires-python = '>=3.8'
    dependents = []
    dependencies = ['attrs']
    direct = false
    files = [
        {name = 'cattrs-23.2.3-py3-none-any.whl', url = 'https://files.pythonhosted.org/packages/b3/0d/cd4a4071c7f38385dc5ba91286723b4d1090b87815db48216212c6c6c30e/cattrs-23.2.3-py3-none-any.whl', hash = '0341994d94971052e9ee70662542699a3162ea1e0c62f7ce1b4a57f563685108'}
    ]

    [[packages]]
    name = 'numpy'
    version = '2.0.1'
    multiple-entries = false
    description = 'Fundamental package for array computing in Python'
    requires-python = '>=3.9'
    dependents = []
    dependencies = []
    direct = false
    files = [
        {name = "numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "6bf4e6f4a2a2e26655717a1983ef6324f2664d7011f6ef7482e8c0b3d51e82ac"},
        {name = "numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "7d6fddc5fe258d3328cd8e3d7d3e02234c5d70e01ebe377a6ab92adb14039cb4"},
        {name = "numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "5daab361be6ddeb299a918a7c0864fa8618af66019138263247af405018b04e1"},
        {name = "numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "ea2326a4dca88e4a274ba3a4405eb6c6467d3ffbd8c7d38632502eaae3820587"},
        {name = "numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "529af13c5f4b7a932fb0e1911d3a75da204eff023ee5e0e79c1751564221a5c8"},
        {name = "numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a"},
        {name = "numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "cbab9fc9c391700e3e1287666dfd82d8666d10e69a6c4a09ab97574c0b7ee0a7"},
        {name = "numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "99d0d92a5e3613c33a5f01db206a33f8fdf3d71f2912b0de1739894668b7a93b"},
        {name = "numpy-2.0.1-cp312-cp312-win32.whl", hash = "173a00b9995f73b79eb0191129f2455f1e34c203f559dd118636858cc452a1bf"},
        {name = "numpy-2.0.1-cp312-cp312-win_amd64.whl", hash = "bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171"},
    ]

Expectations for Lockers

-   When creating a lock file for [package-lock], the locker SHOULD read
    the metadata of all files that end up being listed in
    [[packages.files]] to make sure all potential metadata cases are
    covered.
-   If a locker chooses not to check every file for its metadata, the
    tool MUST either provide the user with the option to have all files
    checked (whether that is opt-in or opt-out is left up to the tool),
    or notify the user that such a standards-violating shortcut is being
    taken (whether this is by documentation or at runtime is left to the
    tool).
-   Lockers MAY want to provide a way to let users provide the
    information necessary to install for multiple environments at once
    when doing per-file locking, e.g. by supporting a JSON file format
    which specifies wheel tags and marker values much like in
    [[file-locks]]. Multiple such files could be specified, and each
    could then be directly recorded in the corresponding [[file-locks]]
    table (if it allowed for unambiguous per-file locking environment
    selection):

    {
        "marker-values": {"<marker>": "<value>"},
        "wheel-tags": ["<tag>"]
    }
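
As a rough illustration of how a locker might turn such a JSON description into a [[file-locks]] table, consider the sketch below. The function name file_lock_from_json is hypothetical, the environment name is assumed to be supplied separately by the user, and any tag containing "." is assumed to be a compressed tag set.

```python
import json


def file_lock_from_json(name: str, text: str) -> dict:
    """Build one [[file-locks]] table from a JSON environment description."""
    spec = json.loads(text)
    table = {"name": name}
    if "marker-values" in spec:
        # Sort keys so the emitted table is stable across runs.
        table["marker-values"] = dict(sorted(spec["marker-values"].items()))
    if "wheel-tags" in spec:
        tags = spec["wheel-tags"]
        compressed = [tag for tag in tags if "." in tag]
        if compressed:
            raise ValueError(f"compressed tag sets are not allowed: {compressed}")
        # Drop duplicates and sort lexicographically for consistent output.
        table["wheel-tags"] = sorted(set(tags))
    return table
```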

Expectations for Installers

-   Installers MAY support installation of non-binary files (i.e. source
    distributions, source trees, and VCS), but are not required to.
-   Installers MUST provide a way to avoid non-binary file installation
    for reproducibility and security purposes.
-   Installers SHOULD make it opt-in to use non-binary file installation
    to facilitate a secure-by-default approach.
-   Under per-file locking, if what to install is ambiguous then the
    installer MUST raise an error.

Installing for per-file locking

-   If no compatible environment is found an error MUST be raised.
-   If multiple environments are found to be compatible then an error
    MUST be raised.
-   If a [[packages.files]] contains multiple matching entries an error
    MUST be raised due to ambiguity for what is to be installed.
-   If multiple [[packages]] entries for the same package have matching
    files an error MUST be raised due to ambiguity for what is to be
    installed.

Example workflow

-   Iterate through each [[file-locks]] table to find the one that
    applies to the environment being installed for.
-   If no compatible environment is found an error MUST be raised.
-   If multiple environments are found to be compatible then an error
    MUST be raised.
-   For the compatible environment, iterate through each entry in
    [[packages]].
-   For each [[packages]] entry, iterate through [[packages.files]] to
    look for any files with file-locks.name listed in
    packages.files.lock.
-   If a file is found with a matching lock name, add it to the list of
    candidate files to install and move on to the next [[packages]]
    entry.
-   If no file is found then check if packages.vcs.lock contains a match
    (no match is also acceptable).
-   If a [[packages.files]] contains multiple matching entries an error
    MUST be raised due to ambiguity for what is to be installed.
-   If multiple [[packages]] entries for the same package have matching
    files an error MUST be raised due to ambiguity for what is to be
    installed.
-   Find and verify the candidate files and/or VCS entries based on
    their hash or commit ID as appropriate.
-   Install the candidate files.
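
The selection part of this workflow can be sketched as follows, assuming the lock file has already been parsed into Python dictionaries and the compatible file-locks.name has already been determined. The function name select_candidates and the returned tuple shape are illustrative assumptions; finding, verifying, and installing the candidates is left out.

```python
from typing import List, Tuple


def select_candidates(packages: List[dict], lock_name: str) -> List[Tuple[str, str, dict]]:
    """Return (package name, kind, entry) for everything locked under lock_name."""
    candidates = []
    seen = set()
    for package in packages:
        matching = [
            entry for entry in package.get("files", [])
            if lock_name in entry.get("lock", [])
        ]
        if len(matching) > 1:
            raise ValueError(f"multiple files match {lock_name!r} for {package['name']}")
        vcs = package.get("vcs")
        if matching:
            chosen = ("file", matching[0])
        elif vcs is not None and lock_name in vcs.get("lock", []):
            chosen = ("vcs", vcs)
        else:
            continue  # No match for this entry is acceptable.
        if package["name"] in seen:
            # Two [[packages]] entries for the same package both matched.
            raise ValueError(f"multiple entries match for {package['name']}")
        seen.add(package["name"])
        candidates.append((package["name"], *chosen))
    return candidates
```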

Installing for package locking

-   Verify that the environment is compatible with
    package-lock.requires-python; if it isn't an error MUST be raised.
-   If no way to install a required package is found, an error MUST be
    raised.

Example workflow

-   Verify that the environment is compatible with
    package-lock.requires-python; if it isn't an error MUST be raised.
-   Iterate through each entry in [[packages]].
-   For each entry, if there's a packages.marker key, evaluate the
    expression.
    -   If the expression is false, then move on.
    -   Otherwise the package entry must be installed somehow.
-   Iterate through the files listed in [[packages.files]], looking for
    the "best" file to install.
-   If no file is found, check for [packages.vcs].
-   If no VCS is found, check for [packages.directory].
-   If no match is found, an error MUST be raised.
-   Find and verify the selected files and/or VCS entries based on their
    hash or commit ID as appropriate.
-   Install the selected files.
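
A sketch of the selection steps above, under several stated assumptions: the lock file is already parsed into dictionaries, the "best" file is taken to be the wheel whose tag ranks highest in a caller-supplied, priority-ordered list of supported tags, and marker and requires-python evaluation are delegated to caller-supplied callables (for instance, ones built on a library such as packaging). All function and parameter names here are hypothetical, and only wheels are considered among the listed files.

```python
from typing import Callable, List, Optional, Set, Tuple


def wheel_tags(filename: str) -> Set[str]:
    """Expand the (possibly compressed) tag set in a wheel file name."""
    stem = filename[: -len(".whl")]
    pythons, abis, platforms = stem.split("-")[-3:]
    return {
        f"{py}-{abi}-{plat}"
        for py in pythons.split(".")
        for abi in abis.split(".")
        for plat in platforms.split(".")
    }


def best_file(files: List[dict], ranked_tags: List[str]) -> Optional[dict]:
    """Pick the wheel matching the highest-priority supported tag, if any."""
    best, best_rank = None, len(ranked_tags)
    for entry in files:
        if not entry["name"].endswith(".whl"):
            continue  # This sketch ignores source distributions.
        tags = wheel_tags(entry["name"])
        # Only tags ranked better than the current best can improve the choice.
        for rank, tag in enumerate(ranked_tags[:best_rank]):
            if tag in tags:
                best, best_rank = entry, rank
                break
    return best


def select_installs(
    lock_data: dict,
    ranked_tags: List[str],
    marker_applies: Callable[[str], bool],
    python_compatible: Callable[[str], bool],
) -> List[Tuple[str, str, dict]]:
    """Return (package name, kind, entry) for each package to install."""
    if not python_compatible(lock_data["package-lock"]["requires-python"]):
        raise RuntimeError("environment does not satisfy package-lock.requires-python")
    selected = []
    for package in lock_data["packages"]:
        marker = package.get("marker")
        if marker is not None and not marker_applies(marker):
            continue  # The marker is false, so this entry does not apply.
        chosen = best_file(package.get("files", []), ranked_tags)
        if chosen is not None:
            selected.append((package["name"], "file", chosen))
        elif "vcs" in package:
            selected.append((package["name"], "vcs", package["vcs"]))
        elif "directory" in package:
            selected.append((package["name"], "directory", package["directory"]))
        else:
            raise LookupError(f"no way to install {package['name']}")
    return selected
```

Each entry is decided on its own, without consulting any other entry, which is what lets the installer avoid running a resolver.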

Backwards Compatibility

Because there is no preexisting lock file format, there are no explicit
backwards-compatibility concerns in terms of Python packaging standards.

As for packaging tools themselves, that will be a per-tool decision.
Tools that don't document their lock file format could choose to
simply start using the format internally and then transition to saving
their lock files with a name supported by this PEP. Tools with a
preexisting, documented format could provide an option to choose
which format to emit.

Security Implications

The hope is that standardizing on a lock file format that starts from
a security-first posture will help make overall packaging
installation safer. However, this PEP does not solve all potential
security concerns.

One potential concern is tampering with a lock file. If a lock file is
not kept in source control and properly audited, a bad actor could
change the file in nefarious ways (e.g. point to a malware version of a
package). Tampering could also occur in transit to e.g. a cloud provider
who will perform an installation on the user's behalf. Both could be
mitigated by signing the lock file either within the file in a [tool]
entry or via a side channel external to the lock file itself.

This PEP does not do anything to prevent a user from installing an
incorrect package. While the format includes many details to help in
auditing a package's inclusion, there isn't any mechanism to stop e.g.
name confusion attacks via typosquatting. Lockers may be able to provide
some UX to help with this (e.g. by providing download counts for a
package).

How to Teach This

Users should be informed that when they ask to install some package,
that package may have its own dependencies, those dependencies may have
dependencies, and so on. Without writing down what gets installed as
part of installing the package they requested, things could change from
underneath them (e.g. package versions). Changes to the underlying
dependencies can lead to accidental breakage of their code. Lock files
help deal with that by providing a way to write down what was installed.

Having what to install written down also helps in collaborating with
others. By agreeing to a lock file's contents, everyone ends up with the
same packages installed. This helps make sure no one relies on e.g. an
API that's only available in a certain version that not everyone working
on the project has installed.

Lock files also help with security by making sure you always get the
same files installed and not a malicious one that someone may have
slipped in. They also let one be more deliberate in upgrading
dependencies, making sure each change is on purpose and not one slipped
in by a bad actor.

Reference Implementation

A rough proof-of-concept for per-file locking can be found at
https://github.com/brettcannon/mousebender/tree/pep. An example lock
file can be seen at
https://github.com/brettcannon/mousebender/blob/pep/pylock.example.toml.

For per-package locking, PDM indirectly proves the approach works as
this PEP maintains equivalent data as PDM does for its lock files (whose
format was inspired by Poetry). Some of the details of PDM's approach
are covered in https://frostming.com/en/2024/pdm-lockfile/ and
https://frostming.com/en/2024/pdm-lock-strategy/.

Rejected Ideas

Only support package locking

At one point it was suggested to skip per-file locking and only support
package locking as the former was not explicitly supported in the larger
Python ecosystem while the latter was. But because this PEP has taken
the position that security is important and per-file locking is the more
secure of the two options, leaving out per-file locking was never
considered.

Specifying a new core metadata version that requires consistent metadata across files

At one point, to handle the issue of metadata varying between files and
thus requiring every released file for a package and version to be
examined for accurate locking results, the idea was floated to introduce
a new core metadata version which would require the metadata of all
wheel files to be the same for a single version of a package.
Ultimately, though, it was deemed unnecessary as this PEP will put
pressure on people to make files consistent for performance reasons or
to make indexes provide all the metadata separate from the wheel files
themselves. As well, there's no easy enforcement mechanism, and so
community expectation would work as well as a new metadata version.

Have the installer do dependency resolution

In order to support a format more akin to how Poetry worked when this
PEP was drafted, it was suggested that lockers effectively record the
packages and their versions which may be necessary to make an install
work in any possible scenario, and then the installer resolves what to
install. But that complicates auditing a lock file by requiring much
more mental effort to know what packages may be installed in any given
scenario. Also, one of the Poetry developers suggested that markers as
represented in the package locking approach of this PEP may be
sufficient to cover the needs of Poetry. Not having the installer do a
resolution also simplifies installer implementations, centralizing
complexity in lockers.

Requiring specific hash algorithm support

It was proposed to require a baseline hash algorithm for the files. This
was rejected as no other Python packaging specification requires
specific hash algorithm support. As well, the minimum hash algorithm
suggested may eventually become an outdated/unsafe suggestion, requiring
further updates. In order to promote using the best algorithm at all
times, no baseline is provided, so that tools do not simply default to
it without considering the security ramifications of that hash
algorithm.

File naming

Using *.pylock.toml as the file name

It was proposed to put the pylock constant part of the file name after
the identifier for the purpose of the lock file. It was decided not to
do this so that lock files would sort together when looking at directory
contents instead of purely based on their purpose which could spread
them out in a directory.

Using *.pylock as the file name

Not using .toml as the file extension and instead making it .pylock
itself was proposed. This was decided against so that code editors would
know how to provide syntax highlighting to a lock file without having
special knowledge about the file extension.

Not having a naming convention for the file

Having no requirements or guidance for a lock file's name was
considered, but ultimately rejected. A standardized naming convention
makes it easy for both a human and a code editor to identify a lock
file. This helps facilitate discovery when e.g. a tool wants to know all
of the lock files that are available.
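
A discovery helper along these lines might look like the sketch below.
The exact pattern is an assumption inferred from the example file name
pylock.example.toml and from the rejected alternatives above: it
accepts pylock.toml and pylock.<name>.toml, where <name> contains no
dots. The function name find_lock_files is hypothetical.

```python
import re
from pathlib import Path

# Assumed naming pattern: "pylock.toml" or "pylock.<name>.toml".
LOCK_FILE_NAME = re.compile(r"pylock\.(?:(?P<name>[^.]+)\.)?toml")


def find_lock_files(directory: Path) -> dict:
    """Map each lock file's name ("" for pylock.toml) to its path."""
    found = {}
    # Sorting keeps the result order stable; the shared "pylock." prefix
    # is also what makes these files sort together in a listing.
    for path in sorted(directory.iterdir()):
        match = LOCK_FILE_NAME.fullmatch(path.name)
        if match and path.is_file():
            found[match.group("name") or ""] = path
    return found
```

Because the constant part comes first, a tool needs no knowledge of the
possible names in advance to enumerate every lock file in a directory.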

File format

Use JSON over TOML

Since having a format that is machine-writable was a goal of this PEP,
it was suggested to use JSON. But it was deemed less human-readable than
TOML while not improving on the machine-writable aspect enough to
warrant the change.

Use YAML over TOML

Some argued that YAML met the machine-writable/human-readable
requirement in a better way than TOML. But as that's subjective and
pyproject.toml already existed as the human-writable file used by Python
packaging standards it was deemed more important to keep using TOML.

Other keys

Multiple hashes per file

An initial version of this PEP proposed supporting multiple hashes per
file. The idea was to allow one to choose which hashing algorithm they
wanted to go with when installing. But upon reflection it seemed like an
unnecessary complication as there was no guarantee the hashes provided
would satisfy the user's needs. As well, if the single hash algorithm
used in the lock file wasn't sufficient, rehashing the files involved as
a way to migrate to a different algorithm didn't seem insurmountable.
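
Such a migration could be sketched as follows: verify the file against
the hash already recorded in the lock file, and only then compute the
digest under the new algorithm. The function names file_digest and
migrate_hash are hypothetical, and the algorithm names are whatever
hashlib accepts on the running interpreter.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_hash(
    path: Path, old_algorithm: str, old_value: str, new_algorithm: str
) -> str:
    """Return the new digest, but only if the file matches the old one."""
    # Refuse to rehash a file that no longer matches what was locked.
    if file_digest(path, old_algorithm) != old_value:
        raise ValueError(f"{path} does not match its recorded {old_algorithm} hash")
    # The file is read a second time here; acceptable for a sketch.
    return file_digest(path, new_algorithm)
```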

Hashing the contents of the lock file itself

Hashing the raw bytes of the file and storing the hash value within the
file itself was proposed at some point. This was removed to make merging
changes to the lock file easier, as each merge would otherwise have to
recalculate the hash value to avoid a merge conflict.

Hashing the semantic contents of the file was also proposed, but it
would lead to the same merge conflict issue.

Regardless of which contents were hashed, either approach could have the
hash value stored outside of the file if such a hash was desired.
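
The difference between the two kinds of hash can be illustrated with a
short sketch. It assumes the lock file has already been parsed (for
instance with a TOML parser) into dictionaries whose values are
JSON-serializable, and it uses a sorted, compact JSON dump as the
canonical form; both that canonicalization and the function names are
assumptions of this example, not something this PEP defines.

```python
import hashlib
import json
from typing import Any, Mapping


def byte_digest(raw: bytes) -> str:
    """Hash the exact bytes on disk; any formatting change alters it."""
    return hashlib.sha256(raw).hexdigest()


def semantic_digest(parsed: Mapping[str, Any]) -> str:
    """Hash the parsed data; key order and whitespace do not matter."""
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Either digest could be written to a separate file next to the lock
file. A merge that changes the lock file still requires that external
value to be regenerated, but the lock file itself no longer conflicts
because of it.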

Recording the creation date of the lock file

To know how potentially stale the lock file was, an earlier proposal
suggested recording the creation date of the lock file. But for the same
merge conflict reasons as storing the hash of the file contents, this
idea was dropped.

Recording the package indexes used

Recording what package indexes were used by the locker to decide what to
lock for was considered. In the end, though, it was rejected as it was
deemed unnecessary bookkeeping.

Locking build requirements for sdists

An earlier version of this PEP tried to lock the build requirements for
sdists under a packages.build-requires key. Unfortunately it confused
enough people about how it was expected to operate and there were enough
edge case issues to decide it wasn't worth trying to do in this PEP
upfront. Instead, a future PEP could propose a solution.

Open Issues

N/A

Acknowledgements

Thanks to everyone who participated in the discussions in
https://discuss.python.org/t/lock-files-again-but-this-time-w-sdists/46593/,
especially Alyssa Coghlan who probably caused the biggest structural
shifts from the initial proposal.

Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek
Lev for providing feedback on a draft version of this PEP.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.