Python Enhancement Proposals

PEP 751 – A file format to record Python dependencies for installation reproducibility

Author:
Brett Cannon <brett at python.org>
Status:
Draft
Type:
Standards Track
Topic:
Packaging
Created:
24-Jul-2024
Post-History:
25-Jul-2024 30-Oct-2024 15-Jan-2025
Replaces:
665

Abstract

This PEP proposes a new file format for specifying dependencies to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.

Motivation

Currently, no standard exists to create an immutable record, such as a lock file, which specifies what direct and indirect dependencies should be installed into a virtual environment.

Considering there are at least five well-known solutions to this problem in the community (PDM, pip freeze, pip-tools, Poetry, and uv), there seems to be an appetite for lock files in general.

Those tools also vary in what locking scenarios they support. For instance, pip freeze and pip-tools only generate lock files for the current environment, while PDM, Poetry, and uv can (or try to) lock for multiple environments at once. There are also concerns around the lack of secure defaults in the face of supply chain attacks (e.g. including hashes for files).

The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g. Dependabot only supporting select tools, same for cloud providers who can do dependency installations on your behalf, etc.). It also impacts portability between tools, which causes vendor lock-in. The lack of compatibility and interoperability fractures tooling around lock files: both users and tools have to choose upfront which lock file format to use, which makes it costly to use or switch to other formats (e.g. tooling around auditing a lock file). Rallying around a single format removes this cost/barrier.

The closest the community has to a standard are pip’s requirements files which all the aforementioned tools either use directly as their file format or export to (i.e. requirements.txt). Unfortunately, the format is not a standard but is supported by convention. It’s also designed very much for pip’s needs, limiting its flexibility and ease of use (e.g. it’s a bespoke file format). Lastly, it is not secure by default (e.g. file hash support is entirely an opt-in feature, you have to tell pip to not look for other dependencies outside of what’s in the requirements file, etc.).

Note

Much of the motivation from PEP 665 also applies to this PEP.

Rationale

The file format proposed by this PEP is designed to be human-readable. This is so that the contents of the file can be audited by a human to make sure no undesired dependencies end up being included in the lock file.

The file format is also designed to not require a resolver at install time. This greatly simplifies reasoning about what would be installed when consuming a lock file. It should also lead to faster installs which are much more frequent than creating a lock file.

The data in the file should be consumable by tools not written in Python. This allows for e.g. cloud hosting providers to write their own tool to perform installations in their preferred programming language.

The file format should promote good security defaults. As the format is not meant to be human-writable, this means having tools provide security-related details is reasonable and not a costly burden.

The contents of a lock file should be able to replace the vast majority of uses of requirements files when used as a lock file (e.g. what pip-tools and pip freeze emit). This means the file format specified by this PEP can, at minimum, act as an export target for tools which have their own internal lock file format.

Specification

File Name

A lock file MUST be named pylock.toml or match the regular expression r"^pylock\.([^.]+)\.toml$" if a name for the lock file is desired or if multiple lock files exist. The use of the .toml file extension is to make syntax highlighting in editors easier and to reinforce the fact that the file format is meant to be human-readable. The prefix and suffix of a named file MUST be lowercase when possible, for easy detection and removal, e.g.:

if len(filename) > 11 and filename.startswith("pylock.") and filename.endswith(".toml"):
    name = filename.removeprefix("pylock.").removesuffix(".toml")

The expectation is that services that install lock files automatically will search for a lock file with the service’s name, then fallback to the generic pylock.toml (e.g. a cloud host service named Spam would first look for pylock.spam.toml to install, and if that file didn’t exist then install from pylock.toml).
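
The lookup order described above could be sketched roughly as follows. The function name find_lock_file and its parameters are hypothetical and not defined by this PEP; only the file naming rules come from the text.

```python
import re
from pathlib import Path

# Pattern for named lock files, as given in this PEP.
NAMED_LOCK_FILE = re.compile(r"^pylock\.([^.]+)\.toml$")


def find_lock_file(directory, service_name):
    """Return the lock file a service would pick, or None (hypothetical helper)."""
    directory = Path(directory)
    named = directory / f"pylock.{service_name.lower()}.toml"
    # Only consider the named candidate if it is a valid lock file name.
    if NAMED_LOCK_FILE.match(named.name) and named.is_file():
        return named
    # Fall back to the generic lock file.
    generic = directory / "pylock.toml"
    return generic if generic.is_file() else None
```

With this sketch, a service named Spam would call find_lock_file(project_dir, "Spam") and receive pylock.spam.toml when it exists, else pylock.toml, else None.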

The lock file(s) SHOULD be located in the directory as appropriate for the scope of the lock file. Locking against a single pyproject.toml, for instance, would place the pylock.toml in the same directory. If the lock file covered multiple projects in a monorepo, then the expectation is the pylock.toml file would be in the directory that held all the projects being locked.

File Format

The format of the file is TOML.

Tools SHOULD write their lock files in a consistent way to minimize noise in diff output. Keys in tables – including the top-level table – SHOULD be recorded in a consistent order (if inspiration is desired, this PEP has tried to write down keys in a logical order). As well, tools SHOULD sort arrays in consistent order. Usage of inline tables SHOULD also be kept consistent.
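
One possible way to keep arrays in a consistent order before writing is sketched below. It operates on already-parsed lock data (plain dicts and lists); the function name sorted_for_output and the chosen sort keys are assumptions made for illustration, as this PEP does not mandate any particular order.

```python
def sorted_for_output(lock):
    """Return a copy of parsed lock data with arrays in a stable order."""
    packages = []
    for package in lock.get("packages", []):
        package = dict(package)  # Shallow copy so the input is left untouched.
        if "wheels" in package:
            # Sort wheel entries by file name so reruns produce the same order.
            package["wheels"] = sorted(package["wheels"], key=lambda w: w["name"])
        packages.append(package)
    # Plain string sort by name, then version (not a version-aware comparison).
    packages.sort(key=lambda p: (p["name"], p.get("version", "")))
    return {**lock, "packages": packages}
```

Any deterministic order would serve the same purpose; what matters for diff noise is that the same tool always produces the same order.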

lock-version

  • Type: string; value of "1.0"
  • Required?: yes
  • Inspiration: Metadata-Version
  • Record the file format version that the file adheres to.
  • This PEP specifies the initial version – and only valid value until future updates to the standard change it – as "1.0".
  • If a tool supports the major version but not the minor version, a tool SHOULD warn when an unknown key is seen.
  • If a tool doesn’t support a major version, it MUST raise an error.
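
As an illustration of the version handling above, the following sketch assumes the value has a "major.minor" shape (as "1.0" does). The names check_lock_version, SUPPORTED_MAJOR, and SUPPORTED_MINOR are hypothetical. As a simplification, it warns once up front for a newer minor version rather than on each unknown key.

```python
import warnings

# Versions this hypothetical tool understands.
SUPPORTED_MAJOR = 1
SUPPORTED_MINOR = 0


def check_lock_version(lock_version):
    """Return True if fully supported, False if only the major version is."""
    major_text, _, minor_text = lock_version.partition(".")
    major, minor = int(major_text), int(minor_text or 0)
    if major != SUPPORTED_MAJOR:
        # An unsupported major version is an error.
        raise ValueError(f"unsupported lock-version: {lock_version!r}")
    if minor > SUPPORTED_MINOR:
        # A newer minor version may carry keys this tool does not know.
        warnings.warn(f"lock-version {lock_version!r} may contain unknown keys")
        return False
    return True
```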

environments

  • Type: Array of strings
  • Required?: no
  • Inspiration: uv
  • A list of Environment Markers with which the lock file is considered compatible.
  • Tools SHOULD write exclusive/non-overlapping environment markers to ease in understanding.

requires-python

  • Type: string
  • Required?: no
  • Inspiration: PDM, Poetry, uv
  • Specifies the Requires-Python for the minimum Python version compatible for any environment supported by the lock file (i.e. the minimum viable Python version for the lock file).

[[packages]]

  • Type: array of tables
  • Required?: yes
  • Inspiration: PDM, Poetry, uv
  • An array containing all packages that may be installed.
  • Packages MAY be listed multiple times with varying data, but all packages to be installed MUST narrow down to a single entry at install time.

created-by

  • Type: string
  • Required?: yes
  • Inspiration: Tools with their name in their lock file name
  • Records the name of the tool used to create the lock file.
  • Tools MAY use the [tool] table to record enough details that it can be inferred what inputs were used to create the lock file.
  • Tools SHOULD record the normalized name of the tool if it is available as a Python package to facilitate finding the tool.
packages.name
  • Type: string
  • Required?: yes
  • Inspiration: Name
  • The name of the package normalized.
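
For reference, name normalization as defined by the Python packaging name normalization specification can be written in a couple of lines. The function name normalize_name is just a label chosen for this sketch.

```python
import re


def normalize_name(name):
    """Collapse runs of '-', '_' and '.' into a single '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

For example, normalize_name("Friendly_Bard.Tools") gives "friendly-bard-tools".
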
packages.version
  • Type: string
  • Required?: no
  • Inspiration: Version
  • The version of the package.
  • The version SHOULD be specified when the version is known to be stable (i.e. when an sdist or wheels are specified).
  • The version MUST NOT be included when it cannot be guaranteed to be consistent with the code used (i.e. when a source tree is used).
packages.marker
  • Type: string
  • Required?: no
  • Inspiration: PDM
  • The environment marker which specifies when the package should be installed.
packages.requires-python
[[packages.dependencies]]
  • Type: array of tables
  • Required?: no
  • Inspiration: PDM, Poetry, uv
  • Records the other entries in [[packages]] which are direct dependencies of this package.
  • Each entry is a table which contains the minimum information required to tell which other package entry it corresponds to, such that a key-by-key comparison finds the appropriate package with no ambiguity (e.g. if there are two entries for the spam package, then you can include the version number like {name = "spam", version = "1.0.0"}, or identify it by source like {name = "spam", vcs = { url = "..."}}).
  • Tools MUST NOT use this information when doing installation; it is purely informational for auditing purposes.
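
An auditing tool might perform the key-by-key comparison along the lines of the sketch below. The helper names matches and find_package are hypothetical, and treating nested tables (such as vcs) as a recursive subset match is an assumption of this sketch rather than something this PEP spells out.

```python
def matches(reference, entry):
    """True if every key in `reference` has an equal value in `entry`."""
    for key, expected in reference.items():
        actual = entry.get(key)
        if isinstance(expected, dict) and isinstance(actual, dict):
            # Compare nested tables key by key as well.
            if not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True


def find_package(reference, packages):
    """Return the single package entry that `reference` identifies."""
    found = [entry for entry in packages if matches(reference, entry)]
    if len(found) != 1:
        raise LookupError(f"{reference!r} matched {len(found)} entries, expected 1")
    return found[0]
```

In line with the requirement above, such a lookup is only for auditing output and plays no part in deciding what gets installed.
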
packages.direct
[packages.vcs]
  • Type: table
  • Required?: no; mutually-exclusive with packages.directory, packages.archive, packages.sdist, and packages.wheels
  • Inspiration: Direct URL Data Structure
  • Record the version control system details for the source tree it contains.
  • Tools MAY choose to not support version control systems, both from a locking and/or installation perspective.
  • Tools SHOULD provide a way for users to opt in/out of using version control systems.
packages.vcs.type
  • Type: string; supported values specified in Registered VCS
  • Required?: yes
  • Inspiration: VCS URLs
  • The type of version control system used.
packages.vcs.url
  • Type: string
  • Required?: if path is not specified
  • Inspiration: VCS URLs
  • The URL to the source tree.
packages.vcs.path
  • Type: string
  • Required?: if url is not specified
  • Inspiration: VCS URLs
  • The path to the local directory of the source tree.
  • If a relative path is used it MUST be relative to the location of this file.
  • If the path is relative it MAY use POSIX-style path separators explicitly for portability.
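
Resolving a recorded path relative to the lock file could look like this minimal sketch; resolve_recorded_path is a hypothetical helper name.

```python
from pathlib import Path


def resolve_recorded_path(lock_file, recorded):
    """Resolve a path recorded in a lock file (hypothetical helper)."""
    recorded_path = Path(recorded)
    if recorded_path.is_absolute():
        return recorded_path
    # Relative paths are anchored at the directory holding the lock file.
    return Path(lock_file).parent / recorded_path
```

Because pathlib accepts forward slashes on every platform, a relative path written with POSIX-style separators resolves the same way on Windows and on POSIX systems.
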
packages.vcs.requested-revision
  • Type: string
  • Required?: no
  • Inspiration: VCS URLs
  • The branch/tag/ref/commit/revision/etc. that the user requested.
  • This is purely informational and to facilitate writing the Direct URL Data Structure; it MUST NOT be used to checkout the repository.
packages.vcs.commit-id
  • Type: string
  • Required?: yes
  • Inspiration: VCS URLs
  • The exact commit/revision number that is to be installed.
  • If the VCS supports commit-hash based revision identifiers, such a commit-hash MUST be used as the commit ID in order to reference an immutable version of the source code.
packages.vcs.subdirectory
  • Type: string
  • Required?: no
  • Inspiration: Projects in subdirectories
  • The subdirectory within the source tree where the project root of the project is (e.g. the location of the pyproject.toml file).
  • The path MUST be relative to the root of the source tree structure.
[packages.directory]
  • Type: table
  • Required?: no; mutually-exclusive with packages.vcs, packages.archive, packages.sdist, and packages.wheels
  • Inspiration: Local directories
  • Record the local directory details for the source tree it contains.
  • Tools MAY choose to not support local directories, both from a locking and/or installation perspective.
  • Tools SHOULD provide a way for users to opt in/out of using local directories.
packages.directory.path
  • Type: string
  • Required?: yes
  • Inspiration: Local directories
  • The local directory where the source tree is.
  • If the path is relative it MUST be relative to the location of the lock file.
  • If the path is relative it MAY use POSIX-style path separators for portability.
packages.directory.editable
  • Type: boolean
  • Required?: no; defaults to false
  • Inspiration: Local directories
  • A flag representing whether the source tree should be installed as editable.
packages.directory.subdirectory

See packages.vcs.subdirectory.

[packages.archive]
  • Type: table
  • Required?: no; mutually-exclusive with packages.vcs, packages.directory, packages.sdist, and packages.wheels
  • Inspiration: Archive URLs
  • An archive file containing a source tree.
  • Tools MAY choose to not support archive files, both from a locking and/or installation perspective.
  • Tools SHOULD provide a way for users to opt in/out of using archive files.
packages.archive.url

See packages.vcs.url.

packages.archive.path

See packages.vcs.path.

packages.archive.size
  • Type: integer
  • Required?: no
  • Inspiration: uv, Simple repository API
  • The size of the archive file.
  • Tools SHOULD provide the file size when reasonably possible (e.g. the file size is available via the Content-Length header from a HEAD HTTP request).
[packages.archive.hashes]
  • Type: Table of strings
  • Required?: yes
  • Inspiration: PDM, Poetry, uv, Simple repository API
  • A table listing known hash values of the file where the key is the hash algorithm and the value is the hash value.
  • The table MUST contain at least one entry.
  • Hash algorithm keys SHOULD be lowercase.
  • At least one secure algorithm from hashlib.algorithms_guaranteed SHOULD always be included (at time of writing, sha256 specifically is recommended).
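
Validating a downloaded file against its recorded size and hashes could be sketched as follows. The function name validate_file is hypothetical, and the policy of checking every recorded algorithm that the running Python supports (rather than only one) is an assumption of this sketch.

```python
import hashlib


def validate_file(data, hashes, size=None):
    """Raise ValueError if `data` does not match the recorded size or hashes."""
    if size is not None and len(data) != size:
        raise ValueError(f"size mismatch: expected {size}, got {len(data)}")
    if not hashes:
        raise ValueError("the hashes table must contain at least one entry")
    checked = 0
    for algorithm, expected in hashes.items():
        if algorithm not in hashlib.algorithms_available or algorithm.startswith("shake_"):
            continue  # Unsupported here, or a variable-length digest.
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(f"{algorithm} hash mismatch")
        checked += 1
    if checked == 0:
        # Never accept a file whose hashes could not be checked at all.
        raise ValueError("no recorded hash algorithm is supported")
```
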
packages.archive.subdirectory

See packages.vcs.subdirectory.

packages.index
  • Type: string
  • Required?: no
  • Inspiration: uv
  • The base URL for the package index from Simple repository API where the sdist and/or wheels were found (e.g. https://pypi.org/simple/).
  • When possible, this SHOULD be specified to assist with generating software bill of materials – aka SBOMs – and to assist in finding a file if a URL ceases to be valid.
  • Tools MAY support installing from an index if the URL recorded for a specific file is no longer valid (e.g. returns a 404 HTTP error code).
[packages.sdist]
  • Type: table
  • Required?: no; mutually-exclusive with packages.vcs, packages.directory, and packages.archive
  • Inspiration: uv
  • Details of a Source distribution file name for the package.
  • Tools MAY choose to not support sdist files, both from a locking and/or installation perspective.
  • Tools SHOULD provide a way for users to opt in/out of using sdist files.
packages.sdist.name
packages.sdist.upload-time
  • Type: datetime
  • Required?: no
  • Inspiration: Simple repository API
  • The time the file was uploaded.
  • The date and time MUST be recorded in UTC.
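
A tool reading a lock file could check the UTC requirement on an already-parsed value roughly like this; is_utc is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def is_utc(moment):
    """True if `moment` is timezone-aware with a zero UTC offset."""
    # Naive datetimes return None from utcoffset() and are rejected.
    return moment.utcoffset() == timedelta(0)


# Example values: an aware UTC timestamp and a naive one.
aware = datetime(2023, 12, 31, 6, 30, 30, 772444, tzinfo=timezone.utc)
naive = datetime(2023, 12, 31, 6, 30, 30, 772444)
```

Here is_utc(aware) is true while is_utc(naive) is false, so a TOML local date-time without an offset would be flagged.
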
packages.sdist.url

See packages.archive.url.

packages.sdist.path

See packages.archive.path.

packages.sdist.size

See packages.archive.size.

packages.sdist.hashes

See packages.archive.hashes.

[[packages.wheels]]
  • Type: array of tables
  • Required?: no; mutually-exclusive with packages.vcs, packages.directory, and packages.archive
  • Inspiration: PDM, Poetry, uv
  • For recording the wheel files as specified by Binary distribution format for the package.
  • Tools MUST support wheel files, both from a locking and installation perspective.
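
Because the wheel file name encodes its compatibility tags, an installer can choose a wheel offline from the recorded names alone. The sketch below assumes the caller supplies supported_tags as (python, abi, platform) triples ordered from most to least preferred; that input, and the helper names wheel_tags and select_wheel, are hypothetical.

```python
from itertools import product


def wheel_tags(filename):
    """Expand the compressed tag set encoded in a wheel file name."""
    stem = filename.removesuffix(".whl")
    # The last three dash-separated fields are the python, abi and platform tags.
    python_tags, abi_tags, platform_tags = stem.split("-")[-3:]
    return set(product(python_tags.split("."), abi_tags.split("."), platform_tags.split(".")))


def select_wheel(wheels, supported_tags):
    """Return the wheel entry matching the most-preferred supported tag, or None."""
    for tag in supported_tags:
        for wheel in wheels:
            if tag in wheel_tags(wheel["name"]):
                return wheel
    return None
```

A return value of None corresponds to the case where no recorded wheel fits the environment, at which point an installer would look to the sdist or raise an error.
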
packages.wheels.name
packages.wheels.upload-time

See packages.sdist.upload-time.

packages.wheels.url

See packages.archive.url.

packages.wheels.path

See packages.archive.path.

packages.wheels.size

See packages.archive.size.

packages.wheels.hashes

See packages.archive.hashes.

[[packages.attestation-identities]]
  • Type: array of tables
  • Required?: no
  • Inspiration: Provenance objects
  • A recording of the attestations for any file recorded for this package.
  • If available, tools SHOULD include the attestation identities found.
  • Publisher-specific keys are to be included in the table as-is (i.e. top-level), following the spec at Index hosted attestations.
packages.attestation-identities.kind
  • Type: string
  • Required?: yes
  • Inspiration: Provenance objects
  • The unique identity of the Trusted Publisher.
[packages.tool]

[tool]

Example

lock-version = "1.0"
requires-python = ">=3.9"
created-by = "PEP 751"

[[packages]]
name = "attrs"
version = "23.2.0"
requires-python = ">=3.7"
index = "https://pypi.org/simple/"
wheels = [
    {name = "attrs-23.2.0-py3-none-any.whl", upload-time = 2023-12-31T06:30:30.772444Z, url = "https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl", size = 60752, hashes = {sha256 = "99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1"} }
]

[[packages]]
name = "cattrs"
version = "23.2.3"
requires-python = ">=3.8"
index = "https://pypi.org/simple/"
wheels = [
    {name = "cattrs-23.2.3-py3-none-any.whl", upload-time = 2023-11-30T22:19:19.163763Z, url = "https://files.pythonhosted.org/packages/b3/0d/cd4a4071c7f38385dc5ba91286723b4d1090b87815db48216212c6c6c30e/cattrs-23.2.3-py3-none-any.whl", size = 57474, hashes = {sha256 = "0341994d94971052e9ee70662542699a3162ea1e0c62f7ce1b4a57f563685108"} }
]

[[packages]]
name = "numpy"
version = "2.0.1"
requires-python = ">=3.9"
index = "https://pypi.org/simple/"
wheels = [
    {name = "numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", upload-time = 2024-07-21T13:37:15.810939Z, url = "https://files.pythonhosted.org/packages/64/1c/401489a7e92c30db413362756c313b9353fb47565015986c55582593e2ae/numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", size = 20965374, hashes = {sha256 = "6bf4e6f4a2a2e26655717a1983ef6324f2664d7011f6ef7482e8c0b3d51e82ac"} },
    {name = "numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", upload-time = 2024-07-21T13:37:36.460324Z, url = "https://files.pythonhosted.org/packages/08/61/460fb524bb2d1a8bd4bbcb33d9b0971f9837fdedcfda8478d4c8f5cfd7ee/numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", size = 13102536, hashes = {sha256 = "7d6fddc5fe258d3328cd8e3d7d3e02234c5d70e01ebe377a6ab92adb14039cb4"} },
    {name = "numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", upload-time = 2024-07-21T13:37:46.601144Z, url = "https://files.pythonhosted.org/packages/c2/da/3d8debb409bc97045b559f408d2b8cefa6a077a73df14dbf4d8780d976b1/numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", size = 5037809, hashes = {sha256 = "5daab361be6ddeb299a918a7c0864fa8618af66019138263247af405018b04e1"} },
    {name = "numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", upload-time = 2024-07-21T13:37:58.784393Z, url = "https://files.pythonhosted.org/packages/6d/59/85160bf5f4af6264a7c5149ab07be9c8db2b0eb064794f8a7bf6d/numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", size = 6631813, hashes = {sha256 = "ea2326a4dca88e4a274ba3a4405eb6c6467d3ffbd8c7d38632502eaae3820587"} },
    {name = "numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", upload-time = 2024-07-21T13:38:19.714559Z, url = "https://files.pythonhosted.org/packages/5e/e3/944b77e2742fece7da8dfba6f7ef7dccdd163d1a613f7027f4d5b/numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", size = 13623742, hashes = {sha256 = "529af13c5f4b7a932fb0e1911d3a75da204eff023ee5e0e79c1751564221a5c8"} },
    {name = "numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", upload-time = 2024-07-21T13:38:48.972569Z, url = "https://files.pythonhosted.org/packages/2c/f3/61eee37decb58e7cb29940f19a1464b8608f2cab8a8616aba75fd/numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", size = 19242336, hashes = {sha256 = "6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a"} },
    {name = "numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", upload-time = 2024-07-21T13:39:19.213811Z, url = "https://files.pythonhosted.org/packages/77/b5/c74cc436114c1de5912cdb475145245f6e645a6a1a29b5d08c774/numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", size = 19637264, hashes = {sha256 = "cbab9fc9c391700e3e1287666dfd82d8666d10e69a6c4a09ab97574c0b7ee0a7"} },
    {name = "numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", upload-time = 2024-07-21T13:39:41.812321Z, url = "https://files.pythonhosted.org/packages/da/89/c8856e12e0b3f6af371ccb90d604600923b08050c58f0cd26eac9/numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", size = 14108911, hashes = {sha256 = "99d0d92a5e3613c33a5f01db206a33f8fdf3d71f2912b0de1739894668b7a93b"} },
    {name = "numpy-2.0.1-cp312-cp312-win32.whl", upload-time = 2024-07-21T13:39:52.932102Z, url = "https://files.pythonhosted.org/packages/15/96/310c6f6d146518479b0a6ee6eb92a537954ec3b1acfa2894d1347/numpy-2.0.1-cp312-cp312-win32.whl", size = 6171379, hashes = {sha256 = "173a00b9995f73b79eb0191129f2455f1e34c203f559dd118636858cc452a1bf"} },
    {name = "numpy-2.0.1-cp312-cp312-win_amd64.whl", upload-time = 2024-07-21T13:40:17.532627Z, url = "https://files.pythonhosted.org/packages/b5/59/f6ad378ad85ed9c2785f271b39c3e5b6412c66e810d2c60934c9f/numpy-2.0.1-cp312-cp312-win_amd64.whl", size = 16255757, hashes = {sha256 = "bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171"} },
]

Installation

The following outlines the steps to be taken to install from a lock file (while the requirements are prescriptive, the general steps and order are a suggestion):

  1. Check if the lock file version specified by lock-version is supported; an error or warning MUST be raised as appropriate.
  2. If requires-python is specified, check that the environment being installed for meets the requirement; an error MUST be raised if it is not met.
  3. If environments is specified, check that at least one of the environment marker expressions is satisfied; an error MUST be raised if no expression is satisfied.
  4. For each package listed in [[packages]]:
    1. If marker is specified, check if it is satisfied; if it isn’t, skip to the next package.
    2. If requires-python is specified, check if it is satisfied; an error MUST be raised if it isn’t.
    3. Check that no other instance of the package has been slated to be installed; an error about the ambiguity MUST be raised otherwise.
    4. Check that the source of the package is specified appropriately (i.e. there are not conflicting sources in the package entry); an error MUST be raised if any issues are found.
    5. Add the package to the set of packages to install.
  5. For each package to be installed:
    • If vcs is set:
      1. Clone the repository to the commit ID specified in commit-id.
      2. Build the package, respecting subdirectory.
      3. Install.
    • Else if directory is set:
      1. Build the package, respecting subdirectory.
      2. Install.
    • Else if archive is set:
      1. Get the file.
      2. Validate the file size and hash.
      3. Build the package, respecting subdirectory.
      4. Install.
    • Else if there are entries for wheels:
      1. Look for the appropriate wheel file based on name; if one is not found then move on to sdist or an error MUST be raised about a lack of source for the project.
      2. Get the file:
        • If path is set, use it.
        • If url is set, try to use it; optionally tools MAY use packages.index or some tool-specific mechanism to download the selected wheel file (tools MUST NOT try to change what wheel file to download based on what’s available; what file to install should be determined in an offline fashion for reproducibility).
      3. Validate the file size and hash.
      4. Install.
    • Else if no wheel file is found or sdist is solely set:
      1. Get the file.
        • If path is set, use it.
        • If url is set, try to use it; tools MAY use packages.index or some tool-specific mechanism to download the file.
      2. Validate the file size and hash.
      3. Build the package.
      4. Install.
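
The package selection in step 4 above could be sketched as follows, operating on parsed lock data. The callables marker_matches and python_matches are hypothetical stand-ins for marker and Requires-Python evaluation supplied by the caller, and the function names source_kind and select_packages are likewise assumptions of this sketch.

```python
# Sources that exclude every other kind of source in a package entry.
SOURCE_KEYS = ("vcs", "directory", "archive")


def source_kind(package):
    """Return which kind of source a package entry uses, or raise ValueError."""
    exclusive = [key for key in SOURCE_KEYS if key in package]
    has_files = "sdist" in package or "wheels" in package
    if len(exclusive) > 1 or (exclusive and has_files):
        raise ValueError(f"conflicting sources for {package['name']!r}")
    if exclusive:
        return exclusive[0]
    if has_files:
        return "files"
    raise ValueError(f"no source recorded for {package['name']!r}")


def select_packages(packages, marker_matches, python_matches):
    """Return {name: entry} for the packages that apply to this environment."""
    selected = {}
    for package in packages:
        marker = package.get("marker")
        if marker is not None and not marker_matches(marker):
            continue  # Marker not satisfied: skip to the next package.
        requires_python = package.get("requires-python")
        if requires_python is not None and not python_matches(requires_python):
            raise ValueError(f"{package['name']!r} requires Python {requires_python}")
        if package["name"] in selected:
            raise ValueError(f"ambiguous entries for {package['name']!r}")
        source_kind(package)  # Raises if the sources conflict or are missing.
        selected[package["name"]] = package
    return selected
```

Note that nothing here consults a package index or a resolver: the decision is made entirely from the lock file contents and the target environment.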

Semantic differences with requirements.txt files

Ignoring formatting, there are a few differences between lock files as proposed by this PEP and those that are possible via a requirements file.

Some of the differences are in regards to security. Requiring hashes, recording file sizes, and where a file was found – both the index and the location of the file itself – help with auditing and validating the files that were locked against. Compare that with requirements files, where including hashes is an opt-in feature that can be bypassed. The optional inclusion of a file’s upload time and where the files can be found is also different.

Being explicit about the supported Python versions and environments for the file overall is also unique to this PEP. This is to alleviate the issue of not knowing when a requirements file targets a specific platform.

The [tool] tables don’t have a direct correlation in requirements files. Requirements files do support comments, but comments are not inherently structured the way the [tool] table is thanks to being in TOML.

While comments in a requirements file could record details that are helpful for auditing and understanding what the lock file contains, providing the structured support to record such things makes auditing easier. Recording the required Python version for a package upfront helps with this as well as erroring out sooner if an install is going to fail. Recording the wheel file name separate from the URL or path is also to help make reading the list of wheel files easier as it encodes information that can be useful when understanding and auditing a file. Recording the sdist file name is for the same reason.

Backwards Compatibility

Because there is no preexisting lock file format, there are no explicit backwards-compatibility concerns in terms of Python packaging standards.

As for packaging tools themselves, that will be a per-tool decision as to whether they choose to support this PEP and in what way (i.e. as an export target or as the primary way they record their lock file).

Security Implications

The hope is that by standardizing on a lock file format which starts from a security-first posture it will help make overall packaging installation safer. However, this PEP does not solve all potential security concerns.

One potential concern is tampering with a lock file. If a lock file is not kept in source control and properly audited, a bad actor could change the file in nefarious ways (e.g., point to a malware version of a package). Tampering could also occur in transit to e.g. a cloud provider who will perform an installation on the user’s behalf. Both could be mitigated by signing the lock file either within the file in a [tool] entry or via a side channel external to the lock file itself.

This PEP does not do anything to prevent a user from installing incorrect packages. While including many details to help in auditing a package’s inclusion, there isn’t any mechanism to stop e.g. name confusion attacks via typosquatting. Tools may be able to provide some UX to help with this (e.g. by providing download counts for a package).

How to Teach This

Users should be informed that when they ask to install some package, the package may have its own dependencies, those dependencies may have dependencies, and so on. Without writing down what gets installed as part of installing the package they requested, things could change from underneath them (e.g. package versions). Changes to the underlying dependencies can lead to accidental breakage of their code. Lock files help deal with that by providing a way to write down what was installed so you can install the exact same thing in the future.

Having what to install written down also helps in collaborating with others. By agreeing to a lock file’s contents, everyone ends up with the same packages installed. This helps make sure no one relies on e.g. an API that’s only available in a certain version that not everyone working on the project has installed.

Lock files also help with security by making sure you always get the same files installed and not a malicious one that someone may have slipped in. It also lets one be more deliberate in upgrading their dependencies and thus making sure the change is on purpose and not one slipped in by a bad actor.

Reference Implementation

A proof-of-concept implementing most of this PEP, across various versions of the proposal, can be found at https://github.com/brettcannon/mousebender/tree/pep . While the various implementations have not matched the exact format of this PEP, the general semantic requirements have been implemented before.

Prior to acceptance of this PEP, the PoC will be updated.

Rejected Ideas

Recording the dependency graph for installation purposes

A previous version of this PEP recorded the dependency graph of packages instead of a set of packages to install. The idea was that by recording the dependency graph you not only got more information, but it provided more flexibility by supporting more features innately (e.g. platform-specific dependencies without explicitly propagating markers).

In the end, though, it was deemed to add complexity that wasn’t worth the cost (e.g. it impacted the ease of auditing for details which were not necessary for this PEP to reach its goals).

Specifying a new core metadata version that requires consistent metadata across files

At one point, to handle the issue of metadata varying between files, which requires examining every released file for a package and version to get accurate locking results, the idea was floated to introduce a new core metadata version which would require all metadata for all wheel files be the same for a single version of a package. Ultimately, though, it was deemed unnecessary as this PEP will put pressure on people to make files consistent for performance reasons or to make indexes provide all the metadata separate from the wheel files themselves. As well, there’s no easy enforcement mechanism, and so community expectation would work as well as a new metadata version.

Have the installer do dependency resolution

In order to support a format more akin to how Poetry worked when this PEP was drafted, it was suggested that lockers effectively record the packages and their versions which may be necessary to make an install work in any possible scenario, and then the installer resolves what to install. But that complicates auditing a lock file by requiring much more mental effort to know what packages may be installed in any given scenario. Also, one of the Poetry developers suggested that markers as represented in the package locking approach of this PEP may be sufficient to cover the needs of Poetry. Not having the installer do a resolution also simplifies installer implementations, centralizing complexity in lockers.

Requiring minimum hash algorithm support

It was proposed to require a baseline hash algorithm for the files. This was rejected as no other Python packaging specification requires specific hash algorithm support. As well, the minimum hash algorithm suggested may eventually become an outdated/unsafe suggestion, requiring further updates. In order to promote using the best algorithm at all times, no baseline is provided to avoid simply defaulting to the baseline in tools without considering the security ramifications of that hash algorithm.

File naming

Using *.pylock.toml as the file name

It was proposed to put the pylock constant part of the file name after the identifier for the purpose of the lock file. It was decided not to do this so that lock files would sort together when looking at directory contents instead of purely based on their purpose which could spread them out in a directory.

Using *.pylock as the file name

Not using .toml as the file extension and instead making it .pylock itself was proposed. This was decided against so that code editors would know how to provide syntax highlighting to a lock file without having special knowledge about the file extension.

Not having a naming convention for the file

Having no requirements or guidance for a lock file’s name was considered, but ultimately rejected. By having a standardized naming convention it makes it easy to identify a lock file for both a human and a code editor. This helps facilitate discovery when e.g. a tool wants to know all of the lock files that are available.

File format

Use JSON over TOML

Since having a format that is machine-writable was a goal of this PEP, it was suggested to use JSON. But it was deemed less human-readable than TOML while not improving on the machine-writable aspect enough to warrant the change.

Use YAML over TOML

Some argued that YAML met the machine-writable/human-readable requirement in a better way than TOML. But as that’s subjective and pyproject.toml already existed as the human-writable file used by Python packaging standards it was deemed more important to keep using TOML.

Other keys

A single hash algorithm for the whole file

Earlier versions of this PEP proposed having a single hash algorithm be specified per file instead of any number of algorithms per file. The thinking was that by specifying a single algorithm it would help with auditing the file when a specific hash algorithm was mandated for use.

In the end there was some objection to this idea. Typically, it centered around the cost of rehashing large wheel files (e.g. PyTorch). There was also concern about making hashing decisions upfront on the installer’s behalf which they may disagree with. In the end it was deemed better to have flexibility and let people audit the lock file as they see fit.

Hashing the contents of the lock file itself

Hashing the bytes of the file and storing the hash value within the file itself was proposed at some point. This was removed to make merging changes to the lock file easier, as each merge would otherwise have to recalculate the hash value to avoid a merge conflict.

Hashing the semantic contents of the file was also proposed, but it would lead to the same merge conflict issue.

Regardless of which contents were hashed, either approach could have the hash value stored outside of the file if such a hash was desired.
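A minimal sketch of storing such a hash outside of the file follows. It hashes the raw bytes of the lock file; the ".sha256" sidecar suffix and both function names are assumptions made for illustration and are not part of this PEP.

```python
import hashlib
from pathlib import Path


def _sidecar_path(lock_path):
    # Hypothetical convention: "<lock file name>.sha256" next to the lock file.
    lock_path = Path(lock_path)
    return lock_path.with_name(lock_path.name + ".sha256")


def write_sidecar_hash(lock_path):
    """Hash the lock file's bytes and store the digest in a sibling file."""
    digest = hashlib.sha256(Path(lock_path).read_bytes()).hexdigest()
    sidecar = _sidecar_path(lock_path)
    sidecar.write_text(digest + "\n", encoding="utf-8")
    return sidecar


def matches_sidecar_hash(lock_path):
    """Return True if the lock file still matches the recorded digest."""
    recorded = _sidecar_path(lock_path).read_text(encoding="utf-8").strip()
    current = hashlib.sha256(Path(lock_path).read_bytes()).hexdigest()
    return recorded == current
```

Keeping the digest in a separate file means a merge only ever touches the lock file's own contents; the sidecar can be regenerated afterwards.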

Recording the creation date of the lock file

To know how potentially stale the lock file was, an earlier proposal suggested recording the creation date of the lock file. But for the same merge conflict reasons as storing the hash of the file contents, this idea was dropped.

Recording the package indexes used in searching

Recording what package indexes were used to create the lock file was considered. In the end, though, it was rejected as it was deemed unnecessary bookkeeping.

Locking build requirements for sdists

An earlier version of this PEP tried to lock the build requirements for sdists under a packages.build-requires key. Unfortunately, it confused enough people about how it was expected to operate and there were enough edge case issues to decide it wasn’t worth trying to do in this PEP upfront. Instead, a future PEP could propose a solution.

Recording possible extras and dependency groups

To expand the feature set of this PEP such that tools could use this PEP for their main lock file format, recording possible extras and dependency groups that a user may request was considered. The idea was that if you were locking from a pyproject.toml file then you would want to lock for all possibilities; that includes extras and dependency groups.

To make this work one would need a way to declare when a package applied to an extra and/or dependency group. Due to the possibility that a combination of extras and/or dependency groups could change version requirements of a package (e.g. extra A needed the spam package at any version, but extra B needed the spam package older than version 2, thus making whether extra B was requested override what extra A requested), it would require either full Boolean logic support or only locking to a specific version of a package no matter what extras and/or dependency groups were requested. The former would at least require coming up with semantics around a marker for dependency groups, and the latter would require a separate lock file any time one didn’t want a single version restriction.

In the end, there wasn’t enough interest from tools for using this PEP as their sole lock file format to warrant doing the work to implement this feature at this time.

Simplification

Drop recording the package version

The package version is optional since it can only be reliably recorded when an sdist or wheel file is used. And since both sources record the version in file names it is technically redundant.

But in discussions it was decided the version number is useful for auditing enough to still state it separately.

Drop the requirement to specify the location of an sdist and/or wheels

At least one person has commented how their work has unstable URLs for all sdists and wheels. As such, they have to search for all files at install regardless of where the file was found previously. Dropping the requirement to provide the URL or path to a file would have helped solve the issue of recording known-bad information.

The decision to allow tools to look for a file in other ways beyond the URL provided alleviated the need to make the URL optional.

Drop requiring file size and hashes

At least one person has said that their work modifies all wheels and sdists with internal files. That means any recorded hashes and file sizes will be wrong. By making the file size and hashes optional – very likely through some opt-out mechanism – then they could continue to produce lock files that meet this PEP’s requirements.

The decision was made that this weakens security too much. It also prevents installing files from alternative locations, since a file found elsewhere could not be verified against the lock file.
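To illustrate why the recorded size and hashes matter when a file is fetched from somewhere other than the recorded URL, here is a minimal sketch of the check an installer might perform. The function name verify_file and its parameters are hypothetical; this PEP does not prescribe this exact procedure.

```python
import hashlib


def verify_file(data, expected_size, expected_hashes):
    """Return True if `data` matches the recorded size and every usable hash.

    `expected_hashes` maps algorithm names to hex digests, mirroring a
    `hashes` table. At least one algorithm must be computable locally.
    """
    if len(data) != expected_size:
        return False
    checked = False
    for algorithm, expected in expected_hashes.items():
        if algorithm not in hashlib.algorithms_available:
            continue  # Skip algorithms this Python build cannot compute.
        if hashlib.new(algorithm, data).hexdigest() != expected.lower():
            return False
        checked = True
    return checked
```

With a check like this, where the bytes came from no longer matters: a mirror, a cache, or a local directory are all acceptable as long as the size and hashes match.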

Drop recording the sdist file name

While incompatible with dropping the URL/path requirement, the package version, and hashes, recording the sdist file name is technically not necessary at all (right now recording the file name is optional). The file name only encodes the project name and version, so no new info is conveyed about the file (when the package version is provided). And if the location is recorded then getting the file is handled regardless of the file name.

But recording the file name can be helpful when looking for an appropriate file when the recorded file location is no longer available (while sdist file names are now standardized thanks to PEP 625, that has only been true since 2020 and thus there are many older sdists with names that may not be guessable).

The decision was made to require the sdist file name for simplicity.

Make packages.wheels a table

One could see writing out wheel file details as a table keyed on the file name. For example:

[[packages]]
name = "attrs"
version = "23.2.0"
requires-python = ">=3.7"
index = "https://pypi.org/simple/"

[packages.wheels]
"attrs-23.2.0-py3-none-any.whl" = {upload-time = 2023-12-31T06:30:30.772444Z, url = "https://files.pythonhosted.org/packages/e0/44/827b2a91a5816512fcaf3cc4ebc465ccd5d598c45cefa6703fcf4a79018f/attrs-23.2.0-py3-none-any.whl", size = 60752, hashes = {sha256 = "99b87a485a5820b23b879f04c2305b44b951b502fd64be915879d77a7e8fc6f1"}}

[[packages]]
name = "numpy"
version = "2.0.1"
requires-python = ">=3.9"
index = "https://pypi.org/simple/"

[packages.wheels]
"numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl" = {upload-time = 2024-07-21T13:37:15.810939Z, url = "https://files.pythonhosted.org/packages/64/1c/401489a7e92c30db413362756c313b9353fb47565015986c55582593e2ae/numpy-2.0.1-cp312-cp312-macosx_10_9_x86_64.whl", size = 20965374, hashes = {sha256 = "6bf4e6f4a2a2e26655717a1983ef6324f2664d7011f6ef7482e8c0b3d51e82ac"}}
"numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl" = {upload-time = 2024-07-21T13:37:36.460324Z, url = "https://files.pythonhosted.org/packages/08/61/460fb524bb2d1a8bd4bbcb33d9b0971f9837fdedcfda8478d4c8f5cfd7ee/numpy-2.0.1-cp312-cp312-macosx_11_0_arm64.whl", size = 13102536, hashes = {sha256 = "7d6fddc5fe258d3328cd8e3d7d3e02234c5d70e01ebe377a6ab92adb14039cb4"}}
"numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl" = {upload-time = 2024-07-21T13:37:46.601144Z, url = "https://files.pythonhosted.org/packages/c2/da/3d8debb409bc97045b559f408d2b8cefa6a077a73df14dbf4d8780d976b1/numpy-2.0.1-cp312-cp312-macosx_14_0_arm64.whl", size = 5037809, hashes = {sha256 = "5daab361be6ddeb299a918a7c0864fa8618af66019138263247af405018b04e1"}}
"numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl" = {upload-time = 2024-07-21T13:37:58.784393Z, url = "https://files.pythonhosted.org/packages/6d/59/85160bf5f4af6264a7c5149ab07be9c8db2b0eb064794f8a7bf6d/numpy-2.0.1-cp312-cp312-macosx_14_0_x86_64.whl", size = 6631813, hashes = {sha256 = "ea2326a4dca88e4a274ba3a4405eb6c6467d3ffbd8c7d38632502eaae3820587"}}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl" = {upload-time = 2024-07-21T13:38:19.714559Z, url = "https://files.pythonhosted.org/packages/5e/e3/944b77e2742fece7da8dfba6f7ef7dccdd163d1a613f7027f4d5b/numpy-2.0.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", size = 13623742, hashes = {sha256 = "529af13c5f4b7a932fb0e1911d3a75da204eff023ee5e0e79c1751564221a5c8"}}
"numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" = {upload-time = 2024-07-21T13:38:48.972569Z, url = "https://files.pythonhosted.org/packages/2c/f3/61eee37decb58e7cb29940f19a1464b8608f2cab8a8616aba75fd/numpy-2.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", size = 19242336, hashes = {sha256 = "6790654cb13eab303d8402354fabd47472b24635700f631f041bd0b65e37298a"}}
"numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl" = {upload-time = 2024-07-21T13:39:19.213811Z, url = "https://files.pythonhosted.org/packages/77/b5/c74cc436114c1de5912cdb475145245f6e645a6a1a29b5d08c774/numpy-2.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", size = 19637264, hashes = {sha256 = "cbab9fc9c391700e3e1287666dfd82d8666d10e69a6c4a09ab97574c0b7ee0a7"}}
"numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl" = {upload-time = 2024-07-21T13:39:41.812321Z, url = "https://files.pythonhosted.org/packages/da/89/c8856e12e0b3f6af371ccb90d604600923b08050c58f0cd26eac9/numpy-2.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", size = 14108911, hashes = {sha256 = "99d0d92a5e3613c33a5f01db206a33f8fdf3d71f2912b0de1739894668b7a93b"}}
"numpy-2.0.1-cp312-cp312-win32.whl" = {upload-time = 2024-07-21T13:39:52.932102Z, url = "https://files.pythonhosted.org/packages/15/96/310c6f6d146518479b0a6ee6eb92a537954ec3b1acfa2894d1347/numpy-2.0.1-cp312-cp312-win32.whl", size = 6171379, hashes = {sha256 = "173a00b9995f73b79eb0191129f2455f1e34c203f559dd118636858cc452a1bf"}}
"numpy-2.0.1-cp312-cp312-win_amd64.whl" = {upload-time = 2024-07-21T13:40:17.532627Z, url = "https://files.pythonhosted.org/packages/b5/59/f6ad378ad85ed9c2785f271b39c3e5b6412c66e810d2c60934c9f/numpy-2.0.1-cp312-cp312-win_amd64.whl", size = 16255757, hashes = {sha256 = "bb2124fdc6e62baae159ebcfa368708867eb56806804d005860b6007388df171"}}

In general, though, people did not prefer this over the approach this PEP has taken.

Self-Referential

Drop the [tool] table

The [tool] table is included as it has been found to be very useful for pyproject.toml files. Providing similar flexibility to this PEP is done in hopes that similar benefits will materialize.

But some people have been concerned that such a table will be too enticing to tools and will lead to files that are tool-specific and unusable by other tools. This could cause issues for tools trying to do installation, auditing, etc. as they would not know what details in the [tool] table are somehow critical.

As a compromise, this PEP specifies that the details recorded in [tool] must be disposable and not affect installation of packages.
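One way to read "disposable" is that a consumer can discard every [tool] table and still have everything needed for installation. The sketch below does that on already-parsed lock data. The helper name is hypothetical, and the handling of a per-package tool table is an assumption about the file layout rather than something stated in this section.

```python
import copy


def without_tool_tables(lock_data):
    """Return a copy of parsed lock data with tool-specific tables removed."""
    stripped = copy.deepcopy(lock_data)
    stripped.pop("tool", None)  # Top-level [tool] table.
    for package in stripped.get("packages", []):
        package.pop("tool", None)  # Assumed per-package tool table.
    return stripped
```

An auditing tool could run this first and then operate only on what remains, without needing to understand any tool-specific details.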

List the requirement inputs for the file

Right now the file does not record the requirements that acted as inputs to the file. This is for simplicity reasons and to not explicitly constrain the file in some unforeseen way (e.g., updating the file after initial creation for a new platform that has different requirements, all without having to resolve how to write a comprehensive set of requirements).

But it may help in auditing and any recreation of the file if the original requirements were somehow recorded. This could be a single string or an array of strings if multiple requirements were used with the file.

In the end it was deemed too complicated to try and capture the inputs that a tool used to construct the lock file in a generic fashion.

Auditing

Recording dependents

Recording the dependents of a package is not necessary to install it. As such, it has been left out of the PEP as it can be included via [tool].

But knowing how critical a package is to other packages may be beneficial. This information is included by pip-tools, so there’s prior art in including it. A flexible approach could be used to record the dependents, e.g. as much detail as to differentiate from any other entry for the same package in the file (inspired by uv).

In the end, though, it was decided that the dependencies of a package are the better thing to record.

Acknowledgements

Thanks to everyone who participated in the discussions on discuss.python.org. Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for providing feedback on a draft version of this PEP before going public.


Source: https://github.com/python/peps/blob/main/peps/pep-0751.rst

Last modified: 2025-02-07 00:44:16 GMT