PEP 751 – A file format to record Python dependencies for installation reproducibility
- Author:
- Brett Cannon <brett at python.org>
- Status:
- Draft
- Type:
- Standards Track
- Topic:
- Packaging
- Created:
- 24-Jul-2024
- Post-History:
- 25-Jul-2024 30-Oct-2024
- Replaces:
- 665
Table of Contents
- Abstract
- Motivation
- Rationale
- Specification
- File Name
- File Format
- Examples
- Expectations for Lockers
- Expectations for Installers
- Backwards Compatibility
- Security Implications
- How to Teach This
- Reference Implementation
- Rejected Ideas
- Open Issues
- Deferred Ideas
- Acknowledgements
- Copyright
Abstract
This PEP proposes a new file format for dependency specification to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.
Motivation
Currently, no standard exists to create an immutable record, such as a lock file, which specifies what direct and indirect dependencies should be installed into a virtual environment.
Considering there are at least five well-known solutions to this problem in the community (pip freeze, pip-tools, uv, Poetry, and PDM), there seems to be an appetite for lock files in general.
Those tools also vary in what locking scenarios they support. For instance, pip freeze and pip-tools only generate lock files for the current environment while PDM and Poetry try to lock for any environment to some degree. There are also concerns around the lack of secure defaults in the face of supply chain attacks (e.g., always including hashes for files).
The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g., Dependabot only supports select tools, as do cloud providers that can install dependencies on your behalf). It also impacts portability between tools, which causes vendor lock-in. Without compatibility and interoperability, tooling around lock files is fractured: both users and tools have to choose a lock file format upfront, and switching to another format later is costly. Rallying around a single format removes that cost/barrier.
Note
Much of the motivation from PEP 665 also applies to this PEP.
Rationale
The format is designed so that a locker which produces the lock file and an installer which consumes the lock file can be separate tools. This allows, for example, cloud hosting providers to use their own installer that’s optimized for their system, independent of what locker the user used to create their lock file.
The file format is designed to be human-readable. This is so that the contents of the file can be audited by a human to make sure no undesired dependencies end up being included in the lock file.
The file format is also designed to not require a resolver at install time. This greatly simplifies installers and thus reasoning about what would be installed when consuming a lock file. It should also lead to faster installs which are much more frequent than creating a lock file.
Finally, the lock file is meant to be flexible enough to meet the various needs tools have for choosing what to install. That means the lock file records the dependency graph of what _may_ be installed. This allows tools to enter the graph at any point and still have reproducible results from that root of the graph. Flexibility also means supporting different installation scenarios within the same lock file (e.g., with or without test dependencies).
Specification
File Name
A lock file MUST be named pylock.toml. The use of the .toml file extension is to make syntax highlighting in editors easier and to reinforce the fact that the file format is meant to be human-readable.
The lock file SHOULD be located in the directory appropriate for the scope of the lock file. Locking against a single pyproject.toml, for instance, would place the pylock.toml in the same directory. If the lock file covered multiple projects in a monorepo, then the expectation is the pylock.toml file would be in the directory that held all the projects being locked.
File Format
The format of the file is TOML.
All keys listed below are required unless otherwise noted. If two keys are mutually exclusive to one another, then one of the keys is required while the other is disallowed.
To minimize noise in diffs, keys in tables (including the top-level table) SHOULD be emitted by lockers in the order they are listed in this PEP when applicable, unless another sort order is specified. If the keys are not explicitly specified in this PEP, then the keys SHOULD be sorted by lexicographic order.
For the same reason, lockers SHOULD sort arrays in lexicographic order unless otherwise specified.
version
- String
- The version of the lock file format.
- This PEP specifies the initial version (and only valid value until future updates to the standard change it) as "1.0".
- If an installer supports the major version but not the minor version, a tool SHOULD warn when an unknown key is seen.
- If an installer doesn’t support a major version, it MUST raise an error.
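The version rules above could be sketched as follows. This is only an illustration, not part of the specification: `SUPPORTED_MAJOR`, `SUPPORTED_MINOR`, `KNOWN_KEYS`, `UnsupportedVersionError`, and `check_version` are hypothetical names, and only top-level keys are inspected.

```python
import warnings

# Hypothetical: the newest format version this installer understands.
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0
# Top-level keys defined by this PEP for version "1.0".
KNOWN_KEYS = {"version", "hash-algorithm", "locker", "groups", "packages", "tool"}


class UnsupportedVersionError(Exception):
    """Raised when the lock file's major version is not supported."""


def check_version(lock_data):
    """Apply the version rules and return the unknown top-level keys."""
    major, _, minor = lock_data["version"].partition(".")
    if int(major) != SUPPORTED_MAJOR:
        # An unsupported major version is a hard error.
        raise UnsupportedVersionError(
            f"unsupported lock file version {lock_data['version']!r}"
        )
    unknown = sorted(set(lock_data) - KNOWN_KEYS)
    if int(minor or 0) > SUPPORTED_MINOR:
        # Same major version but a newer minor version: warn, don't fail.
        for key in unknown:
            warnings.warn(f"unknown key {key!r} in lock file")
    return unknown
```

A newer minor version only produces warnings for keys the installer does not recognize, while a different major version stops the installation.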
hash-algorithm
- String
- The name of the hash algorithm used for calculating all hash values.
- Only a single hash algorithm is used for the entire file to allow hash values to be written in inline tables for readability and compactness purposes by only listing a single hash value instead of multiple values based on multiple hash algorithms.
- Specifying a single hash algorithm guarantees that an algorithm that the user prefers is used consistently throughout the file without having to audit each file hash value separately.
- Allows for updating the entire file to a new hash algorithm without running the risk of accidentally leaving an old hash value in the file.
- JSON-based Simple API for Python Package Indexes and the hashes dictionary of the files dictionary of the Project Details dictionary specify what values are valid and give guidelines on what hash algorithms to use.
- Failure to validate any hash values for any file that is to be installed MUST raise an error.
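A minimal sketch of the hash check, assuming hash values are hex digests (as in the example lock file) and that `hash-algorithm` names an algorithm known to Python's `hashlib`. `HashMismatchError` and `verify_hash` are hypothetical names.

```python
import hashlib


class HashMismatchError(Exception):
    """Raised when a file's digest does not match the lock file."""


def verify_hash(data, expected_hex, algorithm):
    """Check `data` (bytes) against the hex digest recorded in the lock file.

    `algorithm` is the value of the top-level `hash-algorithm` key.
    """
    actual = hashlib.new(algorithm, data).hexdigest()
    if actual != expected_hex:
        raise HashMismatchError(f"expected {expected_hex}, got {actual}")
```

Because a single algorithm applies to the whole file, the installer passes the same `algorithm` value for every file it checks.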
[locker]
- Table
- Record of the tool that generated the lock file.
- Enough details SHOULD be provided such that the lock file can be reproduced from the details in this table (provided the same I/O data is available, e.g., Dependabot if only files from a repository are necessary to run the command).
locker.name
- String
- The name of the tool used to create the lock file.
- If the locker is a Python project, its normalized name SHOULD be used.
locker.version
- String
- The version of the tool used.
locker.run
- Optional
- Inline table
- Records the command used to create the lock file.
locker.run.module
- Optional
- String
- The module name used for running the locker (i.e. what would be passed to python -m).
- Lockers MUST specify this key if the locker can be executed via python -m.
locker.run.args
- Optional
- Array of strings
- If the locker has a CLI, the arguments to pass to the locker.
- All paths MUST be relative to the lock file so that another tool could use the lock file’s location as the current working directory.
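As an illustration, a tool could rebuild the locker invocation from the `[locker]` table roughly as below. `locker_command` is a hypothetical helper, and falling back to the locker name as the executable when `module` is absent is an assumption, not something this PEP specifies.

```python
import sys


def locker_command(locker):
    """Build an argv list from the [locker] table, or None if `run` is absent."""
    run = locker.get("run")
    if run is None:
        return None
    args = list(run.get("args", []))
    if "module" in run:
        # Equivalent to `python -m <module> <args...>`.
        return [sys.executable, "-m", run["module"], *args]
    # Assumption: without `module`, the locker name doubles as the executable.
    return [locker["name"], *args]
```

Since paths in `args` are relative to the lock file, such a command would be run with the lock file's directory as the current working directory.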
[[groups]]
- Array of tables
- A named subset of packages as found in [[packages]].
- Act as roots into the dependency graph.
- Installers MUST allow the user to select one or more groups by name to install all relevant packages together.
- Installers SHOULD let the user skip specifying a name if there is only one entry in the array.
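The group selection rules above might look like this in an installer. `select_groups` is a hypothetical helper operating on the parsed lock file.

```python
def select_groups(lock_data, names=()):
    """Return the [[groups]] tables the user asked for.

    When no name is given, the single group in the file is used; with
    several groups and no name, the choice would be ambiguous.
    """
    groups = {group["name"]: group for group in lock_data["groups"]}
    if not names:
        if len(groups) != 1:
            raise ValueError("a group name is required when several groups exist")
        return list(groups.values())
    missing = [name for name in names if name not in groups]
    if missing:
        raise ValueError(f"unknown groups: {missing}")
    return [groups[name] for name in names]
```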
groups.name
- String
- The name of the group.
groups.project
- Mutually-exclusive with requirements
- String
- The normalized name of a package to act as the starting point into the dependency graph.
- Analogous to locking to the [project] table in pyproject.toml.
- Installers MUST let a user specify any optional features/extras that the package provides.
- Lockers MUST NOT allow for ambiguity by specifying multiple package versions of the same package under the same group name when a package is listed in any project key.
groups.requirements
- Mutually-exclusive with project
- Array of tables
- Represents the installation requirements for this group.
- Analogous to a key in [dependency-groups] in pyproject.toml.
- Lockers MUST make sure that resolving any requirement for any environment does not lead to ambiguity by having multiple values in [[packages]] match the same requirement.
- Values in the array SHOULD be written as inline tables, sorted lexicographically by name, then by feature with the lack of that key sorting first.
groups.requirements.name
- String
- Normalized name of the package.
groups.requirements.extras
- Optional
- Array of strings
- The names of the extras specified for the requirement (i.e. what comes between [...]).
groups.requirements.version
- Optional
- String
- The version specifiers for the requirement.
groups.requirements.marker
- Optional
- String
- The environment markers for the requirement.
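The four requirement keys map directly onto the familiar dependency specifier syntax. As a rough sketch, a requirement table could be rendered back into that string form like so. `requirement_string` is a hypothetical helper.

```python
def requirement_string(req):
    """Render a requirement table as `name[extras]version; marker`."""
    text = req["name"]
    if req.get("extras"):
        text += "[" + ",".join(req["extras"]) + "]"
    text += req.get("version", "")
    if "marker" in req:
        text += "; " + req["marker"]
    return text
```

For instance, `{'name': 'coverage', 'extras': ['toml'], 'version': '>=5.3'}` renders as `coverage[toml]>=5.3`.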
[[packages]]
- Array of tables
- The array contains all data on the nodes of the dependency graph.
- Lockers SHOULD record packages in order by name lexicographically, version by its Python version specifiers ordering, and then by groups following Python’s sort order for lists of strings (i.e. item by item, then by length as a tiebreaker).
packages.name
- String
- The normalized name of the package.
packages.version
- String
- The version of the package.
packages.groups
- Array of strings
- Associates this table with the groups.name entries of the same names.
packages.index-url
- Optional
- String
- Stores the project index URL from the Simple Repository API.
- Useful for generating Packaging URLs (aka PURLs).
- When possible, lockers SHOULD include this to assist with generating software bill of materials (aka SBOMs).
packages.direct
- Optional (defaults to false)
- Boolean
- Represents whether the installation is via a direct URL reference.
packages.requires-python
- String
- Holds the version specifiers for Python version compatibility for the package and version.
- The value MUST match what’s provided by the package version, if available, via Requires-Python.
[[packages.dependencies]]
- Array of tables
- A record of the dependency requirements of the package and version.
- The values MUST semantically match what’s provided by the package version via Requires-Dist (multiple use) for all dependencies referenced in the lock file (i.e. all base dependencies plus all dependencies for extras referenced in the lock file); lock files MAY list all dependencies for unused extras if desired.
- Values in the array SHOULD be written as inline tables, sorted lexicographically by name, then by feature with the lack of that key sorting first.
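The stated ordering (by name, then by feature, with entries lacking a feature first) can be expressed as a sort key. The sketch below follows the wording of the rule. `sort_dependencies` is a hypothetical helper.

```python
def sort_dependencies(dependencies):
    """Sort dependency tables by name, then feature (missing feature first)."""
    return sorted(
        dependencies,
        # `"feature" in dep` is False for entries without the key, and
        # False sorts before True, so those entries come first.
        key=lambda dep: (dep["name"], "feature" in dep, dep.get("feature", "")),
    )
```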
packages.dependencies.name
See groups.requirements.name.
packages.dependencies.extras
See groups.requirements.extras.
packages.dependencies.version
See groups.requirements.version.
packages.dependencies.marker
See groups.requirements.marker.
packages.dependencies.feature
- Optional
- String
- The optional feature/Provides-Extra (multiple use) that this requirement is conditional on.
packages.editable
- Optional (defaults to false)
- Boolean
- Specifies whether the package should be installed in editable mode.
[packages.source-tree]
- Optional
- Table
- For recording where to find the source tree for the package version.
- Lockers SHOULD write this table inline.
- Support for source trees by installers is optional.
- If support is provided by an installer it SHOULD be opt-in.
- If multiple source trees are provided, installers MUST prefer either the vcs option or a file for security/reproducibility due to their commit or hash, respectively.
packages.source-tree.vcs
- Optional
- String
- If specifying a VCS, the type of version control system used.
- The valid values are specified by the registered VCSs of the direct URL data structure.
packages.source-tree.path
- Required if url is not set
- String
- A path to the source tree, which may be absolute or relative.
- If the path is relative it MUST be relative to the lock file.
- The path may either be to a directory, file archive, or VCS checkout if vcs is specified.
packages.source-tree.url
- Required if path is not set
- String
- A URL to a file archive containing the source tree, or a VCS checkout if vcs is specified.
packages.source-tree.commit
- Required if vcs is set
- String
- The commit ID for the repository which represents the package and version.
- The value MUST be immutable for the VCS for security purposes (e.g. no Git tags).
packages.source-tree.size
- Optional
- Integer
- The size in bytes for the source tree if it is a file.
- Installers MUST verify the file size matches this value.
packages.source-tree.hash
- Required if url or path points to a file
- String
- The hash value of the file contents using the hash algorithm specified by hash-algorithm.
- Installers MUST verify the hash matches the file.
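The conditional requirements on the `[packages.source-tree]` keys can be summarized as a small validator. This is only a sketch: `source_tree_problems` is a hypothetical helper, and `is_file` is a flag the caller is assumed to supply, since whether a location is a file cannot be read from the table itself.

```python
def source_tree_problems(tree, *, is_file):
    """Return a list of rule violations for a [packages.source-tree] table."""
    problems = []
    if "path" not in tree and "url" not in tree:
        problems.append("one of 'path' or 'url' is required")
    if "vcs" in tree and "commit" not in tree:
        problems.append("'commit' is required when 'vcs' is set")
    if is_file and "hash" not in tree:
        problems.append("'hash' is required when the source tree is a file")
    return problems
```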
[packages.sdist]
- Optional
- Table
- The location of a source distribution as specified by Source distribution format.
- Lockers SHOULD write the table inline.
- Support for source distributions by installers is optional.
- If support is provided by an installer it SHOULD be opt-in.
packages.sdist.url
- Optional; mutually-exclusive with path
- String
- The URL to the file.
packages.sdist.path
- Optional; mutually-exclusive with url
- String
- A path to the file, which may be absolute or relative.
- If the path is relative it MUST be relative to the lock file.
packages.sdist.upload-time
- Optional and only applicable when url is specified
- Offset date time
- The upload date and time of the file as specified by a valid ISO 8601 date/time string for the .files[]."upload-time" field in the JSON version of Simple repository API.
packages.sdist.size
- Optional
- Integer
- The size of the file in bytes.
- Installers MUST verify the file size matches this value.
packages.sdist.hash
- String
- The hash value of the file contents using the hash algorithm specified by hash-algorithm.
- Installers MUST verify the hash matches the file.
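A sketch of how an installer might check an `[packages.sdist]` table before using it, reading "mutually-exclusive" as defined earlier in this section (exactly one of the two keys is present). `check_sdist_table` and `check_size` are hypothetical helpers.

```python
def check_sdist_table(sdist):
    """Raise ValueError if the table breaks the key rules for sdists."""
    if ("url" in sdist) == ("path" in sdist):
        raise ValueError("exactly one of 'url' or 'path' must be set")
    if "upload-time" in sdist and "url" not in sdist:
        raise ValueError("'upload-time' only applies when 'url' is set")
    if "hash" not in sdist:
        raise ValueError("'hash' is required")


def check_size(sdist, data):
    """Verify the downloaded bytes against the optional `size` key."""
    expected = sdist.get("size")
    if expected is not None and len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
```

The same checks would apply to entries in `[[packages.wheels]]`, whose `url`, `path`, `upload-time`, `size`, and `hash` keys defer to the sdist definitions.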
[[packages.wheels]]
- Optional
- Array of tables
- For recording the wheel files as specified by Binary distribution format for the package version.
- Lockers SHOULD write the table inline.
- Lockers SHOULD sort the array values lexicographically by tag.
packages.wheels.build
- Optional
- String
- The build tag for the wheel file (if appropriate).
packages.wheels.url
See packages.sdist.url.
packages.wheels.path
See packages.sdist.path.
packages.wheels.upload-time
See packages.sdist.upload-time.
packages.wheels.size
See packages.sdist.size.
packages.wheels.hash
See packages.sdist.hash.
[packages.tool]
- Optional
- Table
- Similar usage as that of the [tool] table from the pyproject.toml specification, but at the package version level instead of at the lock file level (which is also available via [tool]).
- Useful for scoping package version/release details (e.g., recording signing identities to then use to verify package integrity separately from where the package is hosted, prototyping future extensions to this file format, etc.).
[tool]
- Optional
- Table
- Same usage as that of the equivalent [tool] table from the pyproject.toml specification.
Examples
version = '1.0'
hash-algorithm = 'sha256'
[locker]
name = 'mousebender'
version = 'pep'
run = { module = 'mousebender', args = ['lock', '--platform', 'cpython3.12-manylinux2014-x64', '--platform', 'cpython3.12-windows-x64', 'cattrs', 'numpy'] }
[[groups]]
name = 'Default'
requirements = [
{ name = 'cattrs' },
{ name = 'numpy' },
]
[[packages]]
name = 'attrs'
version = '24.2.0'
groups = ['Default']
index-url = 'https://pypi.org/simple/attrs'
direct = false
requires-python = '>=3.7'
dependencies = [
{ name = 'importlib-metadata', marker = 'python_version < "3.8"' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'benchmark' },
{ name = 'hypothesis', feature = 'benchmark' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'benchmark' },
{ name = 'pympler', feature = 'benchmark' },
{ name = 'pytest-codspeed', feature = 'benchmark' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'benchmark' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'benchmark' },
{ name = 'pytest', version = '>=4.3.0', feature = 'benchmark' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'cov' },
{ name = 'coverage', extras = ['toml'], version = '>=5.3', feature = 'cov' },
{ name = 'hypothesis', feature = 'cov' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'cov' },
{ name = 'pympler', feature = 'cov' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'cov' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'cov' },
{ name = 'pytest', version = '>=4.3.0', feature = 'cov' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'dev' },
{ name = 'hypothesis', feature = 'dev' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'dev' },
{ name = 'pre-commit', feature = 'dev' },
{ name = 'pympler', feature = 'dev' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'dev' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'dev' },
{ name = 'pytest', version = '>=4.3.0', feature = 'dev' },
{ name = 'cogapp', feature = 'docs' },
{ name = 'furo', feature = 'docs' },
{ name = 'myst-parser', feature = 'docs' },
{ name = 'sphinx', feature = 'docs' },
{ name = 'sphinx-notfound-page', feature = 'docs' },
{ name = 'sphinxcontrib-towncrier', feature = 'docs' },
{ name = 'towncrier', version = '<24.7', feature = 'docs' },
{ name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'tests' },
{ name = 'hypothesis', feature = 'tests' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests' },
{ name = 'pympler', feature = 'tests' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests' },
{ name = 'pytest-xdist', extras = ['psutil'], feature = 'tests' },
{ name = 'pytest', version = '>=4.3.0', feature = 'tests' },
{ name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests-mypy' },
{ name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests-mypy' }
]
editable = false
wheels = [
{ tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/6a/21/5b6702a7f963e95456c0de2d495f67bf5fd62840ac655dc451586d23d39a/attrs-24.2.0-py3-none-any.whl', hash = '81921eb96de3191c8258c199618104dd27ac608d9366f5e35d011eae1867ede2', upload-time = 2024-08-06T14:37:36.958006+00:00, size = 63001 }
]
[[packages]]
name = 'cattrs'
version = '24.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/cattrs'
direct = false
requires-python = '>=3.8'
dependencies = [
{ name = 'attrs', version = '>=23.1.0' },
{ name = 'exceptiongroup', version = '>=1.1.1', marker = 'python_version < "3.11"' },
{ name = 'typing-extensions', version = '!=4.6.3,>=4.1.0', marker = 'python_version < "3.11"' },
{ name = 'pymongo', version = '>=4.4.0', feature = 'bson' },
{ name = 'cbor2', version = '>=5.4.6', feature = 'cbor2' },
{ name = 'msgpack', version = '>=1.0.5', feature = 'msgpack' },
{ name = 'msgspec', version = '>=0.18.5', marker = 'implementation_name == "cpython"', feature = 'msgspec' },
{ name = 'orjson', version = '>=3.9.2', marker = 'implementation_name == "cpython"', feature = 'orjson' },
{ name = 'pyyaml', version = '>=6.0', feature = 'pyyaml' },
{ name = 'tomlkit', version = '>=0.11.8', feature = 'tomlkit' },
{ name = 'ujson', version = '>=5.7.0', feature = 'ujson' }
]
editable = false
wheels = [
{ tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/c8/d5/867e75361fc45f6de75fe277dd085627a9db5ebb511a87f27dc1396b5351/cattrs-24.1.2-py3-none-any.whl', hash = '67c7495b760168d931a10233f979b28dc04daf853b30752246f4f8471c6d68d0', upload-time = 2024-09-22T14:58:34.812643+00:00, size = 66446 }
]
[[packages]]
name = 'numpy'
version = '2.1.2'
groups = ['Default']
index-url = 'https://pypi.org/simple/numpy'
direct = false
requires-python = '>=3.10'
dependencies = [
]
editable = false
wheels = [
{ tags = ['cp312-cp312-manylinux2014_x86_64', 'cp312-cp312-manylinux_2_17_x86_64'], url = 'https://files.pythonhosted.org/packages/9b/b4/e3c7e6fab0f77fff6194afa173d1f2342073d91b1d3b4b30b17c3fb4407a/numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', hash = '6d95f286b8244b3649b477ac066c6906fbb2905f8ac19b170e2175d3d799f4df', upload-time = 2024-10-05T18:36:20.729642+00:00, size = 16041825 },
{ tags = ['cp312-cp312-win_amd64'], url = 'https://files.pythonhosted.org/packages/4c/79/73735a6a5dad6059c085f240a4e74c9270feccd2bc66e4d31b5ca01d329c/numpy-2.1.2-cp312-cp312-win_amd64.whl', hash = '456e3b11cb79ac9946c822a56346ec80275eaf2950314b249b512896c0d2505e', upload-time = 2024-10-05T18:37:38.159022+00:00, size = 12568254 }
]
Expectations for Lockers
- Lockers MUST make sure that entering the dependency graph via a specific group will not lead to ambiguity for installers as to which value in [[packages]] to install for any environment (this can be controlled for via packages.version and packages.groups).
- Lockers SHOULD try to make all logically related groups resolve together (i.e. no ambiguity if grouped together).
- If a groups.project would have extras that cause ambiguity or installation failure due to conflicts between the extras, the locker MAY create separate groups.requirements entries instead, otherwise the locker MUST raise an error.
- Lockers MAY try to lock for multiple environments in a single lock file.
- Lockers MAY try to update a lock file containing [tool] and [packages.tool] for other tools than themselves.
- Lockers MAY want to provide a way to let users provide the information necessary to lock for other environments, e.g., supporting a JSON file format which specifies wheel tags and marker values.
{
"marker-values": {"<marker>": "<value>"},
"wheel-tags": ["<tag>"]
}
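As a rough illustration of how a locker could use such a file, the sketch below picks the wheel to record for a target environment. `pick_wheel` is a hypothetical helper, and treating `wheel-tags` as ordered from most to least preferred is an assumption. The JSON shape above does not state an ordering.

```python
import json


def pick_wheel(wheels, environment):
    """Return the wheel matching the highest-priority tag, or None."""
    for tag in environment["wheel-tags"]:  # Assumed to be ordered by preference.
        for wheel in wheels:
            if tag in wheel["tags"]:
                return wheel
    return None


# Example environment description following the JSON shape above.
environment = json.loads(
    '{"marker-values": {"python_version": "3.12"},'
    ' "wheel-tags": ["cp312-cp312-win_amd64", "py3-none-any"]}'
)
```

The `marker-values` mapping would play the matching role for environment markers, supplying values for the target environment instead of the one the locker runs in.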
Expectations for Installers
- Installers MAY support installation of non-binary files (i.e. source trees and source distributions), but are not required to.
- Installers MUST provide a way to avoid non-binary file installation for reproducibility and security purposes.
- Installers SHOULD make it opt-in to use non-binary file installation to facilitate a secure-by-default approach.
- If a traversal of the graph leads to any ambiguity as to what package version to install (i.e. more than one package version qualifies), an error MUST be raised.
- Installers MUST only consider package versions included in any selected groups (i.e. installers cannot consider packages outside of the groups selected to install from).
- Installers MUST error out if a package version lacks a way to install into the chosen environment.
- Installers MUST support installing into an empty environment.
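The secure-by-default expectation for non-binary files could be sketched like this. `InstallError` and `choose_artifact_kind` are hypothetical names, and preferring an sdist over a source tree is an assumption made for the example.

```python
class InstallError(Exception):
    """Raised when no permitted file exists for a package version."""


def choose_artifact_kind(package, compatible_wheel, *, allow_source=False):
    """Return which kind of file to install for a [[packages]] entry.

    `compatible_wheel` is the wheel already matched to the environment,
    or None when no wheel applies.
    """
    if compatible_wheel is not None:
        return "wheel"
    if allow_source:  # Non-binary installs happen only when the user opts in.
        for key in ("sdist", "source-tree"):  # Assumed preference order.
            if key in package:
                return key
    raise InstallError(
        f"no installable file for {package['name']} {package['version']}"
    )
```

With the default `allow_source=False`, a package that only offers an sdist results in an error instead of a silent source build.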
Pseudo-Code
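import operator

import packaging.markers
import packaging.specifiers
import packaging.tags

# Assumption: the name of the group to install, hard-coded for illustration.
GROUP_NAME = "Default"


class UnsatisfiableError(Exception):
    """Raised when a requirement cannot be satisfied."""


class AmbiguityError(Exception):
    """Raised when a requirement has multiple solutions."""
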
def install_packages(lock_file_contents):
    # Hard-coded out of laziness.
    packages = choose_packages(lock_file_contents, (GROUP_NAME, frozenset()))

    for package in packages:
        tags = list(packaging.tags.sys_tags())
        for tag in tags:  # Prioritize by tag order.
            tag_str = str(tag)
            for wheel in package["wheels"]:
                if tag_str in wheel["tags"]:
                    break
            else:
                continue
            break
        else:
            raise UnsatisfiableError(
                f"No wheel for {package['name']} {package['version']}"
            )
        print(f"Installing {package['name']} {package['version']} ({tag_str})")

def choose_packages(lock_file_data, *selected_groups):
    """Select the package versions that should be installed based on the requested groups.

    'selected_groups' is a sequence of two-item tuples, representing a group name and
    optionally any requested extras if the group is a project.
    """
    group_names = frozenset(operator.itemgetter(0)(group) for group in selected_groups)
    available_packages = {}  # The packages in the selected groups.
    for pkg in lock_file_data["packages"]:
        if frozenset(pkg["groups"]) & group_names:
            available_packages.setdefault(pkg["name"], []).append(pkg)
    selected_packages = {}  # The package versions that have been selected.
    handled_extras = {}  # The extras that have been handled.
    requirements = []  # A stack of requirements to satisfy.

    # First, get our starting list of requirements.
    for group in selected_groups:
        requirements.extend(gather_requirements(lock_file_data, group))

    # Next, go through the requirements and try to find a **single** package version
    # that satisfies each requirement.
    while requirements:
        req = requirements.pop()
        # Ignore requirements whose markers disqualify it.
        if not applies_to_env(req):
            continue
        name = req["name"]
        if pkg := selected_packages.get(name):
            # Safety check that the cross-section of groups doesn't cause issues.
            # It somewhat assumes the locker didn't mess up such that there would be
            # ambiguity by what package version was initially selected.
            if not version_satisfies(req, pkg):
                raise UnsatisfiableError(
                    f"requirement {req!r} not satisfied by {pkg!r}"
                )
            if "extras" not in req:
                continue
            needed_extras = set(req["extras"])
            extras = handled_extras.setdefault(name, set())
            # Skip the requirement if every requested extra was already handled.
            if not needed_extras.difference(extras):
                continue
            # This isn't optimal as we may tread over the same extras multiple times,
            # but eventually the maximum set of extras for the package will be handled
            # and thus the above guard will short-circuit adding any more requirements.
            extras.update(needed_extras)
        else:
            # Raises UnsatisfiableError or AmbiguityError if no suitable, single package
            # version is found (including when the package is not in a selected group).
            pkg = compatible_package_version(req, available_packages.get(name, []))
            selected_packages[name] = pkg
        requirements.extend(dependencies(pkg, req))

    return selected_packages.values()

def gather_requirements(locked_file_data, group):
    """Return a collection of all requirements for a group."""
    # Hard-coded to support `groups.requirements` out of laziness.
    group_name, _extras = group
    for group in locked_file_data["groups"]:
        if group["name"] == group_name:
            return group["requirements"]
    else:
        raise ValueError(f"Group {group_name!r} not found in lock file")


def applies_to_env(requirement):
    """Check if the requirement applies to the current environment."""
    try:
        markers = requirement["marker"]
    except KeyError:
        return True
    else:
        return packaging.markers.Marker(markers).evaluate()


def version_satisfies(requirement, package):
    """Check if the package version satisfies the requirement."""
    try:
        raw_specifier = requirement["version"]
    except KeyError:
        return True
    else:
        specifier = packaging.specifiers.SpecifierSet(raw_specifier)
        return specifier.contains(package["version"], prereleases=True)


def compatible_package_version(requirement, available_packages):
    """Return the package version that satisfies the requirement.

    If no package version can satisfy the requirement, raise UnsatisfiableError. If
    multiple package versions can satisfy the requirement, raise AmbiguityError.
    """
    possible_packages = [
        pkg for pkg in available_packages if version_satisfies(requirement, pkg)
    ]
    if not possible_packages:
        raise UnsatisfiableError(f"No package version satisfies {requirement!r}")
    elif len(possible_packages) > 1:
        raise AmbiguityError(f"Multiple package versions satisfy {requirement!r}")
    return possible_packages[0]


def dependencies(package, requirement):
    """Return the dependencies of the package.

    The extras from the requirement will extend the base requirements as needed.
    """
    applicable_deps = []
    extras = frozenset(requirement.get("extras", []))
    for dep in package["dependencies"]:
        if "feature" not in dep or dep["feature"] in extras:
            applicable_deps.append(dep)
    return applicable_deps
Backwards Compatibility
Because there is no preexisting lock file format, there are no explicit backwards-compatibility concerns in terms of Python packaging standards.
As for packaging tools themselves, that will be a per-tool decision. Tools that don’t document their lock file format could simply start using the format internally and then transition to saving their lock files with a name supported by this PEP. Tools with a preexisting, documented format could provide an option to choose which format to emit.
Security Implications
The hope is that by standardizing on a lock file format that starts from a security-first posture it will help make overall packaging installation safer. However, this PEP does not solve all potential security concerns.
One potential concern is tampering with a lock file. If a lock file is not kept in source control and properly audited, a bad actor could change the file in nefarious ways (e.g. point to a malware version of a package). Tampering could also occur in transit to e.g. a cloud provider who will perform an installation on the user’s behalf. Both could be mitigated by signing the lock file either within the file in a [tool] entry or via a side channel external to the lock file itself.
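This PEP does not prescribe a signing mechanism. Purely as an illustration of the side-channel idea, the sketch below uses an HMAC over the lock file's bytes as a stand-in for a real signature scheme. It assumes the two parties share a secret key, and `sign_lock_file` and `verify_lock_file` are hypothetical helpers.

```python
import hashlib
import hmac


def sign_lock_file(contents, key):
    """Return a hex tag over the lock file bytes, to be sent separately."""
    return hmac.new(key, contents, hashlib.sha256).hexdigest()


def verify_lock_file(contents, key, tag):
    """Check the received lock file bytes against the separately sent tag."""
    return hmac.compare_digest(sign_lock_file(contents, key), tag)
```

Any change to the file after the tag was computed causes `verify_lock_file` to return `False`, which the receiving side would treat as a reason to refuse the installation.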
This PEP does not do anything to prevent a user from installing incorrect packages. While it includes many details to help in auditing a package’s inclusion, there isn’t any mechanism to stop e.g. name confusion attacks via typosquatting. Lockers may be able to provide some UX to help with this (e.g. by providing download counts for a package).
How to Teach This
Users should be informed that when they ask to install some package, that package may have its own dependencies, those dependencies may have dependencies, and so on. Without writing down what gets installed as part of installing the package they requested, things could change from underneath them (e.g., package versions). Changes to the underlying dependencies can lead to accidental breakage of their code. Lock files help deal with that by providing a way to write down what was (and should be) installed.
Having what to install written down also helps in collaborating with others. By agreeing to a lock file’s contents, everyone ends up with the same packages installed. This helps make sure no one relies on e.g. an API that’s only available in a certain version that not everyone working on the project has installed.
Lock files also help with security by making sure you always get the same files installed and not a malicious one that someone may have slipped in. It also lets one be more deliberate in upgrading their dependencies and thus making sure the change is on purpose and not one slipped in by a bad actor.
Reference Implementation
A proof-of-concept implementing most of this PEP for wheels can be found at https://github.com/brettcannon/mousebender/tree/pep.
Rejected Ideas
A flat set of packages to install
An earlier version of this PEP proposed to use a flat set of package versions instead of a graph. The idea was that each package version could be evaluated in isolation as to whether it applied to an environment for installation. The hope was that would lend itself to easier auditing as one wouldn’t have to worry about how a package version fit into the graph when looking at e.g., a diff for a lock file.
Unfortunately this was deemed not as flexible as using a graph. For instance, recording the graph assists in dependency analysis for tools like GitHub. A graph also makes it easier to follow how you ended up with a dependency in your lock file, starting from any point in the graph. It also balances the implementation costs between lockers and installers: lockers become somewhat simpler, while installers take on only a minor increase in complexity by using standard graph-traversing algorithms instead of a linear walk.
And if the dependency graph is already being recorded for the above benefits, then recording that same data in a flattened manner is redundant and makes lock files larger and potentially more unruly.
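As a rough illustration of the graph traversal mentioned above, the following sketch walks a small in-memory dependency graph. The dictionary layout and the names `GRAPH`, `resolve_install_set`, and `explain` are assumptions made for illustration only; they are not the lock file format defined by this PEP, and environment markers are ignored.

```python
from collections import deque

# Hypothetical graph: package name -> names of its direct dependencies.
GRAPH = {
    "app": ["requests", "rich"],
    "requests": ["urllib3", "idna"],
    "rich": ["pygments"],
    "urllib3": [],
    "idna": [],
    "pygments": [],
    "unused": [],
}


def resolve_install_set(graph, roots):
    """Breadth-first walk returning every package reachable from the roots."""
    seen = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(graph[name])
    return seen


def explain(graph, root, target):
    """Return one dependency chain from root to target, or None if unreachable."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name == target:
            chain = []
            while name is not None:
                chain.append(name)
                name = parents[name]
            return chain[::-1]
        for dep in graph[name]:
            if dep not in parents:
                parents[dep] = name
                queue.append(dep)
    return None
```

The second function shows the auditing benefit: for any package in the graph it can report a chain of dependencies explaining why that package is present, which a flat list cannot do without extra data.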
Specifying a new core metadata version that requires consistent metadata across files
At one point, to handle the issue of metadata varying between files and thus requiring examination of every released file for a package and version for accurate locking results, the idea was floated to introduce a new core metadata version which would require the metadata of all wheel files to be the same for a single version of a package. Ultimately, though, it was deemed unnecessary as this PEP will put pressure on people to make files consistent for performance reasons or to make indexes provide all the metadata separate from the wheel files themselves. As well, there’s no easy enforcement mechanism, and so community expectation would work as well as a new metadata version.
Have the installer do dependency resolution
In order to support a format more akin to how Poetry worked when this PEP was drafted, it was suggested that lockers effectively record the packages and their versions which may be necessary to make an install work in any possible scenario, and then the installer resolves what to install. But that complicates auditing a lock file by requiring much more mental effort to know what packages may be installed in any given scenario. Also, one of the Poetry developers suggested that markers as represented in the package locking approach of this PEP may be sufficient to cover the needs of Poetry. Not having installers do a resolution also simplifies their implementation, centralizing complexity in lockers.
Requiring specific hash algorithm support
It was proposed to require a baseline hash algorithm for the files. This was rejected as no other Python packaging specification requires specific hash algorithm support. As well, the minimum hash algorithm suggested may eventually become an outdated/unsafe suggestion, requiring further updates. In order to promote using the best algorithm at all times, no baseline is provided to avoid simply defaulting to the baseline in tools without considering the security ramifications of that hash algorithm.
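Since no baseline algorithm is mandated, a tool has to cope with whatever algorithm a lock file names. A minimal sketch of that check, using only the standard library, might look as follows. The function name `verify_hash` and its parameters are hypothetical, and the sketch does not handle variable-length digests such as SHAKE.

```python
import hashlib
import hmac


def verify_hash(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Check data against a hex digest for whichever algorithm is named."""
    if algorithm not in hashlib.algorithms_available:
        # The tool decides whether an unknown algorithm is an error;
        # this sketch simply refuses to continue.
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest, expected_hex.lower())
```

A real tool would likely also apply its own policy about which algorithms it considers acceptable, which is exactly the decision this PEP leaves to tools instead of fixing a baseline.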
Require a URL or file path for files
Originally references to files were required, e.g., packages.sdist.url or packages.sdist.path. But at least one use-case surfaced during discussions about this PEP where statically specifying the location of files would be problematic. And in earlier discussions the idea of the location being a hint wasn’t preferred. Hence the PEP now makes the data optional, but considers the locations accurate if specified.
File naming
Using *.pylock.toml as the file name
It was proposed to put the pylock constant part of the file name after the identifier for the purpose of the lock file. It was decided not to do this so that lock files would sort together when looking at directory contents instead of purely based on their purpose which could spread them out in a directory.
Using *.pylock as the file name
Not using .toml as the file extension and instead making it .pylock itself was proposed. This was decided against so that code editors would know how to provide syntax highlighting to a lock file without having special knowledge about the file extension.
Not having a naming convention for the file
Having no requirements or guidance for a lock file’s name was considered, but ultimately rejected. A standardized naming convention makes it easy for both a human and a code editor to identify a lock file. This helps facilitate discovery when e.g. a tool wants to know all of the lock files that are available.
File format
Use JSON over TOML
Since having a format that is machine-writable was a goal of this PEP, it was suggested to use JSON. But it was deemed less human-readable than TOML while not improving on the machine-writable aspect enough to warrant the change.
Use YAML over TOML
Some argued that YAML met the machine-writable/human-readable requirement in a better way than TOML. But as that’s subjective and pyproject.toml already existed as the human-writable file used by Python packaging standards it was deemed more important to keep using TOML.
Other keys
Multiple hashes per file
An initial version of this PEP proposed supporting multiple hashes per file. The idea was to allow one to choose which hashing algorithm they wanted to go with when installing. But upon reflection it seemed like an unnecessary complication as there was no guarantee the hashes provided would satisfy the user’s needs. As well, if the single hash algorithm used in the lock file wasn’t sufficient, rehashing the files involved as a way to migrate to a different algorithm didn’t seem insurmountable.
Hashing the contents of the lock file itself
Hashing the bytes of the file and storing the hash value within the file itself was proposed at some point. This was removed to make merging changes to the lock file easier, as otherwise each merge would have to recalculate the hash value to avoid a merge conflict.
Hashing the semantic contents of the file was also proposed, but it would lead to the same merge conflict issue.
Regardless of which contents were hashed, either approach could have the hash value stored outside of the file if such a hash was desired.
Recording the creation date of the lock file
To know how potentially stale the lock file was, an earlier proposal suggested recording the creation date of the lock file. But for the same merge conflict reasons as storing the hash of the file contents, this idea was dropped.
Recording the package indexes used
Recording what package indexes were used by the locker to decide what to lock for was considered. In the end, though, it was rejected as it was deemed unnecessary bookkeeping.
Locking build requirements for sdists
An earlier version of this PEP tried to lock the build requirements for sdists under a packages.build-requires key. Unfortunately it confused enough people about how it was expected to operate and there were enough edge case issues to decide it wasn’t worth trying to do in this PEP upfront. Instead, a future PEP could propose a solution.
Open Issues
Specify requires-python at the file level?
The lock file formats from PDM, Poetry, and uv all specify requires-python at the top level for the absolute minimum Python version needed for the lock file. This can be inferred, though, by examining all packages.requires-python values. The global value might also not be accurate for all platforms depending on how environment markers influence what package versions are installed and what their Python version requirements are.
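A deliberately simplified sketch of that inference follows. It assumes that every package applies to the environment (i.e. it ignores markers, which is exactly the inaccuracy noted above) and that each value contains a plain `>=X.Y` clause. A real tool would use a full version specifier parser. The function names are hypothetical.

```python
def lower_bound(specifier: str):
    """Return the version tuple from the first '>=' clause, or None if absent."""
    for clause in specifier.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            return tuple(int(part) for part in clause[2:].strip().split("."))
    return None


def inferred_minimum(specifiers):
    """Highest lower bound across all packages, or None if none is given."""
    bounds = [b for b in map(lower_bound, specifiers) if b is not None]
    return max(bounds, default=None)
```

For example, `inferred_minimum([">=3.8", ">=3.10,<4", ">=3.9"])` yields `(3, 10)`, since the most demanding package determines the minimum for the file as a whole.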
Don’t pre-parse data?
This PEP currently takes the viewpoint that if a piece of data is going to be parsed by installers every time they run, then trying to pre-parse as much as possible so the TOML parser can help is a good thing. The thinking is TOML parsers have a higher chance of being optimized, and so letting them do more parsing leads to a faster outcome. It should also increase readability by breaking apart data upfront more.
But in the case of doing this to wheel file names, some might consider it too much. The question becomes whether separating out all the parts of a wheel file name hinders readability because people are already used to reading the file names, or whether clearly separating the parts makes installers faster and easier to write without hindering readability.
This all equally applies to requirement specifiers.
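For reference, the parsing that an installer would otherwise repeat on every run is small but not trivial. The sketch below splits a wheel file name according to the standard wheel naming convention (name, version, optional build tag, then Python, ABI, and platform tags, where each tag may be a dot-separated compressed set). The function name and the keys of the returned dictionary are hypothetical and are not the keys defined by this PEP.

```python
def split_wheel_name(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        # Compressed tag sets such as "py2.py3" expand into lists.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }
```

Recording these parts directly in the lock file would move this work from every installer run to the single locking step, at the cost of a more verbose file.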
Deferred Ideas
Per-file locking
An earlier version of this PEP supported two approaches to locking: per-file and per-package. The idea for the former approach to locking was that if you were locking for an a-priori set of environments you could lock to just the files necessary to install into those environments. The thinking was that listing only a subset of files would make auditing easier.
Unfortunately there was disagreement on how best to express upfront what the supported environment requirements would be. Since what this PEP currently proposes still prevents accidental success of installation into unsupported environments, this idea has been deferred until such time as someone can come up with a representation that makes sense.
Allowing for multiple lock files
Before the introduction of [[groups]], this PEP proposed supporting multiple lock files that would match the regular expression r"pylock\.(.+)\.toml" if a name for the lock file is desired or if multiple lock files exist. But since [[groups]] subsumes a lot of the need to support multiple lock files, this specific feature can be postponed until such time that a need is shown to support multiple lock files.
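The deferred naming scheme can be sketched with the regular expression quoted above. Matching against the whole file name with `fullmatch` is an assumption of this sketch, and the function name is hypothetical.

```python
import re

# The pattern quoted in the text above.
LOCK_FILE_RE = re.compile(r"pylock\.(.+)\.toml")


def lock_file_name(filename: str):
    """Return the name embedded in a lock file's file name, or None."""
    match = LOCK_FILE_RE.fullmatch(filename)
    return match.group(1) if match else None
```

With this pattern `pylock.dev.toml` yields `dev`, while a plain `pylock.toml` and the rejected `dev.pylock.toml` ordering both yield `None`, as neither matches the pattern.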
Acknowledgements
Thanks to everyone who participated in the discussions on discuss.python.org. Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for providing feedback on a draft version of this PEP.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0751.rst
Last modified: 2024-11-05 19:18:47 GMT