PEP 804 – An external dependency registry and name mapping mechanism
- Author: Pradyun Gedam <pradyunsg at gmail.com>, Ralf Gommers <ralf.gommers at gmail.com>, Michał Górny <mgorny at quansight.com>, Jaime Rodríguez-Guerra <jaime.rogue at gmail.com>, Michael Sarahan <msarahan at gmail.com>
- Discussions-To: Discourse thread
- Status: Draft
- Type: Standards Track
- Topic: Packaging
- Requires: 725
- Created: 03-Sep-2025
- Post-History: 22-Sep-2025
Table of Contents
- Abstract
- Motivation
- Rationale
- Specification
- Backwards Compatibility
- Security Implications
- How to Teach This
- Reference Implementation
- Rejected Ideas
- Centralized mappings governed by the same body
- Allowing ecosystem-specific variants of packages
- Adding more package metadata to the central registry
- Mapping PyPI projects to repackaged counterparts in target ecosystems
- Strict validation of identifiers
- Inheritance and cross-referenced mappings
- Tracking package name changes
- Reusing existing databases as a central registry
- Open Issues
- References
- Appendix A: Operational suggestions
- Appendix B: Virtual versioning proposal
- Copyright
Abstract
This PEP specifies a name mapping mechanism that allows packaging tools to map external dependency identifiers (as introduced in PEP 725) to their counterparts in other package repositories.
Motivation
Packages on PyPI often require build-time and runtime dependencies that are not present on PyPI. PEP 725 introduced metadata to express such dependencies. Making use of that metadata for a Python package requires mapping the given dependency identifiers to the specifiers used in other ecosystems. Such a mapping would allow:
- Tools to automatically map external dependencies to packages in other packaging repositories/ecosystems,
- Python package installers and build frontends to emit error messages that list the needed external dependencies under the package names used by the relevant system package manager on the user’s system, and to point the user at installation instructions for those packages.
Packaging ecosystems like Linux distros, conda, Homebrew, Spack, and Nix need
full sets of dependencies for Python packages, and have tools like pyp2rpm
(Fedora), Grayskull (conda), and dh_python (Debian) which attempt to
automatically generate dependency information from the metadata available in
upstream Python packages. Before PEP 725, external dependencies were handled manually,
because there was no metadata for this in pyproject.toml or any other
standard metadata file. Enabling its automatic conversion is a key benefit of
this PEP, making Python packaging easier and more reliable. In addition, the
authors envision other types of tools making use of this information; e.g.
dependency analysis tools like Repology, Dependabot and libraries.io.
Rationale
Prior art
The R language has a System Requirements mechanism for R packages, with a central
registry that knows how to translate external dependency metadata to install
commands for package managers like apt-get. This registry centralises the
mappings for a series of Linux distributions, as well as Windows; macOS is not
covered. The “Rule Coverage” section of its README
used to show that this system improves the chance of successfully building packages
from CRAN from source. Across all CRAN packages,
Ubuntu 18 improved from 78.1% to 95.8%, CentOS 7 from 77.8% to 93.7% and openSUSE
15.0 from 78.2% to 89.7%. The chance of success depends on how well the registry
is maintained, but the gain is significant: ~4x fewer packages fail to build on
Ubuntu and CentOS in a Docker container.
RPM-based distributions, like Fedora, can use a rule-based implementation
(NameConvertor) in pyp2rpm. The main rule is that the RPM name for a PyPI package is
typically f"python3-{pypi_package_name}". The rare exceptions include packages that
primarily distribute an application, which drop the prefix (e.g. the Black formatter
is simply black, not python3-black), and variants for different Python versions
(e.g. in RHEL 9 setuptools can be found as python3-setuptools for Python 3.9,
but python3.11-setuptools and python3.12-setuptools are also available). More details
are available in Fedora’s packaging guidelines for Python.
Debian packages typically follow a f"python3-{import_name}" naming scheme, with some
exceptions: some sub-communities have an infix (e.g. Django packages go under
f"python3-django-*"), and applications are often distributed by their name, with no
python3- prefix. Additional details are available in Debian’s Python Policy.
Gentoo follows a similar approach to naming Python packages, using the dev-python/
category and some well-specified rules.
Conda-forge needs a more explicit name mapping. Most base names are the
same in conda-forge as on PyPI (e.g. numpy maps to numpy), but there
are many exceptions because of both name collisions and renames (e.g. the PyPI
name for PyTorch is torch while in conda-forge it’s pytorch). There are
several name mapping efforts maintained by different teams. Conda-forge’s infrastructure
generates one in regro/cf-graph-countyfair.
Grayskull maintains its own curated mapping.
Prefix.dev created the parselmouth mappings
to support conda and PyPI integrations in their tooling. A more complete overview of
their approaches, strengths and weaknesses can be found in
conda/grayskull#564.
The OpenStack ecosystem also maintains some mapping efforts, all of which
focus exclusively on Linux distributions.
pkg-map
accompanies diskimage-builder and provides a file format where the user defines
arbitrary variable names and their corresponding names in the target distro
(Red Hat, Debian, OpenSUSE, etc). See example for PyYAML.
bindep defines a file bindep.txt
(see example)
where users can write down dependencies that are not installable from PyPI. The format is
line-based, with each line containing a dependency as found in the Debian ecosystem.
For other distributions, it offers a “filters” syntax between square brackets where users
can indicate other target platforms, optional dependencies and extras.
The need for mappings is also found in other ecosystems like SageMath, but also by end-users themselves who want to install PyPI packages with their system package manager of choice (example StackOverflow question).
Governance and maintenance costs of name mappings
The maintenance cost of external dependency mappings to a large number of packaging ecosystems is potentially high. We choose to define the registry in such a way that:
- A central authority maintains the list of recognized DepURLs and the known ecosystem mappings.
- The mappings themselves are maintained by the target packaging ecosystems.
Hence this system is opt-in for a given ecosystem, and the associated maintenance costs are distributed.
Generating package manager-specific install commands
Python package authors with external dependencies usually have installation instructions for those external dependencies in their documentation. These instructions are difficult to write and keep up-to-date, and are usually only covering one or at most a handful of platforms. As an example, here are SciPy’s instructions for its external build dependencies (C/C++/Fortran compilers, OpenBLAS, pkg-config):
- Debian/Ubuntu:
  sudo apt install -y gcc g++ gfortran libopenblas-dev liblapack-dev pkg-config python3-pip python3-dev
- Fedora/CentOS/RHEL:
  sudo dnf install gcc-gfortran python3-devel openblas-devel lapack-devel pkgconfig
- Arch Linux:
  sudo pacman -S gcc-fortran openblas pkgconf
- Homebrew on macOS:
  brew install gfortran openblas pkg-config
The package names vary a lot, and there are differences like some distros
splitting off headers and other build-time dependencies in a separate
-dev/-devel package while others do not. With the registry in this PEP,
this could be made both more comprehensive and easier to maintain through a tool
command with semantics of “show this ecosystem’s preferred package manager
install command for all external dependencies”. This may be done as a
standalone tool, or as a new subcommand in any Python development workflow tool
(e.g. Pip, Poetry, Hatch, PDM, uv).
To this end, each ecosystem mapping can provide a list of package managers known to be compatible, with templated instructions on how to install and query installed packages. The provided install command templates are paired with query command templates so those tools can check whether the needed packages are already present without having to attempt an install operation (which might be expensive and have unintended side effects like version upgrades).
Registry design
The mapping infrastructure has been designed to present the following components and properties:
- A central registry of PEP 725 identifiers (DepURLs), including at least the well-known generic and virtual identifiers considered canonical.
- A list of known ecosystems, where ecosystem maintainers can register their name mapping(s).
- A standardized schema that defines how mappings should be structured. Each mapping can also provide programmatic details about how their supported package manager(s) work.
The above documents are provided as JSON files validated by accompanying JSON schemas.
A Python library and CLI are provided to query and utilize these resources. The user can
configure which system package manager they prefer to use for the default package mappings
and command generation (e.g. a user on Ubuntu may prefer conda, brew or spack
instead of apt as their package manager of choice to provide external dependencies).
Specification
Central registry
The central registry defines which identifiers are recognized as canonical, plus known aliases.
Having a central registry enables the validation of the [external] table.
All involved tools MUST check that the provided identifiers are well formed.
Additionally, some tools MAY check whether the identifiers in use are recognized as
canonical. More specifically:
- Build backends, build frontends, and installers SHOULD NOT do any validation of identifiers being canonical by default.
- Uploaders like twine SHOULD validate whether the identifiers are canonical and warn or report an error to the user, with opt-out mechanisms. They SHOULD suggest a canonical replacement, if available.
- Index servers like PyPI MAY perform the same validation as the uploaders and reject the artifact if necessary.
This registry SHOULD also centralize authoritative decisions about its contents, such as which entry of a collection of aliases is preferred as canonical, or which versioning scheme applies to virtual DepURLs (see Appendix B). The corresponding answers are not given in this PEP; instead we delegate that responsibility to the central registry maintainers.
The canonical filename for the central registry document MUST be registry.json.
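As a sketch of what this validation could look like in a consuming tool, assuming a registry.json document already loaded into a dictionary (the helper names here are hypothetical, not part of any specified API):

```python
import re

# Well-formedness per the schema: every identifier MUST match ^dep:.+$
DEPURL_RE = re.compile(r"^dep:.+$")

def is_well_formed(identifier: str) -> bool:
    """Check that the identifier is a syntactically valid DepURL string."""
    return bool(DEPURL_RE.match(identifier))

def is_canonical(identifier: str, registry: dict) -> bool:
    """Check whether the identifier appears in a loaded registry.json document."""
    return any(entry["id"] == identifier for entry in registry.get("definitions", []))

# Minimal in-memory registry document for illustration
registry = {"definitions": [{"id": "dep:generic/zlib"}]}
assert is_well_formed("dep:generic/zlib")
assert not is_well_formed("generic/zlib")  # missing the dep: scheme
assert is_canonical("dep:generic/zlib", registry)
assert not is_canonical("dep:generic/does-not-exist", registry)
```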
Schema
The central registry is specified by the following JSON schema:
$schema
| Type | string |
|---|---|
| Description | URL of the definition list schema in use for the document. |
| Required | False |
schema_version
| Type | integer |
|---|---|
| Required | False |
definitions
| Type | array |
|---|---|
| Description | List of DepURLs currently recognized. |
| Required | True |
Each entry in this list is defined as:
| Field | Type | Description |
|---|---|---|
| id (required) | string matching regex ^dep:.+$ | The entry identifier MUST be a valid DepURL string. |
| description | string | Free-form field to add some details about the package. Allows Markdown. |
| provides | DepURLField \| list[DepURLField] | List of id strings this entry connects to. Useful to annotate aliases (e.g. dep:generic/arrow and dep:github/apache/arrow) or virtual package implementations (e.g. dep:generic/gcc would provide dep:virtual/compiler/c). This field MUST NOT be present in dep:virtual/ definitions. Entries without provides content or, if populated, only with dep:virtual/ identifiers, are considered canonical. |
| urls | AnyUrl \| list[AnyUrl] \| dict[NonEmptyString, AnyUrl] | Hyperlinks to web locations that provide more information about the definition. |
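The canonicity rule attached to provides can be sketched in a few lines of Python; the helper names are hypothetical and the entries illustrative:

```python
def provides_list(entry: dict) -> list[str]:
    """Normalize the provides field, which may be a single string or a list."""
    provides = entry.get("provides", [])
    return [provides] if isinstance(provides, str) else list(provides)

def is_canonical_entry(entry: dict) -> bool:
    """An entry is canonical when provides is absent/empty or only lists
    dep:virtual/ identifiers, per the rule above."""
    return all(p.startswith("dep:virtual/") for p in provides_list(entry))

assert is_canonical_entry({"id": "dep:generic/zlib"})
assert is_canonical_entry(
    {"id": "dep:generic/gcc", "provides": ["dep:virtual/compiler/c"]}
)
# An alias that provides a non-virtual identifier is not canonical:
assert not is_canonical_entry(
    {"id": "dep:github/apache/arrow", "provides": "dep:generic/arrow"}
)
```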
Known ecosystems
The list of known ecosystems has two roles:
- Reporting the canonical URL for a given ecosystem mapping.
- Assigning a unique, short identifier to each ecosystem, as described in Mappings.
The canonical filename for the known ecosystems list MUST be known-ecosystems.json.
Schema
The known ecosystems list is specified by the following JSON Schema:
$schema
| Type | string |
|---|---|
| Description | URL of the schema in use for the document. |
| Required | False |
schema_version
| Type | integer |
|---|---|
| Description | Version of the schema in use. |
| Required | False |
ecosystems
| Type | dict |
|---|---|
| Description | Ecosystems names and their corresponding details. |
| Required | True |
This dictionary maps non-empty string keys referring to the ecosystem identifiers to a sub-dictionary defined as:
| Key | Value type | Value description |
|---|---|---|
| mapping (required) | AnyURL | URL to the mapping for this ecosystem. |
Mappings
The mappings specify which ecosystem-specific identifiers provide the canonical entries available in the central registry. A mapping mainly consists of two lists of dictionaries: one where each entry maps a DepURL to one or more ecosystem-specific identifiers, and another that exposes how to use one or more package managers.
Each mapping MUST have a canonical URL for online retrieval. Its complete filename
MUST be {ecosystem-identifier}.mapping.json, where “ecosystem identifier”
MUST conform to this regex: [a-z0-9\-_.]+(\+[a-z0-9\-_.]+)?.
In other words, a first field optionally followed by a second, separated
by a + symbol.
For ecosystems corresponding to Linux distributions,
the first field MUST correspond to the ID string as specified in the
os-release
specification. If provided and relevant, the second field MUST correspond to the
VERSION_ID string.
Since the version field is optional, tools SHOULD try to access the versioned identifier but fallback to the name-only identifier if not found.
Schema
The mappings are specified by the following JSON Schema:
$schema
| Type | string |
|---|---|
| Description | URL of the mappings schema in use for the document. |
| Required | False |
schema_version
| Type | integer |
|---|---|
| Description | Version of the schema in use. |
| Required | False |
name
| Type | string |
|---|---|
| Description | Display name for the mapping. |
| Required | True |
description
| Type | string |
|---|---|
| Description | Free-form field to add information about this mapping. Allows Markdown. |
| Required | False |
mappings
| Type | array |
|---|---|
| Description | List of DepURL-to-specs mappings. |
| Required | True |
Each entry in mappings is defined as:
| Field | Type | Description |
|---|---|---|
| id (required) | string matching regex ^dep:.+$ | DepURL, as provided in the central registry. |
| description | string | Free-form field to add some details about the package. Allows Markdown. |
| specs † | string \| list[string] \| dict[Literal['build', 'host', 'run'], string \| list[string]] | Ecosystem-specific identifiers for this package. The full form is a dictionary that maps the categories build, host and run to their corresponding package identifiers. As a shorthand, a single string or a list of strings can be provided, in which case it will be used to populate the three categories identically. An empty list indicates that the ecosystem does not have packages for this entry. |
| specs_from † | string matching regex ^dep:.+$ | DepURL identifier of another entry whose specs will be reused here. |
| urls | AnyUrl \| list[AnyUrl] \| dict[NonEmptyString, AnyUrl] | Hyperlinks to web locations that provide more information about the definition. |
| extra_metadata | dict[NonEmptyString, Any] | Free-form key-value store for arbitrary metadata. |
† Exactly one of specs and specs_from MUST be present.
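The shorthand-versus-full-form rules for specs, and the indirection through specs_from, could be normalized as in this sketch (the helper names and the alias entry are hypothetical, and only a single level of specs_from indirection is assumed):

```python
def normalize_specs(specs) -> dict[str, list[str]]:
    """Expand the shorthand forms (single string or list of strings) into the
    full {build, host, run} dictionary with lists of strings as values."""
    if isinstance(specs, str):
        specs = [specs]
    if isinstance(specs, list):
        return {"build": list(specs), "host": list(specs), "run": list(specs)}
    return {cat: [v] if isinstance(v, str) else list(v) for cat, v in specs.items()}

def resolve_specs(entry: dict, entries_by_id: dict[str, dict]) -> dict[str, list[str]]:
    """Follow a specs_from reference (one level assumed) and normalize."""
    if "specs_from" in entry:
        entry = entries_by_id[entry["specs_from"]]
    return normalize_specs(entry["specs"])

entries = {
    "dep:generic/zlib": {"specs": "zlib"},
    # Hypothetical alias entry reusing the specs above:
    "dep:github/madler/zlib": {"specs_from": "dep:generic/zlib"},
}
assert resolve_specs(entries["dep:github/madler/zlib"], entries) == {
    "build": ["zlib"], "host": ["zlib"], "run": ["zlib"],
}
```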
package_managers
| Type | array |
|---|---|
| Description | List of tools that can be used to install packages in this ecosystem. |
| Required | True |
Each entry in package_managers MUST be a dictionary with these fields:
| Field | Type | Description |
|---|---|---|
| name (required) | string | Short identifier for this package manager (usually the command name). |
| commands (required) | dict | See subsection below. |
| specifier_syntax (required) | dict | See subsection below. |
commands
Commands used to install or query the given package(s).
It MUST be a dictionary in which only two keys are allowed: install (to generate
install instructions) and query (to check whether a given package is already
installed). Their values MUST be dictionaries with:
- a required command key that MUST take a list of strings (as expected by subprocess.run). Exactly one item in this list MUST be the {} placeholder, which will be replaced by the mapped package specifier(s).
- an optional requires_elevation boolean (False by default) to indicate whether the command must run with elevated permissions (e.g. administrator on Windows, superuser on Linux and macOS).
- a required multiple_specifiers enum that determines whether the command accepts multiple package specifiers at the same time, taking one of:
  - always, the default in install.
  - name-only, the command only accepts multiple specifiers if they do not contain version constraints.
  - never, the default in query.
The install command SHOULD support the placeholder being replaced by multiple
specifiers; query MUST only receive a single specifier per command.
For install, the exit code MUST be 0 when the package was successfully installed
or if it was already present.
For query, if the package is installed, the command MUST result in an exit code of
0. Otherwise, a non-zero exit code MUST be returned.
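A consuming tool could render these command templates roughly as follows. This is a sketch, not a specified API; the sudo prefixing for requires_elevation is an assumption about how a tool might honor that flag on POSIX systems, and the name-only mode is omitted for brevity:

```python
def render_command(template: dict, specifiers: list[str]) -> list[list[str]]:
    """Replace the {} placeholder in a command template with package
    specifiers, producing one argv list per command to run."""
    cmd = template["command"]
    idx = cmd.index("{}")  # exactly one "{}" placeholder per the schema
    if template.get("multiple_specifiers") == "always":
        groups = [specifiers]  # all specifiers in a single command
    else:  # "never": one command per specifier ("name-only" omitted here)
        groups = [[s] for s in specifiers]
    rendered = [cmd[:idx] + group + cmd[idx + 1:] for group in groups]
    if template.get("requires_elevation", False):
        # Assumption: prefix with sudo on POSIX; real tools would be platform-aware
        rendered = [["sudo"] + c for c in rendered]
    return rendered

install = {"command": ["conda", "install", "{}"], "multiple_specifiers": "always"}
query = {"command": ["conda", "list", "-f", "{}"], "multiple_specifiers": "never"}
assert render_command(install, ["zlib", "libwebp"]) == [
    ["conda", "install", "zlib", "libwebp"]
]
assert render_command(query, ["zlib", "libwebp"]) == [
    ["conda", "list", "-f", "zlib"],
    ["conda", "list", "-f", "libwebp"],
]
```

Each rendered argv list could then be passed directly to subprocess.run, checking the exit codes as described above.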
specifier_syntax
A dictionary describing the instructions on how to map a subset of PEP 440 specifiers (as determined in PEP 725) to the target package manager. Three levels of support are offered: name-only, exact-version-only, and version-range compatibility (with per-operator translations). Accordingly, the following three top-level keys are required; extra keys MUST NOT be allowed.
- name_only MUST take a list of strings as the syntax used for specifiers that do not contain any version information; it MUST include the placeholder {name}.
- exact_version MUST be None or a list of strings that describe the syntax used for specifiers that only express exact version constraints; in the latter case, the placeholders {name} and {version} MUST be present in at least one of the strings (although not necessarily the same string for both).
- version_ranges MUST be None or a dictionary with the following required keys:
  - the key syntax takes a list of strings where at least one MUST include the {ranges} placeholder (to be replaced by the maybe-joined version constraints, as determined by the value of and). They MAY also include the {name} placeholder.
  - the keys equal, greater_than, greater_than_equal, less_than, and less_than_equal take a string if the operator is supported, None otherwise. In the former case, the value MUST include the {version} placeholder, and MAY include {name}.
  - the key and takes a string used to join multiple version constraints in a single token, or None if only a single constraint can be used per token. In the latter case, the different constraints will be “exploded” into several tokens using the syntax template.

When exact_version or version_ranges are set to None, it indicates that the respective types of specifiers are not supported by the package manager.
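To illustrate how a tool might apply a version_ranges table, here is a sketch that renders a list of (operator, version) constraints using a conda-style syntax table like the one in the mapping example later in this document (the function name and the constraint representation are hypothetical):

```python
# PEP 440 operators mapped to their specifier_syntax keys
OPERATORS = {
    "==": "equal",
    ">": "greater_than",
    ">=": "greater_than_equal",
    "<": "less_than",
    "<=": "less_than_equal",
}

def render_ranges(name: str, constraints: list[tuple[str, str]], syntax: dict) -> list[str]:
    """Render constraints with a version_ranges table. Constraints are joined
    with the `and` string when supported, otherwise "exploded" into one token
    per constraint, as the specification describes."""
    vr = syntax["version_ranges"]
    tokens = [vr[OPERATORS[op]].format(name=name, version=version)
              for op, version in constraints]
    joiner = vr["and"]
    groups = [joiner.join(tokens)] if joiner is not None else tokens
    out = []
    for ranges in groups:
        out.extend(t.format(name=name, ranges=ranges) for t in vr["syntax"])
    return out

conda_syntax = {
    "version_ranges": {
        "and": ",",
        "equal": "={version}",
        "greater_than": ">{version}",
        "greater_than_equal": ">={version}",
        "less_than": "<{version}",
        "less_than_equal": "<={version}",
        "syntax": ["{name}{ranges}"],
    }
}
assert render_ranges("zlib", [(">=", "1.2"), ("<", "2")], conda_syntax) == ["zlib>=1.2,<2"]
```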
Note
The specifier_syntax mappings are meant to provide interoperability between
ecosystems where choosing which package version to install is possible. For
example, this is not the case in many Linux distributions, where each distro
release commits to a package version during its lifecycle (although often
with the necessary security backports).
In these cases, the install command could be used, optimistically, in
“name-only” mode, hoping that the OS-provided version is a good fit. A more
pessimistic alternative would be to use the query command first to see if
the available version matches the project constraints, and then install the
package by name.
Even in those cases, perfect 1:1 version matching is not always possible due to how different ecosystems map upstream releases to repackaged versions (e.g. an epoch may have been bumped to accommodate a change of versioning scheme). In that regard, we do not encode explicit mapping semantics for epochs or pre-releases.
Redistribution
The central registry, the known ecosystems list and the mapping documents MAY be packaged for offline distribution in each platform.
The authors recommend placing them in the standard location for data artifacts in each
operating system; e.g. $XDG_DATA_DIRS on Linux and others, ~/Library/Application
Support on macOS, and %LOCALAPPDATA% on Windows. The subdirectory identifier MUST
be external-packaging-metadata-mappings, and SHOULD only contain documents
corresponding to the aforementioned schemas, which MUST use their canonical filenames.
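A tool could compute the candidate offline locations roughly as follows; this is a sketch under the assumption that $XDG_DATA_DIRS (with its usual fallback search path) applies on Linux and similar systems, and data_dirs is a hypothetical helper:

```python
import os
import sys
from pathlib import Path

# Subdirectory identifier mandated by the specification
SUBDIR = "external-packaging-metadata-mappings"

def data_dirs() -> list[Path]:
    """Candidate directories for offline copies of the registry, ecosystems
    list and mapping documents, per platform data-directory conventions."""
    if sys.platform == "win32":
        roots = [Path(os.environ.get("LOCALAPPDATA", "~")).expanduser()]
    elif sys.platform == "darwin":
        roots = [Path("~/Library/Application Support").expanduser()]
    else:
        # Assumption: XDG_DATA_DIRS with the conventional fallback
        xdg = os.environ.get("XDG_DATA_DIRS", "/usr/local/share:/usr/share")
        roots = [Path(p) for p in xdg.split(":") if p]
    return [root / SUBDIR for root in roots]
```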
Examples
Registry
A simplified registry would look like this:
{
"$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/central-registry.schema.json",
"schema_version": 1,
"definitions": [
{
"id": "dep:generic/zlib",
"description": "A Massively Spiffy Yet Delicately Unobtrusive Compression Library"
},
{
"id": "dep:generic/libwebp",
"description": "WebP codec is a library to encode and decode images in WebP format. This package contains the library that can be used in other programs to add WebP support"
},
{
"id": "dep:generic/clang",
"description": "Language front-end and tooling infrastructure for languages in the C language family for the LLVM project."
}
]
}
Known ecosystems
A minimal list of known ecosystems with a single entry would look like this:
{
"$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/known-ecosystems.schema.json",
"schema_version": 1,
"ecosystems": {
"conda-forge": {
"mapping": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/refs/heads/main/data/conda-forge.mapping.json"
}
  }
}
Some representative identifiers:
| Ecosystem | Identifier | Filename |
|---|---|---|
| Debian Bookworm | debian+12 |
debian+12.mapping.json |
| Fedora 40 | fedora+40 |
fedora+40.mapping.json |
| Ubuntu 24.04 | ubuntu+24.04 |
ubuntu+24.04.mapping.json |
| Arch Linux (rolling) | arch |
arch.mapping.json |
| Homebrew | homebrew |
homebrew.mapping.json |
| conda-forge | conda-forge |
conda-forge.mapping.json |
Mappings
A hypothetical conda-forge mapping (conda-forge.mapping.json), with only a couple entries
for brevity, could look like:
{
"schema_version": 1,
"name": "conda-forge",
"description": "Mapping for the conda-forge ecosystem",
"mappings": [
{
"id": "dep:generic/zlib",
"description": "Massively spiffy yet delicately unobtrusive compression library.",
"specs": "zlib", // Simplest form
"urls": {
"feedstock": "https://github.com/conda-forge/zlib-feedstock"
}
},
{
"id": "dep:generic/libwebp",
"description": "WebP image library. libwebp-base ships libraries; libwebp ships the binaries.",
"specs": { // expanded form with single spec per category
"build": "libwebp",
"host": "libwebp-base",
"run": "libwebp"
},
"urls": {
"feedstock": "https://github.com/conda-forge/libwebp-feedstock"
}
},
{
"id": "dep:generic/clang",
"description": "Development headers and libraries for Clang",
"specs": { // expanded form with specs list
"build": [
"clang",
"clangxx"
],
"host": [
"clangdev"
],
"run": [
"clang",
"clangxx",
"clang-format",
"clang-tools"
]
},
"urls": {
"feedstock": "https://github.com/conda-forge/clangdev-feedstock"
}
}
],
"package_managers": [
{
"name": "conda",
"commands": {
"install": {
"command": [
"conda",
"install",
"{}"
],
"multiple_specifiers": "always",
"requires_elevation": false
},
"query": {
"command": [
"conda",
"list",
"-f",
"{}"
],
"multiple_specifiers": "never",
"requires_elevation": false
}
},
"specifier_syntax": {
"exact_version": [
"{name}=={version}"
],
"name_only": [
"{name}"
],
"version_ranges": {
"and": ",",
"equal": "={version}",
"greater_than": ">{version}",
"greater_than_equal": ">={version}",
"less_than": "<{version}",
"less_than_equal": "<={version}",
"syntax": [
"{name}{ranges}"
]
}
}
}
]
}
Practical examples
The following repository provides examples of how these schemas could look in real cases. They are not meant to be prescriptive, but merely illustrative of how to apply these schemas:
- Central registry.
- Known ecosystems.
- Mappings:
- Arch-linux.
- Chocolatey.
- Conan.
- Conda-forge.
- Fedora.
- Gentoo.
- Homebrew.
- Nix.
- PyPI.
- Scoop.
- Spack.
- Ubuntu.
- Vcpkg.
- Winget.
pyproject-external CLI
The following examples illustrate how the name mapping mechanism may be used.
They use the CLI implemented as part of the pyproject-external package.
Say we have cloned the source of a Python package named my-cxx-pkg with a
single extension module, implemented in C++, linking to zlib, using pybind11,
plus meson-python as the build backend:
[build-system]
build-backend = 'mesonpy'
requires = [
"meson-python>=0.13.1",
"pybind11>=2.10.4",
]
[external]
build-requires = [
"dep:virtual/compiler/cxx",
]
host-requires = [
"dep:generic/zlib",
]
With complete name mappings for apt on Ubuntu, this may then show the
following:
# show all external dependencies as DepURLs
$ python -m pyproject_external show .
[external]
build-requires = [
"dep:virtual/compiler/cxx",
]
host-requires = [
"dep:generic/zlib",
]
# show all external dependencies, but mapped to the autodetected ecosystem
$ python -m pyproject_external show --output=mapped .
[external]
build-requires = [
"g++",
"python3",
]
host-requires = [
"zlib1g",
"zlib1g-dev",
]
# show how to install external dependencies
$ python -m pyproject_external show --output=command .
sudo apt install --yes g++ zlib1g zlib1g-dev python3
We have not yet run those install commands, so the external dependencies may be missing. If we get a build failure, the output may look like:
$ pip install .
...
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
This package has the following external dependencies, if those are missing
on your system they are likely to be the cause of this build failure:
dep:virtual/compiler/cxx
dep:generic/zlib
If Pip has implemented support for querying the name mapping registry, the end of that message could improve to:
The following external dependencies are needed to install the package
mentioned above. You may need to install them with `apt`:
g++
zlib1g
zlib1g-dev
If the user wants to use conda packages and the mamba package manager to
install external dependencies, they may specify that in their
~/.config/pyproject-external/config.toml (or equivalent) file:
preferred_package_manager = "mamba"
This will then change the output of pyproject-external:
$ python -m pyproject_external show --output command .
mamba install --yes --channel=conda-forge --channel-priority=strict cxx-compiler zlib python
The pyproject-external CLI also provides a simple way to perform
[external] table validation against the central registry to check
whether the identifiers are considered canonical or not:
$ python -m pyproject_external show --validate grpcio-1.71.0.tar.gz
WARNING Dep URL 'dep:virtual/compiler/cpp' is not recognized in the
central registry. Did you mean any of ['dep:virtual/compiler/c',
'dep:virtual/compiler/cxx', 'dep:virtual/compiler/cuda',
'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']?
[external]
build-requires = [
"dep:virtual/compiler/c",
"dep:virtual/compiler/cpp",
]
pyproject-external API
The pyproject-external Python API also allows users to do these operations programmatically:
>>> from pyproject_external import External
>>> external = External.from_pyproject_data(
{
"external": {
"build-requires": [
"dep:virtual/compiler/c",
"dep:virtual/compiler/cpp",
]
}
}
)
>>> external.validate()
Dep URL 'dep:virtual/compiler/cpp' is not recognized in the central registry. Did you
mean any of ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx',
'dep:virtual/compiler/cuda', 'dep:virtual/compiler/go', 'dep:virtual/compiler/c-sharp']?
>>> external = External.from_pyproject_data(
{
"external": {
"build-requires": [
"dep:virtual/compiler/c",
"dep:virtual/compiler/cxx", # fixed
]
}
}
)
>>> external.validate()
>>> external.to_dict()
{'external': {'build_requires': ['dep:virtual/compiler/c', 'dep:virtual/compiler/cxx']}}
>>> from pyproject_external import detect_ecosystem_and_package_manager
>>> ecosystem, package_manager = detect_ecosystem_and_package_manager()
>>> ecosystem
'conda-forge'
>>> package_manager
'pixi'
>>> external.to_dict(mapped_for=ecosystem, package_manager=package_manager)
{'external': {'build_requires': ['c-compiler', 'cxx-compiler', 'python']}}
>>> external.install_commands(ecosystem, package_manager=package_manager)
# {"command": ["pixi", "add", "{}"]}
[
['pixi', 'add', 'c-compiler', 'cxx-compiler', 'python'],
]
>>> external.query_commands(ecosystem, package_manager=package_manager)
# {"command": ["pixi", "list", "{}"]}
[
['pixi', 'list', 'c-compiler'],
['pixi', 'list', 'cxx-compiler'],
['pixi', 'list', 'python'],
]
Grayskull
A prototype proof of concept implementation was contributed to Grayskull, a conda recipe generator for Python packages, via conda/grayskull#518.
In order to use the name mappings for the recipe generator of our package, we can now run Grayskull:
$ grayskull pypi my-cxx-pkg
#### Initializing recipe for my-cxx-pkg (pypi) ####
Recovering metadata from pypi...
Starting the download of the sdist package my-cxx-pkg
my-cxx-pkg 100% Time: 0:00:10 5.3 MiB/s|###########|
Checking for pyproject.toml
...
Build requirements:
- python # [build_platform != target_platform]
- cross-python_{{ target_platform }} # [build_platform != target_platform]
- meson-python >= 0.13.1 # [build_platform != target_platform]
- pybind11 >= 2.10.4 # [build_platform != target_platform]
- ninja # [build_platform != target_platform]
- libboost-devel # [build_platform != target_platform]
- {{ compiler('cxx') }}
Host requirements:
- python
- meson-python >=0.13.1
- pybind11 >=2.10.4
- ninja
- libboost-devel
Run requirements:
- python
#### Recipe generated on /path/to/recipe/dir for my-cxx-pkg ####
Backwards Compatibility
There is no impact on backwards compatibility.
Security Implications
This proposal does not impose any security implications on existing projects. The proposed schemas, registries and mappings are available resources for downstream tooling to use at their own will, in whatever way they find suitable.
We do have some recommendations for future implementors. The mapping schema
proposes fields to encode instructions for command execution
(package_managers[].commands). A tampered mapping may change these
instructions into something else. Hence, tools should not rely on internet
connectivity to fetch the mappings from their online sources. Instead:
- they should vendor the relevant documents in the distributed packages,
- or depend on prepackaged, offline distributions of these documents,
- or implement best practices for authenticity verification of the fetched documents.
The install commands have the potential to modify the system configuration of the user. When available, tools should prefer creating ephemeral, isolated environments for the installation of external dependencies. If the ecosystem lacks that feature natively, other solutions like containerization may be used. At the very least, informative messaging of the impact of the operation should be provided.
How to Teach This
There are at least four audiences that may need to get familiar with the contents of this PEP:
- Central registry maintainers, who are responsible for curating the list of well-known DepURLs and mapped ecosystems.
- Packaging ecosystem maintainers, who are responsible for keeping the mapping for their ecosystem up-to-date.
- Maintainers of Python projects that require external dependencies.
- End users of packages that have external dependency metadata.
Central DepURL registry maintainers
Central DepURL registry maintainers curate the collection of DepURLs and the known ecosystems. These contributors need to be able to refer to clearly defined rules for when a new DepURL can be defined. It is undesirable to be loose with canonical DepURL definitions, because each definition added increases maintenance effort in the mappings in the target ecosystems.
The central registry maintainers should agree on the ground rules and write them down as part of the repository documentation, perhaps supported by additional affordances like issue and pull request templates, or linting tools.
Package ecosystem maintainers usage
Missing mapping entries will result in the absence of tailored error messages and other UX affordances for end users of the impacted ecosystems. It is thus recommended that each package ecosystem keeps their mappings up-to-date with the central registry. The key to this will be automation, like linting scripts (see example at external-metadata-mappings), or periodic notifications via issues or draft submissions.
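Such a linting script can be little more than a set comparison between the canonical identifiers and those covered by a mapping. A sketch, assuming the mappings/id field names used in the example documents of this PEP (which may differ in the final schemas) and that the set of canonical DepURLs has already been extracted from the central registry:

```python
def lint_mapping(registry_ids: set[str], mapping: dict) -> list[str]:
    """Report DepURLs without a mapping entry, plus mapping entries whose
    id is not a registered DepURL. registry_ids is assumed to be the set
    of canonical DepURLs taken from the central registry document."""
    mapped = {entry["id"] for entry in mapping["mappings"]}
    problems = [
        f"missing mapping entry for {dep}"
        for dep in sorted(registry_ids - mapped)
    ]
    problems += [
        f"unknown DepURL in mapping: {dep}"
        for dep in sorted(mapped - registry_ids)
    ]
    return problems
```

Run periodically (e.g. from a cron job), the output can feed the notification issues or draft submissions mentioned above.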
Establishing the initial mapping is likely to involve a lot of work, but ongoing maintenance should ideally require considerably less effort.
As best practices are discovered and agreed on, they should get documented in the central registry repository as learning materials for the mapping maintainers.
Maintainers of Python projects
A package maintainer’s responsibility is to decide on the DepURL that best represents the external dependency that their package needs. This is covered in PEP 725; the interactive mappings browser demo located at external-metadata-mappings.streamlit.app may come in handy. The central registry documentation may include examples and frequently asked questions to guide newcomers with their decisions.
If no suitable DepURL is available for a given dependency, maintainers may consider submitting a request in the central registry. Instructions on how to do this should be provided as part of the central registry documentation.
End user package consumers
There will be no change in the user experience by default. This is particularly true if the user only relies on wheels, since the only impact will be driven by external runtime dependencies (expected to be rare), and even in those cases they need to opt-in by installing a compatible tool.
Users that do opt-in may find missing entries for their target ecosystems, for which they should obtain informative error messages that point to the relevant documentation sections. This will allow them to get acquainted with the nature of the issue and its potential solutions.
We hope that this results in a subset of them reporting the missing entries, submitting a fix to the affected mapping or, if totally absent, even deciding to maintain a new one on their own. To that end, they should get familiar with the responsibilities of mapping maintainers (discussed above).
Reference Implementation
A reference implementation should include three components:
- A central registry that captures at a minimum a DepURL and its description. This registry MUST NOT contain specifics of package ecosystem mappings.
- A standard specification for a collection of mappings. JSON Schema is widely supported by text editors and validation tooling, and would be a natural choice for expressing this specification.
- An implementation of (2), providing mappings from the contents of the central registry to the ecosystem-specific package names.
For (1), the JSON Schema is defined at central-registry.schema.json. An example registry can be found at registry.json. For (2), the JSON Schema is defined at external-mapping.schema.json. A collection of example mappings for a sample of packages can be found at external-metadata-mappings. For (3), the JSON Schema is defined at known-ecosystems.schema.json. An example list can be found at known-ecosystems.json. The JSON Schemas are created with these Pydantic models.
The reference CLI and Python API to consume the different JSON documents and [external] tables
can be found in pyproject-external.
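As an illustration of component (3), resolving a DepURL to ecosystem-specific specs from a mapping document can be sketched as follows. The field names follow the example mapping documents shown in this PEP (where specs may be a single string or a list); this is not the pyproject-external API:

```python
def resolve_specs(mapping: dict, dep_url: str) -> list[str]:
    """Return the ecosystem-specific package specs for a DepURL.
    Raises KeyError when the mapping has no entry for the identifier."""
    for entry in mapping["mappings"]:
        if entry["id"] == dep_url:
            specs = entry["specs"]
            # The example documents use either a string or a list of strings.
            return [specs] if isinstance(specs, str) else list(specs)
    raise KeyError(dep_url)
```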
Rejected Ideas
Centralized mappings governed by the same body
While a central authority for the registry is useful, maintaining the mappings for every ecosystem under the same body is infeasible at the scale of PyPI. Hence, we propose that the central authority only governs the central registry and the list of known ecosystems, while the maintenance of the mappings themselves is handled by the target ecosystems.
Allowing ecosystem-specific variants of packages
Some ecosystems have their own variants of known packages; e.g. Debian’s
libsymspg2-dev. While an identifier such as dep:deb/debian/libsymspg2-dev
is syntactically valid, the central registry should not recognize it as a
well-known identifier, preferring its generic counterpart instead. Users
may still choose to use it, but tools may warn about it and suggest using the
generic one. This is meant to encourage ecosystem-agnostic metadata whenever
possible to facilitate adoption across platforms and operating systems.
Adding more package metadata to the central registry
A central registry should only contain a list of DepURLs and a minimal set of metadata fields to facilitate its identification (a free-form text description, and one or more URLs to relevant locations).
We have chosen to leave additional details out of the central registry, and instead
suggest that external contributors maintain their own mappings, where they can
annotate the identifiers with extra metadata via the free-form extra_metadata field.
The reasons include:
- The existing fields should be sufficient to identify the project home, where that extra metadata can be obtained (e.g. the repository at the URL will likely include details about authorship and licensing).
- These details can also be obtained from the actual target ecosystems. In some cases this might even be preferable; e.g. for licenses, where downstream packaging can actually affect it by unvendoring dependencies or adjusting optional bits.
- Those details may change over the lifetime of the project, and keeping them up-to-date would increase the maintenance burden on the governance body.
- Centralizing additional metadata would hence introduce ambiguities and discrepancies across target ecosystems, where different versions may be available or required.
Mapping PyPI projects to repackaged counterparts in target ecosystems
It is common that other ecosystems redistribute Python projects with their own packaging system. While this is required for packages with compiled extensions, it is theoretically unnecessary for pure Python wheels; the only need for this seems to be metadata translation. See Wanting a singular packaging tool/vision #68, Wanting a singular packaging tool/vision #103, and spack/spack#28282 for examples of discussions in this direction.
The proposals in this PEP do not consider PyPI -> ecosystem mappings, but
the same schemas can be repurposed to that end. After all, it is trivial to build a PURL or
DepURL from a PyPI name (e.g. numpy becomes pkg:pypi/numpy). A hypothetical
mapping maintainer could annotate their repackaging efforts with the source PURL identifier,
and then use that metadata to generate compatible mappings, such as:
{
"$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/external-mapping.schema.json",
"schema_version": 1,
"name": "PyPI packages in Ubuntu 24.04",
"description": "PyPI mapping for the Ubuntu 24.04 LTS (Noble) distro",
"mappings": [
{
"id": "dep:pypi/numpy",
"description": "The fundamental package for scientific computing with Python",
"specs": ["python3-numpy"],
"urls": {
"home": "https://numpy.org/"
}
}
]
}
Such a mapping would allow downstream redistribution efforts to focus on the compiled packages and instead delegate pure wheels to Python packaging solutions directly.
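Building the DepURL from a PyPI name, as described above, amounts to applying the PEP 503 name normalization rules before prefixing. A sketch (the dep:pypi form follows the example mapping above):

```python
import re


def pypi_dep_url(project_name: str) -> str:
    """Build a DepURL for a PyPI project, applying the PEP 503 name
    normalization rules: lowercase, with runs of '-', '_' and '.'
    collapsed into a single '-'."""
    normalized = re.sub(r"[-_.]+", "-", project_name).lower()
    return f"dep:pypi/{normalized}"
```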
Strict validation of identifiers
The central registry provides a list of canonical identifiers, which may tempt implementors into ensuring that all supplied identifiers are indeed canonical. We have decided to only recommend this practice for some tool categories, but in no case require such checks.
It is expected that, as the [external] metadata tables are adopted by the
packaging community, the canonical identifier list will grow to accommodate the
requirements found in different projects; for example, when a new C++ library or a
new language compiler is introduced.
If validation is made too strict and rejects unknown identifiers, this would introduce unnecessary friction in the external metadata adoption, and require human interaction to review and accept the newly requested identifiers in a time-critical manner, potentially blocking publication of the package that needs a new identifier added to the central registry.
We suggest simply checking that the provided identifiers are well-formed. Future work may choose to also enforce that the identifiers are recognized as canonical, once the central registry has matured with significant adoption.
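A well-formedness check can be as simple as a pattern match on the identifier structure. The sketch below is illustrative only: the authoritative grammar for DepURLs is given by PEP 725 (building on the PURL specification), not by this regular expression:

```python
import re

# Illustrative approximation of the DepURL shape: a fixed "dep:" scheme,
# a lowercase type (e.g. "generic", "virtual", "deb"), and one or more
# name segments. The real grammar is defined by PEP 725.
DEP_URL_RE = re.compile(
    r"^dep:"
    r"[a-z0-9][a-z0-9.+-]*"    # type
    r"(/[A-Za-z0-9._+-]+)+$"   # one or more path segments
)


def is_well_formed(dep_url: str) -> bool:
    """Accept identifiers that look like DepURLs without consulting the
    central registry (i.e. no canonicality check)."""
    return DEP_URL_RE.match(dep_url) is not None
```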
Inheritance and cross-referenced mappings
A potential way to improve the reusability of mappings is to provide a mechanism to inherit a parent mapping and extend or override it with additional values. The authors have decided not to add this feature given the implied complexity (e.g. URL resolution, nested dependencies, possibilities of broken resources). Instead, the following alternatives are proposed:
- For mapping authors, automate the generation of derived mappings via scripting and cron jobs. For example, simple logic such as fetching the parent mapping, applying the necessary modifications and republishing it to the target location should not result in much maintenance burden.
- For end-users wishing to extend a given mapping with custom overrides, client-side tools should implement the necessary affordances to do this easily. For example, a tool such as pyproject-external could provide the following CLI flags or environment variables:
  - --use-mapping / <TOOL>_USE_MAPPING: Use the given local or remote mapping instead of the canonical location.
  - --patch-mapping / <TOOL>_PATCH_MAPPING: Given a local or remote mapping, replace the matching keys in the canonical location and append the non-matching ones.
  - --extend-mapping / <TOOL>_EXTEND_MAPPING: Given a local or remote mapping, append its contents to the canonical one. Assuming the tool allows the user to pick different mapping options if more than one is available, this option enriches the set of options without complete overrides.
So, for example, given a package with this external table:
[external]
build-requires = [
"dep:virtual/compiler/c",
]
host-requires = [
"dep:generic/libffi",
]
And a target ecosystem that maps dep:virtual/compiler/c to gcc
but clang is preferred, the following mapping override could be provided:
{
"$schema": "https://raw.githubusercontent.com/jaimergp/external-metadata-mappings/main/schemas/external-mapping.schema.json",
"schema_version": 1,
"name": "ecosystem override",
"description": "Mapping override for my ecosystem of choice",
"mappings": [
{
"id": "dep:virtual/compiler/c",
"description": "Clang override",
"specs": "clang"
}
]
}
Then, it would be used like this:
$ python -m my-tool show \
sdist/cryptography-46.0.2.tar.gz \
--output install-command \
--patch-mapping=my-override.mapping.json
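The --patch-mapping merge described above (replace entries with matching ids, append the rest) could be implemented along these lines. This is a sketch; the function name is hypothetical, and the field names follow the example documents in this PEP:

```python
def patch_mapping(canonical: dict, override: dict) -> dict:
    """Merge an override mapping into the canonical one: entries whose id
    matches replace the canonical entry in place; new ids are appended."""
    by_id = {entry["id"]: entry for entry in canonical["mappings"]}
    for entry in override["mappings"]:
        by_id[entry["id"]] = entry  # replaces in place, or appends at the end
    patched = dict(canonical)  # shallow copy; inputs are left untouched
    patched["mappings"] = list(by_id.values())
    return patched
```

With the override shown earlier, the dep:virtual/compiler/c entry would resolve to clang while all other canonical entries remain unchanged.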
Tracking package name changes
Packaging ecosystems tend to correct, extend and evolve their naming schemes. It is common to split what started as a monolithic build into smaller components (e.g. to avoid shipping development files to runtime-only environments, saving bandwidth). Some practitioners also use the package name to track ABI compatibility across SONAME changes. The reasons may be multiple and diverse, but the problem is the same: a given upstream project may be distributed under different names over time.
A proposal to track these changes in the mapping suggested the inclusion of additional
date fields (such as valid_from and valid_to), but the authors decided to reject
this idea. It adds complexity to the implementation, it is difficult to keep up-to-date,
and it doesn’t add value to the end-user, serving merely as a historical record.
Instead, we expect that versioned distributions maintain a separate mapping per release (see the proposed mapping naming schemes). Rolling ecosystems should strive to keep alias packages around, with deprecation warnings if needed and feasible. In general, we also recommend keeping the mapping files under public version control so end-users can refer to older versions if necessary.
Reusing existing databases as a central registry
A cursory online search for cross-ecosystem databases of packages would reveal different sets of results close to the needs of this proposal, but not quite there. For example:
- Some solutions only focus on Linux distributions or Unix systems, like Repology or pkgs.org.
- Other services like Libraries.io require a login.
- Other providers like ecosyste.ms are only available via APIs.
- The service purldb only focuses on collecting concrete PURLs (which identify specific package artifacts), instead of abstract PURLs concerned with identifying input requirements.
The proposed mappings try to be as lightweight as possible, without requiring the maintenance of a live server and an API: simply a collection of static JSON files that can be easily updated and distributed online and offline.
If in the future a service exists providing the following features, then it would be a strong contender for superseding this PEP:
- Provides mappings between source DepURLs, PURLs and their repackaged counterparts. This implies that PURLs have gained the notion of virtual packages and ergonomic version range expressions.
- Can generate package manager instructions for a given input PURL.
- Can be distributed as local artifacts for offline consumption.
- Does not require a live server or an API.
- FOSS-licensed.
Open Issues
None at this time.
References
Appendix A: Operational suggestions
In contrast with the ecosystem mappings, the central registry and the list of known ecosystems need to be maintained by a central authority. The authors propose to:
Appendix B: Virtual versioning proposal
While virtual dependencies can be versioned with the same syntax as non-virtual dependencies, their meaning can be ambiguous (e.g. there can be multiple implementations, and virtual interfaces may not be unambiguously versioned). Below we provide some suggestions for the central registry maintainers to consider when standardizing that meaning:
- OpenMP: has regular MAJOR.MINOR versions of its standard, so a constraint would look like >=4.5.
- BLAS/LAPACK: should use the versioning used by Reference LAPACK, which defines what the standard APIs are. It uses MAJOR.MINOR.MICRO, so a constraint would look like >=3.10.0.
- Compilers: these implement language standards. For C, C++ and Fortran these are versioned by year. In order for versions to sort correctly, we recommend using the full year (four digits). So “at least C99” would be >=1999, and selecting C++14 or Fortran 77 would be ==2014 or ==1977 respectively. Other languages may use different versioning schemes. These should be described somewhere before they are used in pyproject.toml.
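Applied to an [external] table, these conventions might look as follows. This fragment is illustrative only: the DepURL paths shown and the exact placement of the version specifier are governed by PEP 725 and the central registry, not fixed by this appendix:

```toml
[external]
build-requires = [
    # "at least C99", using the four-digit year convention above
    "dep:virtual/compiler/c>=1999",
]
host-requires = [
    # Reference LAPACK versioning for the BLAS/LAPACK interface
    "dep:virtual/interface/blas>=3.10.0",
]
```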
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0804.rst
Last modified: 2026-04-28 12:01:03 GMT