PEP: 658 Title: Serve Distribution Metadata in the Simple Repository API
Author: Tzu-ping Chung <uranusjr@gmail.com> Sponsor: Brett Cannon
<brett@python.org> PEP-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: https://discuss.python.org/t/8651 Status: Accepted Type:
Standards Track Topic: Packaging Content-Type: text/x-rst Created:
10-May-2021 Post-History: 10-May-2021 Resolution:
https://discuss.python.org/t/8651/48

Abstract

This PEP proposes adding an anchor tag to expose the METADATA file from
distributions in the PEP 503 "simple" repository API. A
data-dist-info-metadata attribute is introduced to indicate that the
file from a given distribution can be independently fetched.

Motivation

Package management workflows made popular by recent tooling increase the
need to inspect distribution metadata without intending to install the
distribution, and download multiple distributions of a project to choose
from based on their metadata. This means they end up discarding much
downloaded data, which is inefficient and results in a bad user
experience.

Rationale

Tools have been exploring methods to reduce the download size by
partially downloading wheels with HTTP range requests. This, however,
adds additional run-time requirements to the repository server. It also
still adds additional overhead, since a separate request is needed to
fetch the wheel's file listing to find the correct offset to fetch the
metadata file. It is therefore desired to make the server extract the
metadata file in advance, and serve it as an independent file to avoid
the need to perform additional requests and ZIP inspection.

The metadata file defined by the Core Metadata Specification
[core-metadata] will be served directly by repositories since it
contains the necessary information for common use cases. The metadata
must only be served for standards-compliant distributions such as wheels
[wheel] and sdists [sdist], and must be identical to the distribution's
canonical metadata file, such as a wheel's METADATA file in the
.dist-info directory [dist-info].

An HTML attribute on the distribution file's anchor link is needed to
indicate whether a client is able to choose the separately served
metadata file. The attribute is also used to provide the metadata
content's hash for client-side verification. The attribute's absence
indicates that a separate metadata entry is not available for the
distribution, either because of the distribution's content, or lack of
repository support.

Specification

In a simple repository's project page, each anchor tag pointing to a
distribution MAY have a data-dist-info-metadata attribute. The presence
of the attribute indicates the distribution represented by the anchor
tag MUST contain a Core Metadata file that will not be modified when the
distribution is processed and/or installed.

If a data-dist-info-metadata attribute is present, the repository MUST
serve the distribution's Core Metadata file alongside the distribution
with a .metadata appended to the distribution's file name. For example,
the Core Metadata of a distribution served at
/files/distribution-1.0-py3.none.any.whl would be located at
/files/distribution-1.0-py3.none.any.whl.metadata. This is similar to
how PEP 503 specifies the GPG signature file's location.

The repository SHOULD provide the hash of the Core Metadata file as the
data-dist-info-metadata attribute's value using the syntax
<hashname>=<hashvalue>, where <hashname> is the lower cased name of the
hash function used, and <hashvalue> is the hex encoded digest. The
repository MAY use true as the attribute's value if a hash is
unavailable.

Backwards Compatibility

If an anchor tag lacks the data-dist-info-metadata attribute, tools are
expected to revert to their current behaviour of downloading the
distribution to inspect the metadata.

Older tools not supporting the new data-dist-info-metadata attribute are
expected to ignore the attribute and maintain their current behaviour of
downloading the distribution to inspect the metadata. This is similar to
how prior data- attribute additions expect existing tools to operate.

Rejected Ideas

Put metadata content on the project page

Since tools generally only need dependency information from a
distribution in addition to what's already available on the project
page, it was proposed that repositories may directly include the
information on the project page, like the data-requires-python attribute
specified in PEP 503.

This approach was abandoned since a distribution may contain arbitrarily
long lists of dependencies (including required and optional), and it is
unclear whether including the information for every distribution in a
project would result in net savings since the information for most
distributions generally ends up unneeded. By serving the metadata
separately, performance can be better estimated since data usage will be
more proportional to the number of distributions inspected.

Expose more files in the distribution

It was proposed to provide the entire .dist-info directory as a separate
part, instead of only the metadata file. However, serving multiple files
in one entity through HTTP requires re-archiving them separately after
they are extracted from the original distribution by the repository
server, and there are no current use cases for files other than METADATA
when the distribution itself is not going to be installed.

It should also be noted that the approach taken here does not preclude
other files from being introduced in the future, whether we want to
serve them together or individually.

Explicitly specify the metadata file's URL on the project page

An early version of this draft proposed putting the metadata file's URL
in the data-dist-info-metadata attribute. But people feel it is better
for discoverability to require the repository to serve the metadata file
at a determined location instead. The current approach also has an
additional benefit of making the project page smaller.

References

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

core-metadata

    https://packaging.python.org/specifications/core-metadata/

dist-info

    https://packaging.python.org/specifications/recording-installed-packages/

sdist

    https://packaging.python.org/specifications/source-distribution-format/

wheel

    https://packaging.python.org/specifications/binary-distribution-format/