PEP: 625
Title: Filename of a Source Distribution
Author: Tzu-ping Chung <uranusjr@gmail.com>, Paul Moore <p.f.moore@gmail.com>
PEP-Delegate: Pradyun Gedam <pradyunsg@gmail.com>
Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686
Status: Accepted
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 08-Jul-2020
Post-History: 08-Jul-2020
Resolution: https://discuss.python.org/t/pep-625-file-name-of-a-source-distribution/4686/159

Abstract

This PEP describes a standard naming scheme for a Source Distribution,
also known as an sdist. An sdist is distinct from an arbitrary archive
file containing source code of Python packages, and can be used to
communicate information about the distribution to packaging tools.

A standard sdist specified here is a gzipped tar file with a specially
formatted filename and the usual .tar.gz suffix. This PEP does not
specify the contents of the tarball, as that is covered in other
specifications.

Motivation

An sdist is a Python package distribution that contains "source code" of
the Python package, and requires a build step to be turned into a wheel
on installation. This format is often considered an unbuilt
counterpart of a PEP 427 wheel, and is given special treatment in
various parts of the packaging ecosystem.

The content of an sdist is specified in PEP 517 and PEP 643, but
currently the filename of the sdist is incompletely specified, meaning
that consumers of the format must download and process the sdist to
confirm the name and version of the distribution included within.

Installers currently rely on heuristics to infer the name and/or version
from the filename, to help the installation process. pip, for example,
parses the filename of an sdist from a PEP 503 index, to obtain the
distribution's project name and version for dependency resolution
purposes. But due to the lack of specification, the installer does not
have any guarantee as to the correctness of the inferred data, and must
verify it at some point by locally building the distribution metadata.

This build step is awkward for a certain class of operations, when the
user does not expect the build process to occur. pypa/pip#8387 describes
an example. The command pip download --no-deps --no-binary=numpy numpy
is expected to only download an sdist for numpy, since we do not need to
check for dependencies, and both the name and version are available by
introspecting the downloaded filename. pip, however, cannot assume the
downloaded archive follows the convention, and must build and check the
metadata. For a PEP 518 project, this means running the
prepare_metadata_for_build_wheel hook specified in PEP 517, which incurs
significant overhead.

Rationale

By creating a special filename scheme for the sdist format, this PEP
frees up tools from the time-consuming metadata verification step when
they only need the metadata available in the filename.

This PEP also serves as the formal specification of the long-standing
filename convention used by the current sdist implementations. The
filename contains the distribution name and version, so that tools can
identify a distribution without downloading and unarchiving the file
and performing costly metadata generation, provided all the
information they need is available in the filename.

Specification

The name of an sdist should be {distribution}-{version}.tar.gz.

-   distribution is the name of the distribution as defined in PEP 345,
    and normalised as described in the wheel spec, e.g. 'pip',
    'flit_core'.
-   version is the version of the distribution as defined in PEP 440,
    e.g. 20.2, and normalised according to the rules in that PEP.
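
As an illustration of the naming rule above, the following minimal
sketch builds a filename from a project name and a version. The
function names are hypothetical. The name escaping follows the wheel
spec's rule (runs of '-', '_' and '.' become a single '_', and the
result is lowercased). The version is assumed to already be in PEP 440
normalised form; this sketch does not normalise it.

```python
import re


def normalize_distribution(name: str) -> str:
    """Escape a project name as the wheel spec describes.

    Runs of '-', '_' and '.' collapse to a single '_', and the
    result is lowercased.
    """
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, normalized_version: str) -> str:
    """Return '{distribution}-{version}.tar.gz'.

    Assumption: normalized_version is already normalised per PEP 440;
    no version normalisation is attempted here.
    """
    return f"{normalize_distribution(name)}-{normalized_version}.tar.gz"


print(sdist_filename("Flit.Core", "3.9.0"))  # flit_core-3.9.0.tar.gz
```

Because the escaped name contains no hyphen, the hyphen separating the
two components is unambiguous.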

An sdist must be a gzipped tar archive in pax format that can be
extracted by the standard library tarfile module using the open mode
'r:gz'.
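
The archive requirement can be exercised with the standard library
alone. The sketch below writes a small pax-format gzipped tarball into
memory and reads it back with mode 'r:gz'. The helper names, the
example directory name and the file content are placeholders chosen
for illustration; this is not a complete or valid sdist.

```python
import io
import tarfile


def make_example_archive(top_dir: str, files: dict[str, bytes]) -> bytes:
    """Write files under top_dir into an in-memory pax-format .tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for rel_path, data in files.items():
            info = tarfile.TarInfo(name=f"{top_dir}/{rel_path}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(archive: bytes) -> list[str]:
    """Open the archive with mode 'r:gz' and return the member names."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        return tar.getnames()


# Placeholder content only; real sdist contents are covered by other specs.
archive = make_example_archive("example_pkg-1.0", {"PKG-INFO": b"placeholder\n"})
print(list_members(archive))
```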

Code that produces an sdist file MUST give the file a name that matches
this specification. The specification of the build_sdist hook from PEP
517 is extended to require this naming convention.

Code that processes sdist files MAY determine the distribution name and
version by simply parsing the filename, and is not required to verify
that information by generating or reading the metadata from the sdist
contents.

Conforming sdist files can be recognised by the presence of the .tar.gz
suffix and a single hyphen in the filename. Note that some legacy files
may also match these criteria, but this is not expected to be an issue
in practice. See the "Backwards Compatibility" section of this document
for more details.
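
The recognition and parsing rules above can be sketched as follows.
The function name is hypothetical, and the sketch only checks the
.tar.gz suffix and the single hyphen; it does not validate that the
two components are correctly normalised.

```python
def parse_sdist_filename(filename: str) -> tuple[str, str]:
    """Return (distribution, version) for a conforming sdist filename.

    Raises ValueError if the filename lacks the .tar.gz suffix or does
    not contain exactly one hyphen.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz file: {filename!r}")
    stem = filename[: -len(suffix)]
    if stem.count("-") != 1:
        raise ValueError(f"expected exactly one hyphen: {filename!r}")
    distribution, version = stem.split("-")
    if not distribution or not version:
        raise ValueError(f"empty name or version: {filename!r}")
    return distribution, version


print(parse_sdist_filename("flit_core-3.9.0.tar.gz"))  # ('flit_core', '3.9.0')
```

A filename with several hyphens, or with none, is rejected rather than
guessed at, which matches the treatment of obviously legacy names.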

Backwards Compatibility

The new filename scheme is a subset of the current informal naming
convention for sdist files, so tools that create or publish files
conforming to this standard will be readable by older tools that only
understand the previous naming conventions.

Tools that consume sdist filenames would technically not be able to
determine whether a file is using the new standard or a legacy form.
However, a review of the filenames on PyPI determined that 37% of files
are obviously legacy (because they contain multiple or no hyphens) and
of the remainder, parsing according to this PEP gives the correct answer
in all but 0.004% of cases.

Currently, tools that consume sdists should, if they are to be fully
correct, treat the name and version parsed from the filename as
provisional, and verify them by downloading the file and generating the
actual metadata (or reading it, if the sdist conforms to PEP 643). Tools
supporting this specification can treat the name and version from the
filename as definitive. In theory, this could risk mistakes if a legacy
filename is assumed to conform to this PEP, but in practice the chance
of this appears to be vanishingly small.

Rejected Ideas

Rely on the specification for sdist metadata

Since this PEP was first written, PEP 643 has been accepted, defining a
trustworthy, standard sdist metadata format. This allows distribution
metadata (and in particular name and version) to be determined
statically.

This is not considered sufficient, however, as in a number of
significant cases (for example, reading filenames from a package index)
the application only has access to the filename, and reading metadata
would involve a potentially costly download.

Use a dedicated file extension

The original version of this PEP proposed a filename of
{distribution}-{version}.sdist. This has the advantage of being
explicit, as well as allowing a future change to the storage format
without needing a further change of the file naming convention.

However, there are significant compatibility issues with a new
extension. Index servers may currently disallow unknown extensions, and
if we introduced a new one, it is not clear how to handle cases like a
legacy index trying to mirror an index that hosts new-style sdists. Is
it acceptable to only partially mirror, omitting sdists for newer
versions of projects? Also, build backends that produce the new format
would be incompatible with index servers that only accept the old format,
and as there is often no way for a user to request an older version of a
backend when doing a build, this could make it impossible to build and
upload sdists.

Augment a currently common sdist naming scheme

A scheme {distribution}-{version}.sdist.tar.gz was raised during the
initial discussion. This was abandoned due to backwards compatibility
issues with currently available installation tools. pip 20.1, for
example, would parse distribution-1.0.sdist.tar.gz as project
distribution with version 1.0.sdist. This would cause the sdist to be
downloaded, but fail to install due to inconsistent metadata.
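
The misparse described above can be reproduced with a simplified
parser that strips the .tar.gz suffix and splits at the last hyphen.
This is an illustrative stand-in, not pip's actual implementation, and
the function name is hypothetical.

```python
def legacy_style_parse(filename: str) -> tuple[str, str]:
    """Split a filename into (project, version) at the last hyphen.

    A simplified stand-in for a legacy heuristic, not pip's real code.
    """
    stem = filename.removesuffix(".tar.gz")  # requires Python 3.9+
    project, _, version = stem.rpartition("-")
    return project, version


print(legacy_style_parse("distribution-1.0.sdist.tar.gz"))
# ('distribution', '1.0.sdist')
```

The extra .sdist component ends up inside the version string, which is
the inconsistency that would make installation fail.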

The main advantage of this proposal was that it is easier for tools to
recognise the new-style naming. But this is not a particularly
significant benefit, given that all sdists with a single hyphen in the
name are parsed the same way under the old and new rules.

Open Issues

An sdist is required to contain a single top-level
directory named {name}-{version}. Currently no normalisation rules are
required for the components of this name. Should this PEP require that
the same normalisation rules are applied here as for the filename? Note
that in practice, it is likely that tools will create the two names
using the same code, so normalisation is likely to happen naturally,
even if it is not explicitly required.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


