PEP: 610 Title: Recording the Direct URL Origin of installed
distributions Author: Stéphane Bidoul <stephane.bidoul@gmail.com>, Chris
Jerdonek <chris.jerdonek@gmail.com> Sponsor: Alyssa Coghlan
<ncoghlan@gmail.com> BDFL-Delegate: Pradyun Gedam <pradyunsg@gmail.com>
Discussions-To:
https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535
Status: Final Type: Standards Track Topic: Packaging Content-Type:
text/x-rst Created: 21-Apr-2019 Post-History: Resolution:
https://discuss.python.org/t/1535/56

packaging:direct-url

Abstract

Following PEP 440, a distribution can be identified by a name and either
a version, or a direct URL reference (see
PEP440 Direct References <440#direct-references>). After installation,
the name and version are captured in the project metadata, but currently
there is no way to obtain details of the URL used when the distribution
was identified by a direct URL reference.

This proposal defines additional metadata, to be added to the installed
distribution by the installation front end, which records the Direct URL
Origin for use by consumers which introspect the database of installed
packages (see PEP 376).

Motivation

The original motivation of this PEP was to permit tools with a "freeze"
operation allowing a Python environment to be recreated to work in a
broader range of situations.

Specifically, the PEP originated from the desire to address pip issue
#609: i.e. improving the behavior of pip freeze in the presence of
distributions installed from direct URL references. It follows a thread
on discuss.python.org about the best course of action to implement it.

Installation from direct URL references

Python installers such as pip are capable of downloading and installing
distributions from package indexes. They are also capable of downloading
and installing source code from requirements specifying arbitrary URLs
of source archives and Version Control Systems (VCS) repositories, as
standardized in PEP440 Direct References <440#direct-references>.

In other words, two relevant installation modes exist.

1.  the package to install is specified as a name and version specifier:

  In this case, the installer looks in a package index (or optionally
  using --find-links in the case of pip) to find the distribution to
  install.

2.  The package to install is specified as a direct URL reference:

  In this case, the installer downloads whatever is specified by the URL
  (typically a wheel, a source archive or a VCS repository) and installs
  it.

  In this mode, installers typically download the source code in a
  temporary directory, invoke the PEP 517 build backend to produce a
  wheel if needed, install the wheel, and delete the temporary
  directory.

  After installation, no trace of the URL the user requested to download
  the package is left on the user system.

Freezing an environment

Pip also sports a command named pip freeze which examines the Database
of Installed Python Distributions to generate a list of requirements.
The main goal of this command is to help users generating a list of
requirements that will later allow the re-installation the same
environment with the highest possible fidelity.

As of pip version 19.3, the pip freeze command outputs a name==version
line for each installed distribution (except for editable installs). To
achieve the goal of reinstalling the same environment, this requires the
(name, version) tuple to refer to an immutable version of the
distribution. The immutability is guaranteed by package indexes such as
Warehouse. The package index to use is typically known from
environmental or command line parameters of the installer.

This freeze mechanism therefore works fine for installation mode 1 (i.e.
when the package to install was specified as a name plus version
specifier).

For installation mode 2, i.e. when the package to install was specified
as a direct URL reference, the name==version tuple is obviously not
sufficient to reinstall the same distribution and users of the freeze
command expect it to output the URL that was originally requested.

The reasoning above is equally applicable to tools, other than
pip freeze, that would attempt to generate a Pipfile.lock or any other
similar format from the Database of Installed Python Distributions.
Unless specified otherwise, "freeze" is used in this document as a
generic term for such an operation.

The importance of installing from (VCS) URLs for application integrators

For an application integrator, it is important to be able to reliably
install and freeze unreleased version of python distributions. For
instance when a developer needs to deploy an unreleased patched version
of a dependency, it is common to install the dependency directly from a
VCS branch that has the patch, while waiting for the maintainer to
release an updated version.

In such cases, it is important for "freeze" to pin the exact VCS
reference (commit-hash if available) that was installed, in order to
create reproducible builds with the highest possible fidelity.

Additional origin metadata available for VCS URLs

For VCS URLs, there is additional origin information available only at
install time useful for introspection and certain workflows. For
example, when installing a revision from a VCS URL, a tool can determine
if the revision corresponds to a branch, tag or (in the case of Git) a
ref. This information can be used when introspecting the Database of
Installed Distributions to communicate to users more information about
what version was installed (e.g. whether a branch or tag was installed
and, if so, the name of the branch or tag). This also permits one to
know whether a PEP 440 direct reference URL can be constructed using the
tag form, as only tags have the semantics of immutability.

In cases where the revision is mutable (e.g. branches and Git refs),
knowing this information enables workflows where users can e.g. update
to the latest version of a branch they are tracking, or update to the
latest version of a pull request they are reviewing locally. In
contrast, when the revision is a tag, tools can know in advance (e.g.
without network calls) that no update is needed.

As with the URL itself, if this information isn't recorded at install
time when the VCS repository is available, it would otherwise be lost.

Note about "editable" installs

The editable installation mode of pip roughly lets a user insert a local
directory in sys.path for development purpose. This mode is somewhat
abused to work around the fact that a non-editable install from a VCS
URL loses track of the origin after installation. Indeed, editable
installs implicitly record the VCS origin in the checkout directory, so
the information can be recovered when running "freeze".

The use of this workaround, although useful, is fragile, creates
confusion about the purpose of the editable mode, and works only when
the distribution can be installed with setuptools (i.e. it is not usable
with other PEP 517 build backends).

When this PEP is implemented, it will not be necessary anymore to use
editable installs for the purpose of making pip freeze work correctly
with VCS references.

Rationale

This PEP specifies a new direct_url.json metadata file in the .dist-info
directory of an installed distribution.

The fields specified are sufficient to reproduce the source archive and
VCS URLs supported by pip. They are also sufficient to reproduce
PEP440 Direct References <440#direct-references>, as well as Pipfile and
Pipfile.lock entries. Finally, they are sufficient to record the branch,
tag, and/or Git ref origin of the installed version that is already
available for editable installs by virtue of a VCS checkout being
present.

Since at least three different ways already exist to encode this type of
information, this PEP uses a dictionary format, so as not to make any
assumption on how a direct URL reference must ultimately be encoded in a
requirement or lockfile. See also the Alternatives section below for
more discussion about this choice.

Information has been taken from Ruby's bundler manual to verify it has
similar capabilities and inform the selection and naming of fields in
this specifications.

The JSON format allows for the addition of additional fields in the
future.

Specification

This PEP specifies a direct_url.json file in the .dist-info directory of
an installed distribution, to record the Direct URL Origin of the
distribution.

The canonical source for the name and semantics of this metadata file is
the Recording the Direct URL Origin of installed distributions document.

This file MUST be created by installers when installing a distribution
from a requirement specifying a direct URL reference (including a VCS
URL).

This file MUST NOT be created when installing a distribution from an
other type of requirement (i.e. name plus version specifier).

This JSON file MUST be a dictionary, compliant with 8259 and UTF-8
encoded.

If present, it MUST contain at least two fields. The first one is url,
with type string. Depending on what url refers to, the second field MUST
be one of vcs_info (if url is a VCS reference), archive_info (if url is
a source archives or a wheel), or dir_info (if url is a local
directory). These info fields have a (possibly empty) subdictionary as
value, with the possible keys defined below.

url MUST be stripped of any sensitive authentication information, for
security reasons.

The user:password section of the URL MAY however be composed of
environment variables, matching the following regular expression:

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?

Additionally, the user:password section of the URL MAY be a well-known,
non security sensitive string. A typical example is git in the case of
an URL such as ssh://git@gitlab.com.

When url refers to a VCS repository, the vcs_info key MUST be present as
a dictionary with the following keys:

-   A vcs key (type string) MUST be present, containing the name of the
    VCS (i.e. one of git, hg, bzr, svn). Other VCS's SHOULD be
    registered by writing a PEP to amend this specification. The url
    value MUST be compatible with the corresponding VCS, so an installer
    can hand it off without transformation to a checkout/download
    command of the VCS.
-   A requested_revision key (type string) MAY be present naming a
    branch/tag/ref/commit/revision/etc (in a format compatible with the
    VCS) to install.
-   A commit_id key (type string) MUST be present, containing the exact
    commit/revision number that was installed. If the VCS supports
    commit-hash based revision identifiers, such commit-hash MUST be
    used as commit_id in order to reference the immutable version of the
    source code that was installed.
-   If the installer could discover additional information about the
    requested revision, it MAY add a resolved_revision and/or
    resolved_revision_type field. If no revision was provided in the
    requested URL, resolved_revision MAY contain the default branch that
    was installed, and resolved_revision_type will be branch. If the
    installer determines that requested_revision was a tag, it MAY add
    resolved_revision_type with value tag.

When url refers to a source archive or a wheel, the archive_info key
MUST be present as a dictionary with the following key:

-   A hash key (type string) SHOULD be present, with value
    <hash-algorithm>=<expected-hash>. It is RECOMMENDED that only hashes
    which are unconditionally provided by the latest version of the
    standard library's hashlib module be used for source archive hashes.
    At time of writing, that list consists of 'md5', 'sha1', 'sha224',
    'sha256', 'sha384', and 'sha512'.

When url refers to a local directory, the dir_info key MUST be present
as a dictionary with the following key:

-   editable (type: boolean): true if the distribution was installed in
    editable mode, false otherwise. If absent, default to false.

When url refers to a local directory, it MUST have the file scheme and
be compliant with 8089. In particular, the path component must be
absolute. Symbolic links SHOULD be preserved when making relative paths
absolute.

Note

When the requested URL has the file:// scheme and points to a local
directory that happens to contain a VCS checkout, installers MUST NOT
attempt to infer any VCS information and therefore MUST NOT output any
VCS related information (such as vcs_info) in direct_url.json.

A top-level subdirectory field MAY be present containing a directory
path, relative to the root of the VCS repository, source archive or
local directory, to specify where pyproject.toml or setup.py is located.

Note

As a general rule, installers should as much as possible preserve the
information that was provided in the requested URL when generating
direct_url.json. For example, user:password environment variables should
be preserved and requested_revision should reflect the revision that was
provided in the requested URL as faithfully as possible. This
information is however enriched with more precise data, such as
commit_id.

Registered VCS

This section lists the registered VCS's; expanded, VCS-specific
information on how to use the vcs, requested_revision, and other fields
of vcs_info; and in some cases additional VCS-specific fields. Tools MAY
support other VCS's although it is RECOMMENDED to register them by
writing a PEP to amend this specification. The vcs field SHOULD be the
command name (lowercased). Additional fields that would be necessary to
support such VCS SHOULD be prefixed with the VCS command name.

Git

Home page

  https://git-scm.com/

vcs command

  git

vcs field

  git

requested_revision field

  A tag name, branch name, Git ref, commit hash, shortened commit hash,
  or other commit-ish.

commit_id field

  A commit hash (40 hexadecimal characters sha1).

Note

Installers can use the git show-ref and git symbolic-ref commands to
determine if the requested_revision corresponds to a Git ref. In turn, a
ref beginning with refs/tags/ corresponds to a tag, and a ref beginning
with refs/remotes/origin/ after cloning corresponds to a branch.

Mercurial

Home page

  https://www.mercurial-scm.org/

vcs command

  hg

vcs field

  hg

requested_revision field

  A tag name, branch name, changeset ID, shortened changeset ID.

commit_id field

  A changeset ID (40 hexadecimal characters).

Bazaar

Home page

  https://bazaar.canonical.com/

vcs command

  bzr

vcs field

  bzr

requested_revision field

  A tag name, branch name, revision id.

commit_id field

  A revision id.

Subversion

Home page

  https://subversion.apache.org/

vcs command

  svn

vcs field

  svn

requested_revision field

  requested_revision must be compatible with svn checkout --revision
  option. In Subversion, branch or tag is part of url.

commit_id field

  Since Subversion does not support globally unique identifiers, this
  field is the Subversion revision number in the corresponding
  repository.

Examples

Example direct_url.json

Source archive:

    {
        "url": "https://github.com/pypa/pip/archive/1.3.1.zip",
        "archive_info": {
            "hash": "sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
        }
    }

Git URL with tag and commit-hash:

    {
        "url": "https://github.com/pypa/pip.git",
        "vcs_info": {
            "vcs": "git",
            "requested_revision": "1.3.1",
            "resolved_revision_type": "tag",
            "commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
        }
    }

Local directory:

    {
        "url": "file:///home/user/project",
        "dir_info": {}
    }

Local directory installed in editable mode:

    {
        "url": "file:///home/user/project",
        "dir_info": {
            "editable": true
        }
    }

Example pip commands and their effect on direct_url.json

Commands that generate a direct_url.json:

-   pip install https://example.com/app-1.0.tgz
-   pip install https://example.com/app-1.0.whl
-   pip install
    "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
-   pip install ./app
-   pip install file:///home/user/app
-   pip install --editable
    "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
    (in which case, url will be the local directory where the git
    repository has been cloned to, and dir_info will be present with
    "editable": true and no vcs_info will be set)
-   pip install -e ./app

Commands that do not generate a direct_url.json

-   pip install app
-   pip install app --no-index --find-links https://example.com/

Use cases

"Freezing" an environment

  Tools, such as pip freeze, which generate requirements from the
  Database of Installed Python Distributions SHOULD exploit
  direct_url.json if it is present, and give it priority over the
  Version metadata in order to generate a higher fidelity output. In the
  presence of a vcs direct URL reference, the commit_id field SHOULD be
  used in priority in order to provide the highest possible fidelity to
  the originally installed version. If supported by their requirement
  format, tools are encouraged also to output the tag value if present,
  as it has immutable semantics. Tools MAY choose another approach,
  depending on the needs of their users.

  Note the initial iteration of this PEP does not attempt to make
  environments that include editable installs or installs from local
  directories reproducible, but it does attempt to make them readily
  identifiable. By locating the local project directory via the url and
  dir_info fields of this specification, tools can implement any
  strategy that fits their use cases.

Backwards Compatibility

Since this PEP specifies a new file in the .dist-info directory, there
are no backwards compatibility implications.

Alternatives

PEP 426 source_url

The now withdrawn PEP 426 specifies a source_url metadata entry. It is
also implemented in distlib.

It was intended for a slightly different purpose, for use in sdists.

This format lacks support for the subdirectory option of pip requirement
URLs. The same limitation is present in
PEP440 Direct References <440#direct-references>.

It also lacks explicit support for environment variables in the
user:password part of URLs.

The introduction of a key/value extensibility mechanism and support for
environment variables for user:password in PEP 440, would be necessary
for use in this PEP.

revision vs ref

The requested_revision key was retained over requested_ref as it is a
more generic term across various VCS and ref has a specific meaning for
git.

References

Acknowledgements

Various people helped make this PEP a reality. Paul F. Moore provided
the essence of the abstract. Alyssa Coghlan suggested the direct_url
name.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End: