PEP: 753
Title: Uniform project URLs in core metadata
Author: William Woodruff <william@yossarian.net>,
        Facundo Tuesca <facundo.tuesca@trailofbits.com>
Sponsor: Barry Warsaw <barry at python.org>
PEP-Delegate: Paul Moore <p.f.moore at gmail.com>
Discussions-To: https://discuss.python.org/t/pep-753-uniform-urls-in-core-metadata/62792
Status: Accepted
Type: Standards Track
Topic: Packaging
Created: 29-Aug-2024
Post-History: 26-Aug-2024, 03-Sep-2024
Resolution: 10-Oct-2024

Abstract

This PEP recommends two discrete changes to the processing of core
metadata by indices (such as PyPI) and other core metadata consumers:

-   Deprecation of the Home-page and Download-URL fields in favor of
    their Project-URL equivalents;
-   A set of conventions for normalizing and assigning semantics to
    Project-URL labels during consumer-side metadata processing.

Rationale and Motivation

Python's standard core metadata has gone through many years of
revisions, with various standardized milestone versions.

These revisions of the core metadata have introduced various mechanisms
for expressing a package's relationship to external resources, via URLs:

1.  Metadata 1.0 introduced Home-page, a single-use field containing a
    URL to the distribution's home page.

        Home-page: https://example.com/sampleproject

2.  Metadata 1.1 introduced Download-URL, a complementary single-use
    field containing a URL suitable for downloading the current
    distribution.

        Download-URL: https://example.com/sampleproject/sampleproject-1.2.3.tar.gz

3.  Metadata 1.2 introduced Project-URL, a multiple-use field containing
    a label-and-URL pair. Each label is free text conveying the URL's
    semantics.

        Project-URL: Homepage, https://example.com/sampleproject
        Project-URL: Download, https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
        Project-URL: Documentation, https://example.com/sampleproject/docs

Metadata 2.1, 2.2, and 2.3 leave the behavior of these fields as
originally specified.
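
As an illustration of how a consumer might read these fields, the
following sketch uses the standard library's email parser to extract
label-and-URL pairs from a metadata excerpt. The helper name
project_urls and the sample metadata are assumptions made for this
example and are not part of any specification; the sketch also assumes
that the label itself contains no comma.

```python
from email import message_from_string

# Illustrative metadata excerpt; not taken from a real distribution.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: sampleproject
Version: 1.2.3
Project-URL: Homepage, https://example.com/sampleproject
Project-URL: Documentation, https://example.com/sampleproject/docs
"""


def project_urls(metadata_text: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs in the order they appear in the metadata."""
    message = message_from_string(metadata_text)
    pairs = []
    # Project-URL is multiple-use, so get_all() is needed rather than get().
    for value in message.get_all("Project-URL") or []:
        # Split on the first comma only, since the URL may contain commas.
        label, _, url = value.partition(",")
        pairs.append((label.strip(), url.strip()))
    return pairs


urls = project_urls(SAMPLE_METADATA)
```

A list of pairs is used instead of a dictionary because the core
metadata specification does not require labels to be unique.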

Because Project-URL allows free text labels and is multiple-use,
informal conventions have arisen for representing the values of
Home-page and Download-URL within Project-URL instead.

These conventions have seen significant adoption, with PEP 621
explicitly choosing to provide only a project.urls table rather than a
project.home-page field. From PEP 621's rejected ideas:

  While the core metadata supports it, having a single field for a
  project's URL while also supporting a full table seemed redundant and
  confusing.

This PEP exists to formalize the informal conventions that have arisen,
as well as explicitly document Home-page and Download-URL as deprecated
in favor of equivalent Project-URL representations.

Specification

This PEP proposes that Home-page and Download-URL be considered
deprecated. This deprecation has implications for both package metadata
producers (e.g. build backends and packaging tools) and package indices
(e.g. PyPI).

Metadata producers

This PEP stipulates the following for metadata producers:

-   When generating metadata 1.2 or later, producers SHOULD emit only
    Project-URL, and SHOULD NOT emit Home-page or Download-URL fields.

These stipulations do not change the optionality of URL fields in core
metadata. In other words, producers MAY choose to omit Project-URL
entirely at their discretion.
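
One way a producer might follow this stipulation is sketched below:
values that would previously have been written to Home-page or
Download-URL are folded into Project-URL under the conventional
Homepage and Download labels. The function url_fields and its
parameters are hypothetical and do not correspond to any existing build
backend's API. Letting an explicit Project-URL entry take precedence
over a legacy value is a choice made for this sketch, not something
this PEP requires.

```python
import string
from typing import Optional


def normalize_label(label: str) -> str:
    removal_map = str.maketrans("", "", string.punctuation + string.whitespace)
    return label.translate(removal_map).lower()


def url_fields(
    project_urls: dict[str, str],
    home_page: Optional[str] = None,
    download_url: Optional[str] = None,
) -> list[str]:
    """Build Project-URL lines only; never emit Home-page or Download-URL."""
    merged = dict(project_urls)
    # Normalization is used here only to detect an existing equivalent
    # entry. The labels that are emitted are left exactly as written.
    present = {normalize_label(label) for label in merged}
    if home_page and "homepage" not in present:
        merged["Homepage"] = home_page
    if download_url and "download" not in present:
        merged["Download"] = download_url
    return [f"Project-URL: {label}, {url}" for label, url in merged.items()]


lines = url_fields(
    {"Documentation": "https://example.com/sampleproject/docs"},
    home_page="https://example.com/sampleproject",
)
```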

This PEP does not propose the outright removal of support for Home-page
or Download-URL. However, see Future Considerations for thoughts on how
a new (as yet unspecified) major core metadata version could complete
the deprecation cycle via removal of these deprecated fields.

Similarly, this PEP does not propose that metadata producers emit
normalized labels. Label normalization is performed solely on the
processing and consumption side (i.e. within indices and other
consumers of distribution metadata).

Package indices

This PEP stipulates the following for package indices:

-   When interpreting a distribution's metadata of version 1.2 or later
    (e.g. for rendering on a web page), the index MUST prefer
    Project-URL fields as a source of URLs over Home-page and
    Download-URL, even if the latter are explicitly provided.
-   If a distribution's metadata contains only the Home-page and
    Download-URL fields, the index MAY choose to ignore those fields and
    behave as though no URLs were provided in the metadata. In this
    case, the index SHOULD present an appropriate warning or
    notification to the uploading party.
    -   The mechanism for presenting this warning or notification is not
        specified, since it will vary by index. By way of example, an
        index may choose to present a warning in the HTTP response to an
        upload request, or send an email or other notification to the
        maintainer(s) of the project.
-   If a distribution's metadata contains both sets of fields, the index
    MAY choose to reject the distribution outright. However, this is NOT
    RECOMMENDED until a future unspecified major metadata version
    formally removes support for Home-page and Download-URL.
-   Any changes to the interpretation of metadata of version 1.2 or
    later that result in previously recognized URLs no longer being
    recognized SHOULD NOT be retroactively applied to previously
    uploaded packages.

These stipulations do not change the optionality of URL processing by
indices. In other words, an index that does not process URLs within
uploaded distributions may continue to ignore all URL fields entirely.
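
The first two stipulations can be sketched as follows for an index that
chooses to ignore the deprecated fields. The function select_urls and
the wording of the warning are assumptions made for illustration; the
sketch does not check the metadata version, and it leaves the delivery
of the warning (HTTP response, email, or otherwise) to the caller.

```python
from email import message_from_string

DEPRECATED_FIELDS = ("Home-page", "Download-URL")


def select_urls(metadata_text: str) -> tuple[list[tuple[str, str]], list[str]]:
    """Return the (label, url) pairs to present, plus any warnings."""
    message = message_from_string(metadata_text)

    # Project-URL is the only source of URLs in this sketch, so it is
    # trivially preferred over the deprecated fields.
    urls = []
    for value in message.get_all("Project-URL") or []:
        label, _, url = value.partition(",")
        urls.append((label.strip(), url.strip()))

    # The deprecated fields are ignored, but the uploader is told about them.
    warnings = []
    found = [name for name in DEPRECATED_FIELDS if message.get(name)]
    if found:
        warnings.append(
            "deprecated field(s) ignored: " + ", ".join(found)
            + "; use Project-URL instead"
        )
    return urls, warnings
```

With this approach a distribution that provides only Home-page is
accepted, presents no URLs, and produces a single warning.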

Similarly, these stipulations do not imply that the index should persist
any normalizations that it performs to a distribution's metadata. In
other words, this PEP stipulates label normalization for the purpose of
user presentation, not for the purpose of modifying an uploaded
distribution or its "sidecar" PEP 658 metadata file.

Conventions for Project-URL labels

The deprecations proposed above require a formalization of the currently
informal relationship between Home-page, Download-URL, and their
Project-URL equivalents.

This formalization has two parts:

1.  A set of rules for normalizing Project-URL labels during index-side
    processing;
2.  A set of "well-known" normalized label values that indices may
    specialize URL presentation for.

Label normalization

The core metadata specification stipulates that Project-URL labels are
free text, limited to 32 characters.

This PEP proposes adding the concept of a "normalized" label to the core
metadata specification. Label normalization is defined via the following
Python function:

    import string


    def normalize_label(label: str) -> str:
        # Delete ASCII punctuation and whitespace, then lowercase the rest.
        chars_to_remove = string.punctuation + string.whitespace
        removal_map = str.maketrans("", "", chars_to_remove)
        return label.translate(removal_map).lower()

In plain language: a label is normalized by deleting all ASCII
punctuation and whitespace, and then converting the result to lowercase.

The following table shows examples of labels before (raw) and after
normalization:

  Raw           Normalized
  ------------- ------------
  Homepage      homepage
  Home-page     homepage
  Home page     homepage
  Change_Log    changelog
  What's New?   whatsnew
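
The rows of this table can be checked mechanically. The sketch below
repeats the normalization function so that it runs on its own; the
EXAMPLES mapping is simply the table above expressed as data.

```python
import string


def normalize_label(label: str) -> str:
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()


# Raw label -> expected normalized label, copied from the table above.
EXAMPLES = {
    "Homepage": "homepage",
    "Home-page": "homepage",
    "Home page": "homepage",
    "Change_Log": "changelog",
    "What's New?": "whatsnew",
}

for raw, expected in EXAMPLES.items():
    assert normalize_label(raw) == expected, (raw, expected)
```

Because only ASCII punctuation and whitespace are deleted, other
characters pass through unchanged apart from lowercasing.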

When processing distribution metadata, package indices SHOULD perform
label normalization to determine if a given label is well-known, for
subsequent special processing. Labels that are not well-known MUST be
processed in their un-normalized form.

Normalization does not change pre-existing semantics around duplicated
Project-URL labels. In other words, normalization may result in
duplicate labels in the project's metadata, but only in the same manner
that was already permitted (since the core metadata specification does
not stipulate that labels are unique).

Excerpted examples of normalized metadata fields are provided in
Appendix A.

Well-known labels

In addition to the normalization rules above, this PEP proposes a fixed
(but extensible) set of "well-known" Project-URL labels, as well as
aliases and human-readable equivalents.

The following table lists these labels, in normalized form:

  Label (Human-readable equivalent)   Description                                                               Aliases
  ----------------------------------- ------------------------------------------------------------------------- ------------------------------------------------
  homepage (Homepage)                 The project's home page                                                   (none)
  source (Source Code)                The project's hosted source code or repository                            repository, sourcecode, github
  download (Download)                 A download URL for the current distribution, equivalent to Download-URL   (none)
  changelog (Changelog)               The project's comprehensive changelog                                     changes, whatsnew, history
  releasenotes (Release Notes)        The project's curated release notes                                       (none)
  documentation (Documentation)       The project's online documentation                                        docs
  issues (Issue Tracker)              The project's bug tracker                                                 bugs, issue, tracker, issuetracker, bugtracker
  funding (Funding)                   Funding Information                                                       sponsor, donate, donation

Indices MAY choose to use the human-readable equivalents suggested above
in their UI elements, if appropriate. Alternatively, indices MAY choose
their own appropriate human-readable equivalents for UI elements.

Packagers and metadata producers MAY choose to use any label that
normalizes to these labels (or their aliases) to communicate specific
URL intents to package indices and downstreams.

Similarly, indices MAY choose to specialize their rendering or
presentation of URLs with these labels, e.g. by presenting an
appropriate icon or tooltip for each label.
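
A minimal sketch of such a lookup follows, with the table above
expressed as a dictionary. The names WELL_KNOWN, ALIAS_TO_LABEL and
display_name are hypothetical. Presenting an alias under the
human-readable equivalent of its primary label is an assumption of this
sketch; an index is free to choose otherwise.

```python
import string


def normalize_label(label: str) -> str:
    removal_map = str.maketrans("", "", string.punctuation + string.whitespace)
    return label.translate(removal_map).lower()


# Normalized label -> (human-readable equivalent, aliases), from the table.
WELL_KNOWN = {
    "homepage": ("Homepage", ()),
    "source": ("Source Code", ("repository", "sourcecode", "github")),
    "download": ("Download", ()),
    "changelog": ("Changelog", ("changes", "whatsnew", "history")),
    "releasenotes": ("Release Notes", ()),
    "documentation": ("Documentation", ("docs",)),
    "issues": (
        "Issue Tracker",
        ("bugs", "issue", "tracker", "issuetracker", "bugtracker"),
    ),
    "funding": ("Funding", ("sponsor", "donate", "donation")),
}

ALIAS_TO_LABEL = {
    alias: label
    for label, (_, aliases) in WELL_KNOWN.items()
    for alias in aliases
}


def display_name(raw_label: str) -> str:
    """Return a UI name for a label, leaving unknown labels untouched."""
    normalized = normalize_label(raw_label)
    primary = ALIAS_TO_LABEL.get(normalized, normalized)
    if primary in WELL_KNOWN:
        return WELL_KNOWN[primary][0]
    # Not well-known: present the label exactly as the uploader wrote it.
    return raw_label
```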

Indices MAY also specialize the rendering or presentation of additional
labels or URLs, including (but not limited to), labels that start with a
well-known label, and URLs that refer to a known service provider domain
(e.g. for documentation hosting or issue tracking).
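
The two additional signals mentioned above might be combined as in the
following sketch. The function guess_category and the contents of
KNOWN_DOMAINS are purely illustrative assumptions; this PEP does not
define any list of service provider domains.

```python
import string
from typing import Optional
from urllib.parse import urlsplit


def normalize_label(label: str) -> str:
    removal_map = str.maketrans("", "", string.punctuation + string.whitespace)
    return label.translate(removal_map).lower()


WELL_KNOWN_LABELS = (
    "homepage", "source", "download", "changelog",
    "releasenotes", "documentation", "issues", "funding",
)

# Hypothetical domain-to-category mapping, chosen only for illustration.
KNOWN_DOMAINS = {
    "github.com": "source",
    "readthedocs.io": "documentation",
}


def guess_category(label: str, url: str) -> Optional[str]:
    """Guess a presentation category from a label prefix or the URL's host."""
    normalized = normalize_label(label)
    for known in WELL_KNOWN_LABELS:
        if normalized.startswith(known):
            return known
    host = urlsplit(url).hostname or ""
    for domain, category in KNOWN_DOMAINS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return category
    return None
```

The result only influences presentation, such as the choice of icon.
The label shown to the user would still be the one supplied in the
metadata.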

This PEP recognizes that the list of well-known labels is unlikely to
remain static, and that subsequent additions to it should not require
the overhead associated with a formal PEP process or new metadata
version. As such, this PEP proposes that the list above become a
"living" list within the PyPA specifications.

Backwards Compatibility

Limited Impact

This PEP is expected to have little to no impact on existing packaging
tooling or package indices:

-   Packaging tooling: no changes to the correctness or well-formedness
    of the core metadata. This PEP proposes deprecations as well as
    behavioral refinements, but all currently (and historically)
    produced metadata will continue to be valid per the rules of its
    respective version.
-   Package indices: indices will continue to expect well-formed core
    metadata, with no behavioral changes. Indices MAY choose to emit
    warnings or notifications on the presence of now-deprecated fields,
    per the Package indices section above.

Future Considerations

This PEP does not stipulate or require any future metadata changes.

However, per the Metadata producers and Conventions for Project-URL
labels sections, we identify the following potential future goals for a
new major release of the core metadata standards:

-   Outright removal of support for Home-page and Download-URL in the
    next major core metadata version. If removed, package indices and
    consumers MUST reject metadata containing these fields when said
    metadata is of the new major version.
-   Enforcement of label normalization. If enforced, package producers
    MUST emit only normalized Project-URL labels when generating
    distribution metadata, and package indices and consumers MUST reject
    distributions containing non-normalized labels. Note: requiring
    normalization merely restricts labels to lowercase text, and
    excludes whitespace and punctuation. It does NOT restrict project
    URLs solely to the use of "well-known" labels.
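
To make the note above concrete: under such a hypothetical future rule,
checking a label would amount to testing whether normalization leaves
it unchanged. The helper is_normalized below is an illustration only
and is not proposed by this PEP.

```python
import string


def normalize_label(label: str) -> str:
    removal_map = str.maketrans("", "", string.punctuation + string.whitespace)
    return label.translate(removal_map).lower()


def is_normalized(label: str) -> bool:
    """True if the label is already in normalized form."""
    return label == normalize_label(label)


# A label need not be well-known to be normalized.
checks = {
    "homepage": is_normalized("homepage"),
    "anotherservice": is_normalized("anotherservice"),
    "Home-page": is_normalized("Home-page"),
    "Another Service": is_normalized("Another Service"),
}
```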

These potential changes would be backwards incompatible, hence their
inclusion only in this section. Acceptance of this PEP does NOT commit
any future metadata revision to actually making these changes.

Security Implications

This PEP does not identify any positive or negative security
implications associated with deprecating Home-page and Download-URL or
with label normalization.

How To Teach This

The changes in this PEP should be transparent to the majority of the
packaging ecosystem's userbase; the primary beneficiaries of this PEP's
changes are packaging tooling authors and index maintainers, who will be
able to reduce the number of unique URL fields produced and checked.

A small number of package maintainers may observe new warnings or
notifications from their index of choice, should the index choose to
ignore Home-page and Download-URL as suggested. Similarly, a small
number of package maintainers may observe that their index of choice no
longer renders their URLs, if only present in the deprecated fields.
However, no package maintainers should observe rejected package uploads
or other breaking changes to packaging workflows due to this PEP's
proposed changes.

Anybody who observes warnings or changes to the presentation of URLs on
indices can be taught about this PEP's behavior via official packaging
resources, such as the Python Packaging User Guide and PyPI's user
documentation, the latter of which already contains an informal
description of PyPI's URL handling behavior.

If this PEP is accepted, the authors of this PEP will coordinate to
update and cross-link the resources mentioned above.

Appendix A: Label normalization examples

This appendix provides an illustrative excerpt of a distribution's
metadata, before and after index-side processing:

Before:

    Project-URL: Home-page, https://example.com
    Project-URL: Homepage, https://another.example.com
    Project-URL: Source, https://github.com/example/example
    Project-URL: GitHub, https://github.com/example/example
    Project-URL: Another Service, https://custom.example.com

After:

    Project-URL: homepage, https://example.com
    Project-URL: homepage, https://another.example.com
    Project-URL: source, https://github.com/example/example
    Project-URL: github, https://github.com/example/example
    Project-URL: Another Service, https://custom.example.com

In particular, observe:

-   Normalized duplicates are preserved (both Home-page and Homepage
    become homepage);
-   Source and GitHub are both normalized into their respective forms,
    but github is not transformed into source;
-   Another Service is not normalized, since its normal form
    (anotherservice) is not a well-known label.
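
The transformation shown in this appendix can be reproduced with the
short sketch below. The names WELL_KNOWN and process are assumptions
made for this example; the set contains the well-known labels together
with their aliases, so that an alias such as github is recognized
without being rewritten to source.

```python
import string


def normalize_label(label: str) -> str:
    removal_map = str.maketrans("", "", string.punctuation + string.whitespace)
    return label.translate(removal_map).lower()


# Well-known labels and their aliases, in normalized form.
WELL_KNOWN = {
    "homepage", "download", "releasenotes",
    "source", "repository", "sourcecode", "github",
    "changelog", "changes", "whatsnew", "history",
    "documentation", "docs",
    "issues", "bugs", "issue", "tracker", "issuetracker", "bugtracker",
    "funding", "sponsor", "donate", "donation",
}


def process(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Normalize well-known labels; keep every other label as written."""
    result = []
    for label, url in pairs:
        normalized = normalize_label(label)
        result.append((normalized if normalized in WELL_KNOWN else label, url))
    return result


before = [
    ("Home-page", "https://example.com"),
    ("Homepage", "https://another.example.com"),
    ("Source", "https://github.com/example/example"),
    ("GitHub", "https://github.com/example/example"),
    ("Another Service", "https://custom.example.com"),
]
after = process(before)
```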

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.