Python Enhancement Proposals

PEP 753 – Uniform project URLs in core metadata

Author: William Woodruff <william at yossarian.net>, Facundo Tuesca <facundo.tuesca at trailofbits.com>
Sponsor: Barry Warsaw <barry at python.org>
PEP-Delegate: Paul Moore <p.f.moore at gmail.com>
Discussions-To: Discourse thread
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 29-Aug-2024
Post-History: 26-Aug-2024, 03-Sep-2024

Abstract

This PEP recommends two discrete changes to the handling of core metadata by indices, such as PyPI:

  • Deprecation of the Home-page and Download-URL fields in favor of their Project-URL equivalents;
  • A set of conventions for normalizing and assigning semantics to Project-URL labels.

Rationale and Motivation

Python’s standard core metadata has gone through many years of revisions, with various standardized milestone versions.

These revisions of the core metadata have introduced various mechanisms for expressing a package’s relationship to external resources, via URLs:

  1. Metadata 1.0 introduced Home-page, a single-use field containing a URL to the distribution’s home page.
    Home-page: https://example.com/sampleproject
    
  2. Metadata 1.1 introduced Download-URL, a complementary single-use field containing a URL suitable for downloading the current distribution.
    Download-URL: https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
    
  3. Metadata 1.2 introduced Project-URL, a multiple-use field containing a label-and-URL pair. Each label is free text conveying the URL’s semantics.
    Project-URL: Homepage, https://example.com/sampleproject
    Project-URL: Download, https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
    Project-URL: Documentation, https://example.com/sampleproject/docs
    

Metadata 2.1, 2.2, and 2.3 leave the behavior of these fields as originally specified.
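As an illustration of how the three fields coexist, the following minimal sketch reads them from a core metadata document using the standard library’s email header parser. The function name read_url_fields and the sample metadata are hypothetical and chosen only for this example; the sketch also assumes that each Project-URL value has the "label, URL" shape shown above.

```python
from email.parser import HeaderParser

# Hypothetical sample metadata, reusing the example URLs from the list above.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: sampleproject
Version: 1.2.3
Home-page: https://example.com/sampleproject
Project-URL: Homepage, https://example.com/sampleproject
Project-URL: Documentation, https://example.com/sampleproject/docs
"""


def read_url_fields(metadata_text: str) -> dict:
    """Collect the single-use and multiple-use URL fields from core metadata."""
    msg = HeaderParser().parsestr(metadata_text)
    project_urls = []
    # Project-URL is multiple-use, so every occurrence must be collected.
    for value in msg.get_all("Project-URL", []):
        # Assumption: the label and URL are separated by the first comma.
        label, _, url = value.partition(",")
        project_urls.append((label.strip(), url.strip()))
    return {
        "home_page": msg.get("Home-page"),        # single-use, may be absent
        "download_url": msg.get("Download-URL"),  # single-use, may be absent
        "project_urls": project_urls,
    }


fields = read_url_fields(SAMPLE_METADATA)
```

In this sample the home page appears twice, once in Home-page and once under the Homepage label, which is the kind of redundancy that the informal conventions described next grew out of.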

Because Project-URL allows free text labels and is multiple-use, informal conventions have arisen for representing the values of Home-page and Download-URL within Project-URL instead.

These conventions have seen significant adoption, with PEP 621 explicitly choosing to provide only a project.urls table rather than a project.home-page field. From PEP 621’s rejected ideas:

    While the core metadata supports it, having a single field for a project’s URL while also supporting a full table seemed redundant and confusing.

This PEP exists to formalize the informal conventions that have arisen, as well as explicitly document Home-page and Download-URL as deprecated in favor of equivalent Project-URL representations.

Specification

This PEP proposes that Home-page and Download-URL be considered deprecated. This deprecation has implications for both package metadata producers (e.g. build backends and packaging tools) and package indices (e.g. PyPI).

Metadata producers

This PEP stipulates the following for metadata producers:

  • When generating metadata 1.2 or later, producers SHOULD emit only Project-URL, and SHOULD NOT emit Home-page or Download-URL fields.
  • When generating Project-URL equivalents for Home-page and Download-URL, producers SHOULD use the label conventions described below.

These stipulations do not change the optionality of URL fields in core metadata. In other words, producers MAY choose to omit Project-URL entirely at their discretion.
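One way a producer might follow these stipulations is sketched below. This is not a prescribed implementation: migrate_legacy_urls and render_project_urls are hypothetical helper names, and the policy of leaving an existing user-supplied entry untouched when it already covers the same canonical label is an assumption of this sketch. The canonicalize_label function is the one defined later in this PEP.

```python
import string


def canonicalize_label(label: str) -> str:
    # Same definition as in the "Label canonicalization" section of this PEP.
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()


def migrate_legacy_urls(home_page, download_url, project_urls):
    """Fold legacy Home-page/Download-URL values into (label, url) pairs.

    Assumption: if the user already supplied an equivalent Project-URL
    label, that entry wins and the legacy value is dropped.
    """
    merged = list(project_urls)
    existing = {canonicalize_label(label) for label, _ in merged}
    if home_page and "homepage" not in existing:
        merged.append(("homepage", home_page))
    if download_url and "download" not in existing:
        merged.append(("download", download_url))
    return merged


def render_project_urls(pairs):
    """Render (label, url) pairs as Project-URL metadata lines."""
    return [f"Project-URL: {label}, {url}" for label, url in pairs]


pairs = migrate_legacy_urls(
    "https://example.com/sampleproject",
    "https://example.com/sampleproject/sampleproject-1.2.3.tar.gz",
    [("Home Page", "https://example.org/")],
)
lines = render_project_urls(pairs)
```

Here the user-supplied "Home Page" entry canonicalizes to homepage, so only the download URL is added, and no Home-page or Download-URL field is emitted.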

This PEP does not propose the outright removal of support for Home-page or Download-URL. However, see Future Considerations for thoughts on how a new (as-yet-unspecified) major core metadata version could complete the deprecation cycle by removing these deprecated fields.

Package indices

This PEP stipulates the following for package indices:

  • When interpreting a distribution’s metadata of version 1.2 or later (e.g. for rendering on a web page), the index MUST prefer Project-URL fields as a source of URLs over Home-page and Download-URL, even if the latter are explicitly provided.
  • If a distribution’s metadata contains only the Home-page and Download-URL fields, the index MAY choose to ignore those fields and behave as though no URLs were provided in the metadata. In this case, the index SHOULD present an appropriate warning or notification to the uploading party.
    • The mechanism for presenting this warning or notification is not specified, since it will vary by index. By way of example, an index may choose to present a warning in the HTTP response to an upload request, or send an email or other notification to the maintainer(s) of the project.
  • If a distribution’s metadata contains both sets of fields, the index MAY choose to reject the distribution outright. However, this is NOT RECOMMENDED until a future unspecified major metadata version formally removes support for Home-page and Download-URL.
  • Any changes to the interpretation of metadata of version 1.2 or later that result in previously recognized URLs no longer being recognized SHOULD NOT be retroactively applied to previously uploaded packages.

These stipulations do not change the optionality of URL processing by indices. In other words, an index that does not process URLs within uploaded distributions may continue to ignore all URL fields entirely.
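The following sketch shows one possible reading of the index-side rules. The names UrlDecision and select_urls, the ignore_legacy_only flag, and the warning text are all hypothetical. The sketch assumes the simplest interpretation of "prefer": when any Project-URL entry is present, the deprecated fields are not consulted at all. It models only the optional "ignore and warn" behaviour, not rejection.

```python
from dataclasses import dataclass, field


@dataclass
class UrlDecision:
    urls: list[tuple[str, str]]
    warnings: list[str] = field(default_factory=list)


def select_urls(project_urls, home_page=None, download_url=None,
                *, ignore_legacy_only=True) -> UrlDecision:
    """Pick the URLs an index would display for one distribution."""
    if project_urls:
        # Project-URL is preferred even if the deprecated fields are present.
        return UrlDecision(urls=list(project_urls))

    # Only deprecated fields (or no URLs at all) were provided.
    legacy = []
    if home_page:
        legacy.append(("Homepage", home_page))
    if download_url:
        legacy.append(("Download", download_url))

    if legacy and ignore_legacy_only:
        # The index MAY ignore the fields; it SHOULD then notify the uploader.
        return UrlDecision(
            urls=[],
            warnings=["Home-page and Download-URL are deprecated; "
                      "use Project-URL instead."],
        )
    return UrlDecision(urls=legacy)


preferred = select_urls([("Homepage", "https://example.com/new")],
                        home_page="https://example.com/old")
ignored = select_urls([], home_page="https://example.com/old")
kept = select_urls([], home_page="https://example.com/old",
                   ignore_legacy_only=False)
```

How the warning reaches the uploader (HTTP response, email, or otherwise) is deliberately left out, since the PEP does not specify it.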

Conventions for Project-URL labels

The deprecations proposed above require a formalization of the currently informal relationship between Home-page, Download-URL, and their Project-URL equivalents.

This formalization has two parts:

  1. A set of rules for canonicalizing Project-URL labels;
  2. A set of “well-known” canonical label values that indices may specialize URL presentation for.

Label canonicalization

The core metadata specification stipulates that Project-URL labels are free text, limited to 32 characters.

This PEP proposes adding the concept of a “canonicalized” label to the core metadata specification. Label canonicalization is defined via the following Python function:

import string


def canonicalize_label(label: str) -> str:
    # Delete every ASCII punctuation and whitespace character, then lowercase.
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()

In plain language: a label is canonicalized by deleting all ASCII punctuation and whitespace, and then converting the result to lowercase.

The following table shows examples of labels before (raw) and after canonicalization:

Raw            Canonicalized
-----------    -------------
Homepage       homepage
Home-page      homepage
Home page      homepage
Change_Log     changelog
What's New?    whatsnew
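The rows of the table can be checked mechanically. The short script below repeats the function definition so that it runs on its own; the EXAMPLES dictionary is simply the table above transcribed, and the final label with a typographic apostrophe is an extra illustration that is not part of the table.

```python
import string


def canonicalize_label(label: str) -> str:
    # Same definition as given in this section.
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()


# Raw label -> expected canonical form, transcribed from the table above.
EXAMPLES = {
    "Homepage": "homepage",
    "Home-page": "homepage",
    "Home page": "homepage",
    "Change_Log": "changelog",
    "What's New?": "whatsnew",
}

for raw, expected in EXAMPLES.items():
    assert canonicalize_label(raw) == expected, (raw, expected)

# Only ASCII punctuation is removed: a typographic apostrophe (U+2019)
# is not in string.punctuation and therefore survives canonicalization.
curly = canonicalize_label("What\u2019s New")
```

Because the function relies on string.punctuation, non-ASCII punctuation is left in place, so the last label does not canonicalize to whatsnew.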

Metadata producers SHOULD emit the canonicalized form of a user-specified label, but MAY choose to emit the un-canonicalized form so long as it adheres to the existing 32-character constraint.

Package indices SHOULD NOT use the canonicalized labels belonging to the set of well-known labels directly as UI elements (instead replacing them with appropriately capitalized text labels). Labels not belonging to the well-known set MAY be used directly as UI elements.

Well-known labels

In addition to the canonicalization rules above, this PEP proposes a fixed (but extensible) set of “well-known” Project-URL labels, as well as equivalent aliases.

The following table lists these labels, in canonical form:

Label          Description                                                              Aliases
-------------  -----------------------------------------------------------------------  ----------------------------------------
homepage       The project’s home page                                                  (none)
download       A download URL for the current distribution, equivalent to Download-URL  (none)
changelog      The project’s changelog                                                  changes, releasenotes, whatsnew, history
documentation  The project’s online documentation                                       docs
issues         The project’s bug tracker                                                bugs, issue, bug, tracker, report
sponsor        Sponsoring information                                                   funding, donate, donation

Packagers and metadata producers MAY choose to use these well-known labels to communicate specific URL intents to package indices and downstreams.

Packagers and metadata producers SHOULD produce the canonicalized version of the well-known labels in package metadata.

Similarly, indices MAY choose to specialize their rendering or presentation of URLs with these labels, e.g. by presenting an appropriate icon or tooltip for each label.

Indices MAY also specialize the rendering or presentation of additional labels or URLs, including (but not limited to) labels that start with a well-known label, and URLs that refer to a known service provider domain (e.g. for documentation hosting or issue tracking).
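A minimal sketch of how an index might resolve a raw label to a well-known label follows. The WELL_KNOWN mapping is transcribed from the table above; the names resolve_well_known and display_text are hypothetical, and using the capitalized canonical label as the display text is an arbitrary choice made for this example. The optional prefix and service-domain matching mentioned above is not modelled.

```python
import string

# Well-known labels and their aliases, transcribed from the table above.
WELL_KNOWN = {
    "homepage": (),
    "download": (),
    "changelog": ("changes", "releasenotes", "whatsnew", "history"),
    "documentation": ("docs",),
    "issues": ("bugs", "issue", "bug", "tracker", "report"),
    "sponsor": ("funding", "donate", "donation"),
}

# Reverse lookup: every label and alias maps to its well-known label.
ALIAS_TO_LABEL = {
    name: label
    for label, aliases in WELL_KNOWN.items()
    for name in (label, *aliases)
}


def canonicalize_label(label: str) -> str:
    # Same definition as in the "Label canonicalization" section.
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()


def resolve_well_known(raw_label: str):
    """Return the well-known label for raw_label, or None if there is none."""
    return ALIAS_TO_LABEL.get(canonicalize_label(raw_label))


def display_text(raw_label: str) -> str:
    """Choose UI text: capitalized text for well-known labels, raw otherwise."""
    well_known = resolve_well_known(raw_label)
    # Hypothetical display policy; an index may pick any suitable text.
    return well_known.capitalize() if well_known else raw_label
```

With this mapping, "Release Notes" and "What's New?" both resolve to changelog, while a label such as "Source Code" resolves to nothing and is displayed as written.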

This PEP recognizes that the list of well-known labels is unlikely to remain static, and that subsequent additions to it should not require the overhead associated with a formal PEP process or new metadata version. As the primary expected use case for this information is to control the way project URLs are displayed on the Python Package Index, this PEP proposes that the list above become a “living” list within PyPI’s documentation (at time of writing, the documentation for influencing PyPI’s URL display is part of PyPI’s user documentation).

Backwards Compatibility

Limited Impact

This PEP is expected to have little to no impact on existing packaging tooling or package indices:

  • Packaging tooling: no changes to the correctness or well-formedness of the core metadata. This PEP proposes deprecations as well as behavioral refinements, but all currently (and historically) produced metadata will continue to be valid per the rules of its respective version.
  • Package indices: indices will continue to expect well-formed core metadata, with no behavioral changes. Indices MAY choose to emit warnings or notifications on the presence of now-deprecated fields, per above.

Future Considerations

This PEP does not stipulate or require any future metadata changes.

However, per Metadata producers and Conventions for Project-URL labels, we identify the following potential future goals for a new major release of the core metadata standards:

  • Outright removal of support for Home-page and Download-URL in the next major core metadata version. If removed, package indices and consumers MUST reject metadata containing these fields when said metadata is of the new major version.
  • Enforcement of label canonicalization. If enforced, package producers MUST emit only canonicalized Project-URL labels when generating distribution metadata, and package indices and consumers MUST reject distributions containing non-canonicalized labels. Note: requiring canonicalization merely restricts labels to lowercase text, and excludes whitespace and punctuation. It does NOT restrict project URLs solely to the use of “well-known” labels.
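To make the second potential goal concrete, the sketch below shows what strict validation could look like under such a hypothetical future metadata version. Nothing here is required by this PEP: check_label_strict and MAX_LABEL_LENGTH are illustrative names, and combining the check with the existing 32-character limit is an assumption of this sketch.

```python
import string

MAX_LABEL_LENGTH = 32  # existing limit from the core metadata specification


def canonicalize_label(label: str) -> str:
    # Same definition as in the "Label canonicalization" section.
    chars_to_remove = string.punctuation + string.whitespace
    removal_map = str.maketrans("", "", chars_to_remove)
    return label.translate(removal_map).lower()


def check_label_strict(label: str) -> None:
    """Raise ValueError if label would be rejected under strict enforcement."""
    if len(label) > MAX_LABEL_LENGTH:
        raise ValueError(f"label exceeds {MAX_LABEL_LENGTH} characters: {label!r}")
    if label != canonicalize_label(label):
        raise ValueError(f"label is not in canonical form: {label!r}")


# Canonical labels pass, whether or not they are well-known.
check_label_strict("changelog")
check_label_strict("mycustomlink")

# A non-canonical spelling is rejected.
try:
    check_label_strict("Change_Log")
    rejected = False
except ValueError:
    rejected = True
```

The check accepts mycustomlink even though it is not a well-known label, matching the note above that enforcement would constrain the form of labels rather than their vocabulary.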

These potential changes would be backwards incompatible, hence their inclusion only in this section. Acceptance of this PEP does NOT commit any future metadata revision to actually making these changes.

Security Implications

This PEP does not identify any positive or negative security implications associated with deprecating Home-page and Download-URL or with label canonicalization.

How To Teach This

The changes in this PEP should be transparent to the majority of the packaging ecosystem’s userbase; the primary beneficiaries of this PEP’s changes are packaging tooling authors and index maintainers, who will be able to reduce the number of unique URL fields produced and checked.

A small number of package maintainers may observe new warnings or notifications from their index of choice, should the index choose to ignore Home-page and Download-URL as suggested. Similarly, a small number of package maintainers may observe that their index of choice no longer renders their URLs, if only present in the deprecated fields. However, no package maintainers should observe rejected package uploads or other breaking changes to packaging workflows due to this PEP’s proposed changes.

Anybody who observes warnings or changes to the presentation of URLs on indices can be taught about this PEP’s behavior via official packaging resources, such as the Python Packaging User Guide and PyPI’s user documentation, the latter of which already contains an informal description of PyPI’s URL handling behavior.

If this PEP is accepted, the authors of this PEP will coordinate to update and cross-link the resources mentioned above.


Source: https://github.com/python/peps/blob/main/peps/pep-0753.rst

Last modified: 2024-09-10 19:07:31 GMT