PEP: 740 Title: Index support for digital attestations Author: William
Woodruff <william@yossarian.net>, Facundo Tuesca
<facundo.tuesca@trailofbits.com>, Dustin Ingram <di@python.org> Sponsor:
Donald Stufft <donald@stufft.io> PEP-Delegate: Donald Stufft
<donald@stufft.io> Discussions-To:
https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498
Status: Provisional Type: Standards Track Topic: Packaging Created:
08-Jan-2024 Post-History: 02-Jan-2024, 29-Jan-2024 Resolution:
https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498/26

Abstract

This PEP proposes a collection of changes related to the upload and
distribution of digitally signed attestations and metadata used to
verify them on a Python package repository, such as PyPI.

These changes have two subcomponents:

-   Changes to the currently unstandardized PyPI upload API, allowing
    clients to upload digital attestations as
    attestation objects <attestation-object>;
-   Changes to the
    HTML and JSON "simple" APIs <packaging:simple-repository-api>,
    allowing clients to retrieve both digital attestations and Trusted
    Publishing metadata for individual release files as
    provenance objects <provenance-object>.

This PEP does not make a policy recommendation around mandatory digital
attestations on release uploads or their subsequent verification by
installing clients like pip.

Rationale and Motivation

Desire for digital signatures on Python packages has been repeatedly
expressed by both package maintainers and downstream users:

-   Maintainers wish to demonstrate the integrity and authenticity of
    their package uploads;
-   Individual downstream users wish to verify package integrity and
    authenticity without placing additional trust in their index's
    honesty;
-   "Bulk" downstream users (such as Operating System distributions)
    wish to perform similar verifications and potentially re-expose or
    countersign for their own downstream packaging ecosystems.

This proposal seeks to accommodate each of the above use cases.

Additionally, this proposal identifies the following motivations:

-   Verifiable provenance for Python package distributions: many Python
    packages currently contain unauthenticated provenance metadata, such
    as URLs for source hosts. A cryptographic attestation format could
    enable strong authenticated links between these packages and their
    source hosts, allowing both the index and downstream users to
    cryptographically verify that a package originates from its claimed
    source repository.

-   Raising attacker requirements: an attacker who seeks to take over a
    Python package can be described along sophistication
    (unsophisticated to sophisticated) and targeting dimensions
    (opportunistic to targeted).

    Digital attestations impose additional sophistication requirements:
    the attacker must be sufficiently sophisticated to access private
    signing material (or signing identities).

-   Index verifiability: in the status quo, the only attestation
    provided by the index is an optional PGP signature per release file
    (see PGP signatures <pgp-signatures>). These signatures are not (and
    cannot be) checked by the index either for well-formedness or for
    validity, since the index has no mechanism for identifying the right
    public key for the signature. This PEP overcomes this limitation by
    ensuring that provenance objects <provenance-object> contain all of
    the metadata needed by the index to verify an attestation's
    validity.

This PEP proposes a generic attestation format, containing an
attestation statement for signature generation <payload-and-signature-generation>,
with the expectation that index providers adopt the format with a
suitable source of identity for signature verification, such as Trusted
Publishing.

Design Considerations

This PEP identifies the following design considerations when evaluating
both its own proposed changes and previous work in the same or adjacent
areas of Python packaging:

1.  Index accessibility: digital attestations for Python packages are
    ideally retrievable directly from the index itself, as "detached"
    resources.

    This both simplifies some compatibility concerns (by avoiding the
    need to modify the distribution formats themselves) and also
    simplifies the behavior of potential installing clients (by allowing
    them to retrieve each attestation before its corresponding package
    without needing to do streaming decompression).

2.  Verification by the index itself: in addition to enabling
    verification by installing clients, each digital attestation is
    ideally verifiable in some form by the index itself.

    This both increases the overall quality of attestations uploaded to
    the index (preventing, for example, users from accidentally
    uploading incorrect or invalid attestations) and also enables UI and
    UX refinements on the index itself (such as a "provenance" view for
    each uploaded package).

3.  General applicability: digital attestations should be applicable to
    any and every package uploaded to the index, regardless of its
    format (sdist or wheel) or interior contents.

4.  Metadata support: this PEP refers to "digital attestations" rather
    than just "digital signatures" to emphasize the ideal presence of
    additional metadata within the cryptographic envelope.

    For example, to prevent domain separation between a distribution's
    name and its contents, this PEP uses 'Statements' from the in-toto
    project to bind the distribution's contents (via SHA-256 digest) to
    its filename.

Previous Work

PGP signatures

PyPI and other indices have historically supported PGP signatures on
uploaded distributions. These could be supplied during upload, and could
be retrieved by installing clients via the data-gpg-sig attribute in the
PEP 503 API, the gpg-sig key on the PEP 691 API, or via an adjacent
.asc-suffixed URL.

PGP signature uploads have been disabled on PyPI since May 2023, after
an investigation determined that the majority of signatures (which,
themselves, constituted a tiny percentage of overall uploads) could not
be associated with a public key or otherwise meaningfully verified.

In their previously supported form on PyPI, PGP signatures satisfied
considerations (1) and (3) above but not (2) (owing to the need for
external keyservers and key distribution) or (4) (due to PGP signatures
typically being constructed over just an input file, without any
associated signed metadata).

Wheel signatures

PEP 427 (and its
living PyPA counterpart <packaging:binary-distribution-format>) specify
the wheel format <packaging:Wheel>.

This format includes accommodations for digital signatures embedded
directly into the wheel, in either JWS or S/MIME format. These
signatures are specified over a PEP 376 RECORD, which is modified to
include a cryptographic digest for each recorded file in the wheel.

While wheel signatures are fully specified, they do not appear to be
broadly used; the official wheel tooling deprecated signature generation
and verification support in 0.32.0, which was released in 2018.

Additionally, wheel signatures do not satisfy any of the above
considerations (due to the "attached" nature of the signatures,
non-verifiability on the index itself, and support for wheels only).

Specification

Upload endpoint changes

The current upload API is not standardized. However, we propose the
following changes to it:

-   In addition to the current top-level content and gpg_signature
    fields, the index SHALL accept attestations as an additional
    multipart form field.
-   The new attestations field SHALL be a JSON array.
-   The attestations array SHALL have one or more items, each a JSON
    object representing an individual attestation.
-   Each attestation object MUST be verifiable by the index. If the
    index fails to verify any attestation in attestations, it MUST
    reject the upload. The format of attestation objects is defined
    under attestation-object and the process for verifying attestations
    is defined under attestation-verification.

Index changes

Simple Index

The following changes are made to the
simple repository API <packaging:simple-repository-api-base>:

-   When an uploaded file has one or more attestations, the index MAY
    provide a provenance file containing attestations associated with a
    given distribution. The format of the provenance file SHALL be a
    JSON-encoded provenance object <provenance-object>, which SHALL
    contain the file's attestations.

    The location of the provenance file is signaled by the index via the
    data-provenance attribute.

-   When a provenance file is present, the index MAY include a
    data-provenance attribute on its file link. The value of the
    data-provenance attribute SHALL be a fully qualified URL, signaling
    the the file's provenance can be found at that URL. This URL MUST
    represent a secure origin.

    The following table provides examples of release file URLs,
    data-provenance values, and their resulting provenance file URLs.

      File URL                                         data-provenance                                                   Provenance URL
      ------------------------------------------------ ----------------------------------------------------------------- -----------------------------------------------------------------
      https://example.com/sampleproject-1.2.3.tar.gz   https://example.com/sampleproject-1.2.3.tar.gz.provenance         https://example.com/sampleproject-1.2.3.tar.gz.provenance
      https://example.com/sampleproject-1.2.3.tar.gz   https://other.example.com/sampleproject-1.2.3.tar.gz/provenance   https://other.example.com/sampleproject-1.2.3.tar.gz/provenance
      https://example.com/sampleproject-1.2.3.tar.gz   ../relative                                                       (invalid: not a fully qualified URL)
      https://example.com/sampleproject-1.2.3.tar.gz   http://unencrypted.example.com/provenance                         (invalid: not a secure origin)

-   The index MAY choose to modify the provenance file. For example, the
    index MAY permit adding additional attestations and verification
    materials, such as attestations from third-party auditors or other
    services.

    See changes-to-provenance-objects for an additional discussion of
    reasons why a file's provenance may change.

JSON-based Simple API

The following changes are made to the
JSON simple API <packaging:simple-repository-api-json>:

-   When an uploaded file has one or more attestations, the index MAY
    include a provenance key in the file dictionary for that file.

    The value of the provenance key SHALL be either a JSON string or
    null. If provenance is not null, it SHALL be a URL to the associated
    provenance file.

    See appendix-3 for an explanation of the technical decision to embed
    the SHA-256 digest in the JSON API, rather than the full
    provenance object <provenance-object>.

These changes require a version change to the JSON API:

-   The api-version SHALL specify version 1.3 or later.

Attestation objects

An attestation object is a JSON object with several required keys;
applications or signers may include additional keys so long as all
explicitly listed keys are provided. The required layout of an
attestation object is provided as pseudocode below.

    @dataclass
    class Attestation:
        version: Literal[1]
        """
        The attestation object's version, which is always 1.
        """

        verification_material: VerificationMaterial
        """
        Cryptographic materials used to verify `envelope`.
        """

        envelope: Envelope
        """
        The enveloped attestation statement and signature.
        """


    @dataclass
    class Envelope:
        statement: bytes
        """
        The attestation statement.

        This is represented as opaque bytes on the wire (encoded as base64),
        but it MUST be an JSON in-toto v1 Statement.
        """

        signature: bytes
        """
        A signature for the above statement, encoded as base64.
        """

    @dataclass
    class VerificationMaterial:
        certificate: str
        """
        The signing certificate, as `base64(DER(cert))`.
        """

        transparency_entries: list[object]
        """
        One or more transparency log entries for this attestation's signature
        and certificate.
        """

A full data model for each object in transparency_entries is provided in
appendix-2. Attestation objects SHOULD include one or more transparency
log entries, and MAY include additional keys for other sources of signed
time (such as an 3161 Time Stamping Authority or a Roughtime server).

Attestation objects are versioned; this PEP specifies version 1. Each
version is tied to a single cryptographic suite to minimize unnecessary
cryptographic agility. In version 1, the suite is as follows:

-   Certificates are specified as X.509 certificates, and comply with
    the profile in 5280.
-   The message signature algorithm is ECDSA, with the P-256 curve for
    public keys and SHA-256 as the cryptographic digest function.

Future PEPs may change this suite (and the overall shape of the
attestation object) by selecting a new version number.

Attestation statement and signature generation

The attestation statement is the actual claim that is cryptographically
signed over within the attestation object (i.e., the
envelope.statement).

The attestation statement is encoded as a v1 in-toto Statement object,
in JSON form. When serialized the statement is treated as an opaque
binary blob, avoiding the need for canonicalization. An example
JSON-encoded statement is provided in appendix-4.

In addition to being a v1 in-toto Statement, the attestation statement
is constrained in the following ways:

-   The in-toto subject MUST contain only a single subject.
-   subject[0].name is the distribution's filename, which MUST be a
    valid source distribution <packaging:source-distribution-format> or
    wheel distribution <packaging:binary-distribution-format> filename.
-   subject[0].digest MUST contain a SHA-256 digest. Other digests MAY
    be present. The digests MUST be represented as hexadecimal strings.
-   The following predicateType values are supported:
    -   SLSA Provenance: https://slsa.dev/provenance/v1
    -   PyPI Publish Attestation:
        https://docs.pypi.org/attestations/publish/v1

The signature over this statement is constructed using the v1 DSSE
signature protocol, with a PAYLOAD_TYPE of application/vnd.in-toto+json
and a PAYLOAD_BODY of the JSON-encoded statement above. No other
PAYLOAD_TYPE is permitted.

Provenance objects

The index will serve uploaded attestations along with metadata that can
assist in verifying them in the form of JSON serialized objects.

These provenance objects will be available via both the Simple Index and
JSON-based Simple API as described above, and will have the following
layout:

    {
        "version": 1,
        "attestation_bundles": [
          {
            "publisher": {
              "kind": "important-ci-service",
              "claims": {},
              "vendor-property": "foo",
              "another-property": 123
            },
            "attestations": [
              { /* attestation 1 ... */ },
              { /* attestation 2 ... */ }
            ]
          }
        ]
    }

or, as pseudocode:

    @dataclass
    class Publisher:
        kind: string
        """
        The kind of Trusted Publisher.
        """

        claims: object | None
        """
        Any context-specific claims retained by the index during Trusted Publisher
        authentication.
        """

        _rest: object
        """
        Each publisher object is open-ended, meaning that it MAY contain additional
        fields beyond the ones specified explicitly above. This field signals that,
        but is not itself present.
        """

    @dataclass
    class AttestationBundle:
        publisher: Publisher
        """
        The publisher associated with this set of attestations.
        """

        attestations: list[Attestation]
        """
        The set of attestations included in this bundle.
        """

    @dataclass
    class Provenance:
        version: Literal[1]
        """
        The provenance object's version, which is always 1.
        """

        attestation_bundles: list[AttestationBundle]
        """
        One or more attestation "bundles".
        """

-   version is 1. Like attestation objects, provenance objects are
    versioned, and this PEP only defines version 1.

-   attestation_bundles is a required JSON array, containing one or more
    "bundles" of attestations. Each bundle corresponds to a signing
    identity (such as a Trusted Publishing identity), and contains one
    or more attestation objects.

    As noted in the Publisher model, each AttestationBundle.publisher
    object is specific to its Trusted Publisher but must include at
    minimum:

    -   A kind key, which MUST be a JSON string that uniquely identifies
        the kind of Trusted Publisher.
    -   A claims key, which MUST be a JSON object containing any
        context-specific claims retained by the index during Trusted
        Publisher authentication.

    All other keys in the publisher object are publisher-specific. A
    full illustrative example of a publisher object is provided in
    appendix-1.

    Each array of attestation objects is a superset of the attestations
    array supplied by the uploaded through the attestations field at
    upload time, as described in upload-endpoint and
    changes-to-provenance-objects.

Changes to provenance objects

Provenance objects are not immutable, and may change over time. Reasons
for changes to the provenance object include but are not limited to:

-   Addition of new attestations for a pre-existing signing identity:
    the index MAY choose to allow additional attestations by
    pre-existing signing identities, such as newer attestation versions
    for already uploaded files.
-   Addition of new signing identities and associated attestations: the
    index MAY choose to support attestations from sources other than the
    file's uploader, such as third-party auditors or the index itself.
    These attestations may be performed asynchronously, requiring the
    index to insert them into the provenance object post facto.

Attestation verification

Verifying an attestation object against a distribution file requires
verification of each of the following:

-   version is 1. The verifier MUST reject any other version.
-   verification_material.certificate is a valid signing certificate, as
    issued by an a priori trusted authority (such as a root of trust
    already present within the verifying client).
-   verification_material.certificate identifies an appropriate signing
    subject, such as the machine identity of the Trusted Publisher that
    published the package.
-   envelope.statement is a valid in-toto v1 Statement, with a subject
    and digest that MUST match the distribution's filename and contents.
    For the distribution's filename, matching MUST be performed by
    parsing using the appropriate source distribution or wheel filename
    format, as the statement's subject may be equivalent but normalized.
-   envelope.signature is a valid signature for envelope.statement
    corresponding to verification_material.certificate, as reconstituted
    via the v1 DSSE signature protocol.

In addition to the above required steps, a verifier MAY additionally
verify verification_material.transparency_entries on a policy basis,
e.g. requiring at least one transparency log entry or a threshold of
entries. When verifying transparency entries, the verifier MUST confirm
that the inclusion time for each entry lies within the signing
certificate's validity period.

Security Implications

This PEP is primarily "mechanical" in nature; it provides layouts for
structuring and serving verifiable digital attestations without
specifying higher level security "policies" around attestation validity,
thresholds between attestations, and so forth.

Cryptographic agility in attestations

Algorithmic agility is a common source of exploitable vulnerabilities in
cryptographic schemes. This PEP limits algorithmic agility in two ways:

-   All algorithms are specified in a single suite, rather than a
    geometric collection of parameters. This makes it impossible (for
    example) for an attacker to select a strong signature algorithm with
    a weak hash function, compromising the scheme as a whole.
-   Attestation objects are versioned, and may only contain the
    algorithmic suite specified for their version. If a specific suite
    is considered insecure in the future, clients may choose to blanket
    reject or qualify verifications of attestations that contain that
    suite.

Index trust

This PEP does not increase (or decrease) trust in the index itself: the
index is still effectively trusted to honestly deliver unmodified
package distributions, since a dishonest index capable of modifying
package contents could also dishonestly modify or omit package
attestations. As a result, this PEP's presumption of index trust is
equivalent to the unstated presumption with earlier mechanisms, like PGP
and wheel signatures.

This PEP does not preclude or exclude future index trust mechanisms,
such as PEP 458 and/or PEP 480.

Recommendations

This PEP recommends, but does not mandate, that attestation objects
contain one or more verifiable sources of signed time that corroborate
the signing certificate's claimed validity period. Indices that
implement this PEP may choose to strictly enforce this requirement.

Appendix 1: Example Trusted Publisher Representation

This appendix provides a fictional example of a publisher key within a
simple JSON API project.files[].provenance listing:

    "publisher": {
        "kind": "GitHub",
        "claims": {
            "ref": "refs/tags/v1.0.0",
            "sha": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
        },
        "repository_name": "HolyGrail",
        "repository_owner": "octocat",
        "repository_owner_id": "1",
        "workflow_filename": "publish.yml",
        "environment": null
    }

Appendix 2: Data models for Transparency Log Entries

This appendix contains pseudocoded data models for transparency log
entries in attestation objects. Each transparency log entry serves as a
source of signed inclusion time, and can be verified either online or
offline.

    @dataclass
    class TransparencyLogEntry:
        log_index: int
        """
        The global index of the log entry, used when querying the log.
        """

        log_id: str
        """
        An opaque, unique identifier for the log.
        """

        entry_kind: str
        """
        The kind (type) of log entry.
        """

        entry_version: str
        """
        The version of the log entry's submitted format.
        """

        integrated_time: int
        """
        The UNIX timestamp from the log from when the entry was persisted.
        """

        inclusion_proof: InclusionProof
        """
        The actual inclusion proof of the log entry.
        """


    @dataclass
    class InclusionProof:
        log_index: int
        """
        The index of the entry in the tree it was written to.
        """

        root_hash: str
        """
        The digest stored at the root of the Merkle tree at the time of proof
        generation.
        """

        tree_size: int
        """
        The size of the Merkle tree at the time of proof generation.
        """

        hashes: list[str]
        """
        A list of hashes required to complete the inclusion proof, sorted
        in order from leaf to root. The leaf and root hashes are not themselves
        included in this list; the root is supplied via `root_hash` and the client
        must calculate the leaf hash.
        """

        checkpoint: str
        """
        The signed tree head's signature, at the time of proof generation.
        """

        cosigned_checkpoints: list[str]
        """
        Cosigned checkpoints from zero or more log witnesses.
        """

Appendix 3: Simple JSON API size considerations

A previous draft of this PEP required embedding each
provenance object <provenance-object> directly into its appropriate part
of the JSON Simple API.

The current version of this PEP embeds the SHA-256 digest of the
provenance object instead. This is done for size and network bandwidth
consideration reasons:

1.  We estimate the typical size of an attestation object to be
    approximately 5.3 KB of JSON.
2.  We conservatively estimate that indices eventually host around 3
    attestations per release file, or approximately 15.9 KB of JSON per
    combined provenance object.
3.  As of May 2024, the average project on PyPI has approximately 21
    release files. We conservatively expect this average to increase
    over time.
4.  Combined, these numbers imply that a typical project might expect to
    host between 60 and 70 attestations, or approximately 339 KB of
    additional JSON in its "project detail" endpoint.

These numbers are significantly worse in "pathological" cases, where
projects have hundreds or thousands of releases and/or dozens of files
per release.

Appendix 4: Example attestation statement

Given a source distribution sampleproject-1.2.3.tar.gz with a SHA-256
digest of
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, the
following is an appropriate in-toto Statement, as a JSON object:

    {
      "_type": "https://in-toto.io/Statement/v1",
      "subject": [
        {
          "name": "sampleproject-1.2.3.tar.gz",
          "digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
        }
      ],
      "predicateType": "https://some-arbitrary-predicate.example.com/v1",
      "predicate": {
        "something-else": "foo"
      }
    }

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.