
Python Enhancement Proposals

PEP 752 – Implicit namespaces for package repositories


Author:
Ofek Lev <ofekmeister at gmail.com>, Jarek Potiuk <potiuk at apache.org>
Sponsor:
Barry Warsaw <barry at python.org>
PEP-Delegate:
Dustin Ingram <di at python.org>
Discussions-To:
Discourse thread
Status:
Accepted
Type:
Standards Track
Topic:
Packaging
Created:
13-Aug-2024
Post-History:
18-Aug-2024, 07-Sep-2024
Resolution:
29-Jun-2026


Abstract

This PEP specifies a way for organizations to reserve package name prefixes for future uploads.

“Namespaces are one honking great idea – let’s do more of those!” - PEP 20

Motivation

The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Such projects desire complete control over their namespace for safety and branding reasons. A few examples:

  • Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature’s corresponding package [1]. For example, most of Google’s packages are prefixed by google-cloud- e.g. google-cloud-compute for using virtual machines.
  • OpenTelemetry is an open standard for observability with official packages for the core APIs and SDK with contrib packages to collect data from various sources. All packages are prefixed by opentelemetry- with child prefixes in the form opentelemetry-<component>-<name>-. The contrib packages live in a central repository and they are the only ones with the ability to publish.
  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows. It has providers, where each provider package is prefixed by apache-airflow-providers-.
  • Typeshed is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by types-. For example, the package requests has a stub that users would depend on called types-requests. Unofficial stubs are not supposed to use the types- prefix and are expected to use a -stubs suffix instead.

Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.

For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package, so in the interim an attacker may create a package that appears legitimate but would execute malicious code (like secret exfiltration) at runtime. Not only are users more likely to install such packages, but doing so also taints the perception of the entire project. Community projects like Apache Airflow have also experienced this.

Although PEP 708 attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases.

In recent years, typosquatting has become a popular attack vector [2]. PyPI's current protection against this is to normalize similar characters, but that is insufficient for these use cases. Namespacing would drastically reduce the incidence of typosquatting:

  • Typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-.
  • An index may require namespaces to be applied for and approved, which reduces the likelihood that a typosquatted namespace is ever granted.
  • An attacker would be unable to squat a name that includes a namespace.

Rationale

Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility.

  • NPM has the concept of scoped packages, which were introduced primarily to combat a dearth of good available package names (whether real or perceived). When a user or organization signs up, they are given a scope that matches their name. For example, the package for using Google Cloud Storage is @google-cloud/storage, where @google-cloud/ is the scope. Regular user accounts (non-organization) may publish unscoped packages for public use. This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes.
  • NuGet has the concept of package ID prefix reservation which was introduced primarily to satisfy users wishing to know where a package came from. A package name prefix may be reserved for use by one or more owners. Every reserved package has a special indication on its page to communicate this. After reservation, any upload with a reserved prefix will fail if the user is not an owner of the prefix. Existing packages that have a prefix that is owned may continue to release as usual. This approach has the highest amount of backwards compatibility because only modifications to indices like PyPI are required and installers do not need to change.

This PEP specifies the NuGet approach of authorized reservation across a flat namespace. Any solution that requires new package syntax must be built atop the existing flat namespace, and therefore implicit namespaces acquired via a reservation mechanism would be a prerequisite to such explicit namespaces.

Although existing packages matching a reserved namespace would be untouched, preventing future unauthorized uploads and strategically applying PEP 541 takedown requests for malicious cases would reduce risks to users to a negligible level.

Terminology

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Owner
Owners are entities that are allowed to upload certain package names.
Grant
A grant is a reservation of a namespace for a package repository.
Parent Namespace
A namespace’s parent refers to the namespace without the trailing hyphenated component e.g. the parent of foo-bar is foo.
Child Namespace
A namespace’s child refers to the namespace with a single trailing hyphenated component e.g. foo-bar is a valid child of foo.

Specification

Naming

A namespace MUST be a valid project name and normalized internally, e.g. foo.bar would become foo-bar.
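As an illustration only, the validation and normalization step can be sketched with the name rules from the packaging specifications: the core metadata project-name pattern and PEP 503 normalization. The helper `normalize_namespace` is hypothetical and is not PyPI's implementation.

```python
import re

# Valid project name per the core metadata specification: ASCII letters,
# digits, ".", "_" and "-", starting and ending with a letter or digit.
_VALID_NAME = re.compile(r"(?:[A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])", re.IGNORECASE)
# PEP 503 normalization: runs of "-", "_" and "." collapse to a single "-".
_SEPARATORS = re.compile(r"[-_.]+")


def normalize_namespace(name: str) -> str:
    """Hypothetical helper: reject invalid names, then normalize."""
    if not _VALID_NAME.fullmatch(name):
        raise ValueError(f"not a valid project name: {name!r}")
    return _SEPARATORS.sub("-", name).lower()


print(normalize_namespace("foo.bar"))       # foo-bar
print(normalize_namespace("Foo__Bar.baz"))  # foo-bar-baz
```

Because the normalized form is what gets stored and compared, foo.bar, Foo_Bar and foo-bar all refer to the same namespace.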

Semantics

A namespace grant bestows ownership over the following:

  1. A project that exactly matches the namespace itself.
  2. Projects that start with the namespace followed by a hyphen. For example, the namespace foo would match the normalized project name foo-bar but not the project name foobar.

Package name matching acts upon the normalized namespace.
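A minimal sketch of the two matching rules above, assuming PEP 503 normalization is applied to both the namespace and the project name. The function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_matches(namespace: str, project: str) -> bool:
    """Return True if a grant for `namespace` covers `project`."""
    ns = normalize(namespace)
    name = normalize(project)
    # Rule 1: exact match. Rule 2: the namespace followed by a hyphen is a prefix.
    return name == ns or name.startswith(ns + "-")


print(namespace_matches("foo", "foo-bar"))  # True
print(namespace_matches("foo", "foobar"))   # False
print(namespace_matches("foo", "Foo.Bar"))  # True (normalizes to foo-bar)
```

Requiring the hyphen after the namespace is what keeps foobar outside the foo namespace.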

Namespaces are scoped per package repository and SHALL NOT be shared between repositories. For example, if PyPI has a namespace acme that is owned by the company Acme, packages starting with acme- that come from other, non-PyPI mirror repositories do not confer the same level of trust.

Grants MUST NOT overlap ownership. For example, if there is an existing grant for foo-bar, then a new grant for foo would only be possible for the owner of the former. An overlap is determined by comparing the normalized proposed namespace with the normalized namespace of every existing root grant. Every comparison must append a hyphen to the end of both the proposed and the existing namespace. An overlap is detected when any existing namespace starts with the proposed namespace.
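The comparison described above can be sketched as follows. This is a literal reading of the rule, not a reference implementation: `root_grants` is a hypothetical list of existing root grant names, and the function only reports overlaps. Deciding that an overlapping grant may go only to the existing owner is left to the repository.

```python
import re
from collections.abc import Iterable


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def overlapping_grants(proposed: str, root_grants: Iterable[str]) -> list[str]:
    """Return the existing root grants that overlap a proposed namespace."""
    # A hyphen is appended to both sides so that "foobar" is not treated as
    # being under "foo": "foobar-" does not start with "foo-".
    candidate = normalize(proposed) + "-"
    return [
        existing
        for existing in root_grants
        if (normalize(existing) + "-").startswith(candidate)
    ]


print(overlapping_grants("foo", ["foo-bar", "foobar", "baz"]))  # ['foo-bar']
```

Without the appended hyphens, a plain prefix check would wrongly report foobar as overlapping a proposed foo.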

Repositories SHOULD impose a depth limit on the number of hyphens in a namespace. For example, if the depth limit is 1, then the namespace foo-bar would be allowed but foo-bar-baz could not be granted.
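A small sketch of such a depth check, assuming depth is counted on the normalized namespace. The default limit of 1 mirrors the example above and is not a value this PEP mandates.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def within_depth_limit(namespace: str, limit: int = 1) -> bool:
    """Return True if the normalized namespace has at most `limit` hyphens."""
    return normalize(namespace).count("-") <= limit


print(within_depth_limit("foo-bar"))      # True
print(within_depth_limit("foo-bar-baz"))  # False
```

Counting after normalization matters: foo.bar_baz contains no hyphens as written but normalizes to foo-bar-baz, which has two.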

Policies for granting and managing namespaces are not discussed here as they are specific to each index. The proposed namespace policy for PyPI is described in PEP 755.

Uploads

Uploads MUST fail with a 409 Conflict HTTP status code if the name of a package being uploaded matches a reserved namespace and the project owner does not have an active grant for the namespace.

Repositories SHOULD have an exception to this rule for projects that existed before the namespace was reserved.
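The upload rule and its exception can be sketched as a single decision function. Everything here is hypothetical: `grants` stands in for the repository's record of active grants, `preexisting` for projects that existed before the matching namespace was reserved, and `HTTPStatus.OK` is just a stand-in for "the upload may proceed". Requiring the owner to hold every matching grant is an assumption. Because grants cannot overlap ownership, nested matches would normally share an owner anyway.

```python
import re
from http import HTTPStatus


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def check_upload(
    project: str,
    owner: str,
    grants: dict[str, set[str]],
    preexisting: set[str],
) -> HTTPStatus:
    """Hypothetical upload gate.

    grants: normalized namespace -> owners of its active grant.
    preexisting: normalized names of projects that existed before the
        matching namespace was reserved (the SHOULD exception).
    """
    name = normalize(project)
    matching = [ns for ns in grants if name == ns or name.startswith(ns + "-")]
    if not matching or name in preexisting:
        return HTTPStatus.OK
    if all(owner in grants[ns] for ns in matching):
        return HTTPStatus.OK
    return HTTPStatus.CONFLICT  # 409


grants = {"acme": {"acme-corp"}}
print(check_upload("acme-tools", "mallory", grants, set()).value)    # 409
print(check_upload("acme-tools", "acme-corp", grants, set()).value)  # 200
```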

Repository Metadata

The JSON API version will be incremented from 1.4 to 1.5. The following API changes MUST be implemented by repositories that support this PEP. Repositories that do not support this PEP MUST NOT implement these changes so that consumers of the API are able to determine whether the repository supports this PEP.

The following API changes would allow installers to offer users extra security policies.

Project Detail

The project detail response will be modified as follows.

The namespaces key MUST be null if the project does not match an active namespace grant. If the project does match a namespace grant, the value MUST be an array of mappings representing each matching namespace. Every mapping MUST have the following keys:

  • name: This is the associated normalized namespace e.g. foo-bar.
  • owned: This is a boolean and will be true if the project owner is one of the current owners of the grant. This will only be false if the project existed before the namespace was reserved and the repository allows continued uploads.
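As an illustration, the namespaces value for a single project could be derived as below. This is a sketch under assumptions: `grants` is a hypothetical in-memory map from normalized namespace to the owners of its active grant, and every other key of the project detail response is omitted.

```python
import json
import re
from typing import Optional


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def namespaces_field(
    project: str, project_owners: set[str], grants: dict[str, set[str]]
) -> Optional[list[dict]]:
    """Build the value of the "namespaces" key in a project detail response."""
    name = normalize(project)
    entries = [
        # "owned" is true when a project owner is also a current grant owner.
        {"name": ns, "owned": bool(project_owners & owners)}
        for ns, owners in sorted(grants.items())
        if name == ns or name.startswith(ns + "-")
    ]
    # No matching grant is reported as null, not as an empty array.
    return entries or None


grants = {"foo": {"alice"}, "foo-bar": {"alice"}}
print(json.dumps({"namespaces": namespaces_field("foo-bar-baz", {"alice"}, grants)}))
# {"namespaces": [{"name": "foo", "owned": true}, {"name": "foo-bar", "owned": true}]}
print(json.dumps({"namespaces": namespaces_field("legacy-tool", {"bob"}, grants)}))
# {"namespaces": null}
```

A pre-existing project such as foo-old owned by someone other than alice would get "owned": false under the foo entry.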

Namespace List

The format of this URL is /namespaces.

The response MUST be an array of mappings representing each reserved namespace. Every mapping MUST have a name key that is the normalized namespace e.g. foo-bar.
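A minimal sketch of this response, assuming the same hypothetical `grants` map with already-normalized keys. Sorting the output is an illustrative choice, since no ordering is specified.

```python
import json


def namespace_list(grants: dict[str, set[str]]) -> list[dict[str, str]]:
    """Build the /namespaces response from normalized namespace -> owners."""
    # One mapping per reserved namespace; "name" is the only required key.
    return [{"name": ns} for ns in sorted(grants)]


print(json.dumps(namespace_list({"foo-bar": {"alice"}, "acme": {"acme-corp"}})))
# [{"name": "acme"}, {"name": "foo-bar"}]
```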

Namespace Detail

The format of this URL is /namespace/<namespace> where <namespace> is the normalized namespace. For example, the URL for the namespace foo.bar would be /namespace/foo-bar.

The response MUST be a mapping with the following keys:

  • name: This is the normalized version of the namespace e.g. foo-bar.
  • parent: This is the parent namespace if it exists. For example, if the namespace is foo-bar and there is an active grant for foo, then this would be "foo". If there is no parent then this key will be null.
  • children: This is an array of direct child namespaces. For example, if the namespace is foo and there are active grants for foo-bar and foo-bar-baz then this would be ["foo-bar"].

The mapping MAY have an owner key that refers to the current owner of the namespace.
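The parent and children fields follow directly from the terminology definitions. The sketch below assumes a hypothetical `grants` map from normalized namespace to a single owner, with keys already normalized. It does not handle unknown namespaces, which a repository would presumably answer with an error rather than a mapping.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_" and "." to "-", lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_detail(namespace: str, grants: dict[str, str]) -> dict[str, object]:
    """Build a /namespace/<namespace> response from normalized namespace -> owner."""
    ns = normalize(namespace)
    # Parent: the namespace minus its trailing hyphenated component, if granted.
    head, sep, _ = ns.rpartition("-")
    parent: Optional[str] = head if sep and head in grants else None
    # Children: granted namespaces with exactly one more hyphenated component.
    prefix = ns + "-"
    children = sorted(
        other
        for other in grants
        if other.startswith(prefix) and "-" not in other[len(prefix):]
    )
    detail: dict[str, object] = {"name": ns, "parent": parent, "children": children}
    if ns in grants:
        detail["owner"] = grants[ns]  # optional key (MAY)
    return detail


grants = {"foo": "alice", "foo-bar": "alice", "foo-bar-baz": "alice"}
print(namespace_detail("foo", grants)["children"])    # ['foo-bar']
print(namespace_detail("Foo.Bar", grants)["parent"])  # foo
```

As in the example above, foo-bar-baz is not listed as a child of foo because it is two components deeper.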

Grant Removal

When a reserved namespace becomes unclaimed, repositories MUST set the namespaces key to null in the API.

Community Buy-in

Representatives from the following organizations have expressed support for this PEP (with a link to the discussion):

Backwards Compatibility

There are no intrinsic concerns because projects continue to use existing naming semantics. Projects with or without a namespace are indistinguishable from the perspective of the user. Installers need no modification.

Additionally, many projects have already chosen to signal a shared purpose with a prefix, as typeshed has done.

Security Implications

Installers could support enabling a security policy that would only allow packages that match a specific set of namespaces and whose owner has an active grant for the namespace.
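Such a policy could be built on the namespaces key of the project detail response. The sketch below is hypothetical: neither `allowed_by_policy` nor the `trusted_namespaces` setting is defined by this PEP. It treats a missing key like null. A real installer would first confirm that the repository advertises the JSON API version introduced here, because a missing key would otherwise only mean the repository does not support this PEP.

```python
import json


def allowed_by_policy(project_detail: dict, trusted_namespaces: set[str]) -> bool:
    """Hypothetical installer check against a parsed project detail response.

    Allows the project only if it matches a trusted namespace whose grant
    the project owner currently holds ("owned" is true).
    """
    # A null or missing "namespaces" key means no namespace grant applies.
    entries = project_detail.get("namespaces") or []
    return any(
        entry["name"] in trusted_namespaces and entry["owned"]
        for entry in entries
    )


detail = json.loads('{"namespaces": [{"name": "acme", "owned": true}]}')
print(allowed_by_policy(detail, {"acme"}))                 # True
print(allowed_by_policy({"namespaces": None}, {"acme"}))  # False
```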

How to Teach This

We will update the PyPUG documentation to describe the new metadata that is returned by the API.

In the future, we could also point to tooling that uses namespaces to provide extra security guarantees during installation.

Reference Implementation

A complete reference implementation of this PEP is available in PR #17691.

Rejected Ideas

Explicit Non-User Ownership

As package repositories have a flat namespace, allowing any user to reserve a namespace would be untenable not just because there would be contention for a finite resource, but also because no repository has enough human operators to manage the vetting of an arbitrary number of users.

An earlier version of this PEP proposed that only organizations could reserve namespaces because of these practical considerations. However, this was rejected as the organization concept has not been specified and imposing such restrictions based on the anticipated PyPI implementation is unnecessary.

Organization Scoping

The primary motivation for this PEP is to reduce dependency confusion attacks, and NPM-style scoping with an allowance for the legacy flat namespace would increase that risk. If documentation instructed a user to install bar in the namespace foo, then the user would have to be careful to install @foo/bar and not foo-bar, or vice versa. The Python packaging ecosystem has normalization rules for names in order to maximize the ease of communication, and this would be a regression.

The runtime environment of Python is also not conducive to scoping. Whereas multiple versions of the same JavaScript package may coexist, Python only allows a single global namespace. Barring major changes to the language itself, this is nearly impossible to change.

Scoping would be particularly affected by organization changes, which are bound to happen over time. An organization may change its name due to internal shuffling, an acquisition, or any other reason. Whenever this happens, every project it owns would in effect be renamed, which would frequently cause unnecessary confusion for users.

Finally, the disruption to the community would be massive because it would require an update from every package manager, security scanner, IDE, etc. New packages released with the scoping would be incompatible with older tools and would cause confusion for users along with frustration from maintainers having to triage such complaints.

Artifact-level Namespace Association

An earlier version of this PEP proposed that metadata be associated with individual artifacts at the point of release. This was rejected because it had the potential to cause confusion for users who would expect the namespace authorization guarantee to be at the project level based on current grants rather than the time at which a given release occurred.

Support HTML Simple API

Exposing project-level metadata in the HTML version of the Simple API could happen in one of two ways.

The first is exposing a data- attribute on the /simple/ page that enumerates every project. There is no precedent for this, and installers generally do not use this page. Additionally, this page is often cached for long periods of time (24 hours in the case of PyPI).

The other is to add a data- attribute on every artifact. This is suboptimal because it may introduce confusion similar to the rejected artifact-level association idea. Another consideration is that in practice many private indices are implemented as static pages served by cloud storage backed by a CDN. In this scenario, every namespace change would require a mass update of all artifacts of matching projects.

Encourage Dedicated Package Repositories

Critically, this imposes a burden on projects to maintain their own infrastructure. This is an unrealistic expectation for the vast majority of companies and a complete non-starter for community projects.

This does not help in most cases because the default behavior of most package managers is to use PyPI, so users attempting to perform a simple pip install would already be vulnerable to malicious packages.

In this theoretical future every project must document how to add their repository to dependency resolution, which would be different for each package manager. Few package managers are able to download specific dependencies from specific repositories and would require users to use verbose configuration in the common case.

The ones that do not support this would instead find a given package using an ordered enumeration of repositories, leading to dependency confusion. For example, say a user wants two packages from two custom repositories X and Y. If each repository has both packages but one is malicious on X and the other is malicious on Y then the user would be unable to satisfy their requirements without encountering a malicious package.

Open Namespaces

An earlier version of this PEP proposed that the owner of a grant may choose to allow others the ability to release new projects with the associated namespace. This was removed due to insufficient motivation and the fact that repositories could technically satisfy such use cases with standard grant semantics.

Hidden Grants

An earlier version of this PEP proposed that repositories could create hidden grants that are not visible to the public which prevent their namespaces from being claimed by others. This was removed due to insufficient motivation.

Exclusive Reliance on Provenance Assertions

The idea here [3] would be to design a general-purpose way for clients to make provenance assertions to verify certain properties of dependencies, each with custom syntax. Some examples:

  • The package was uploaded by a specific organization or user name e.g. pip install "azure-loganalytics from microsoft"
  • The package was uploaded by an owner of a specific domain name e.g. pip install "google-cloud-compute from cloud.google.com"
  • The package was uploaded by a user with a specific email address e.g. pip install "aws-cdk-lib from contact@amazon.com"
  • The package matching a namespace was uploaded by an authorized party (this PEP)

A fundamental downside is that it doesn’t play well with multiple repositories. For example, say a user wants the azure-loganalytics package and wants to ensure it comes from the organization named microsoft. If Microsoft’s organization name on PyPI is microsoft then a package manager that defaults to PyPI could accept azure-loganalytics from microsoft. However, if multiple repositories are used for dependency resolution then the user would have to specify the repository as part of the definition which is unrealistic for reasons outlined in the dedicated section on asserting package owner names.

Another general weakness with this approach is that a user attempting to perform a simple pip install without special syntax, which is the most common scenario, would already be vulnerable to malicious packages. In order to overcome this there would have to be some default trust mechanism, which in all cases would impose certain UX or resolver logic upon every tool.

For example, package managers could be changed such that the first time a package is installed, the user would receive a confirmation prompt displaying the provenance details. This would be very confusing and noisy, especially for new users, and would be a breaking UX change for existing users. Many methods of installation wouldn't work for this scenario, such as running in CI or installing from a requirements file, where the user could potentially get hundreds of prompts.

One solution to make this less disruptive for users would be to manually maintain a list of trustworthy details (organization/user names, domain names, email addresses, etc.). This could be discoverable by packages providing entry points which package managers could learn to detect and which corporate environments could install by default. This has the major downside of not providing automatic guarantees which would limit the usefulness for the average user who is more likely to be affected.

There are two ideas that could be used to provide automatic protection, which could be based on PEP 740 attestations or a new mechanism for utilizing third-party APIs that host the metadata.

First, each repository could offer a service that verifies the owner of a package using whatever criteria they deem appropriate. After verification, the repository would add the details to a dedicated package that would be installed by default.

This would require dedicated maintenance, which is unrealistic for most repositories, currently even PyPI. It’s unclear how community projects without the resources for something like a domain name would be supported. Critically, this solution would cause extra confusion for users in the case of multiple repositories, as each might have its own verification process, attestation criteria and default package containing the verified details. It would be challenging to get buy-in from every package manager to recognize each repository’s chosen verification package and install it by default before dependency resolution.

Should digital attestations become the chosen mechanism, a downside is that implementing this in custom package repositories would require a significant amount of work. In the case of PyPI, the prerequisite work on Trusted Publishing and the PEP 740 implementation itself took the equivalent of one full-time engineer for a year, paid for by a corporate sponsor. Other organizations are unlikely to implement similar work because simpler mechanisms make it possible to implement reproducible builds. When everything is internally managed, attestations are also not very useful. Community projects are unlikely to undertake this effort because they would likely lack the resources to maintain the necessary infrastructure themselves, and moreover there are significant downsides to encouraging dedicated package repositories.

The other idea would be to host provenance assertions externally and push more logic client-side. A possible implementation might be to specify a provenance API that could be hosted at a designated relative path like /provenance. Projects on each repository could then be configured to point to a particular domain and this information would be passed on to clients during installation.

While this distributed approach does impose less of an infrastructure burden on repositories, it has the potential to be a security risk. If an external provenance API is compromised, it could lead to malicious packages being installed. If an external API is down, it could lead to package installation failing or package managers might only emit warnings in which case there is no security benefit.

Additionally, this disadvantages community projects that do not have the resources to maintain such an API. They could use free hosting solutions, as many do for documentation, but they would not technically own the infrastructure and would be compromised should those generous offerings be restricted.

Finally, while both of these theoretical approaches are not yet prescriptive, they imply assertions at the artifact level which was already a rejected idea.

Asserting Package Owner Names

This is about asserting that the package came from a specific organization or user name. It’s quite similar to the organization scoping idea except that a flat namespace is the base assumption.

This would require modifications to the JSON API of each supported repository and could be implemented by exposing extra metadata or as proper provenance assertions.

As with the organization scoping idea, a new syntax would be required like microsoft::azure-loganalytics where microsoft is the organization and azure-loganalytics is the package. Although this plays well with the existing flat namespace in comparison, it retains the critical downside of being a disruption for the community with the number of changes required.

A unique downside is that names are an implementation detail of repositories. On PyPI, the names of organizations are separate from user names so there is potential for conflicts. In the case of multiple repositories, users might run into cases of dependency confusion similar to the one at the end of the Encourage Dedicated Package Repositories rejected idea.

To ameliorate this, it was suggested that the syntax be expanded to also include the expected repository URL like microsoft@pypi.org::azure-loganalytics. This syntax or something like it is so verbose that it could lead to user confusion, and even worse, frustration should it gain increased adoption among those able to maintain dedicated infrastructure (community projects would not benefit).

The expanded syntax is an attempt to standardize resolver behavior and configuration within dependency specifiers. Not only would this be mandating the UX of tools, it lacks precedent in package managers for language ecosystems with or without the concept of package repositories. In such cases, the resolver configuration is separate from the dependency definition.

Language | Tool | Resolution behavior
Rust | Cargo | Dependency resolution can be modified within Cargo.toml using the [patch] table.
JS | Yarn | Although they have the concept of protocols (which are similar to the URL schemes of our direct references), users configure the resolutions field in the package.json file.
JS | npm | Users can configure the overrides field in the package.json file.
Ruby | Bundler | The Gemfile allows for specifying an explicit source for a gem.
C# | NuGet | It’s possible to override package versions by configuring the Directory.Packages.props file.
PHP | Composer | The composer.json file allows for specifying repository sources for specific packages.
Go | go | The go.mod file allows for specifying a replace directive. Note that this is used for direct dependencies as well as transitive dependencies.

Use Fixed Prefixes

The idea here would be to have one or more top-level fixed prefixes that are used for namespace reservations:

  • com-: Reserved for corporate organizations.
  • org-: Reserved for community organizations.

Organizations would then apply for a namespace prefixed by the type of their organization.

This would cause perpetual disruption because when projects begin it is unknown whether a user base will be large enough to warrant a namespace reservation. Whenever that happens the project would have to be renamed which would put a high maintenance burden on the project maintainers and would cause confusion for users who have to learn a new way to reference the project’s packages. The potential for this deterring projects from reserving namespaces at all is high.

Another issue with this approach is that projects often have branding in mind (example) and would be reluctant to change their package names.

Use DNS

The idea here is to add a new metadata field to projects in the API called domain-authority. Repositories would support a new endpoint for verifying the domain via HTTPS. Clients would then support options to allow certain domains.

This does not solve the problem for the target audience who do not check where their packages are coming from and is more about checking for the integrity of uploads which is already supported in a more secure way by PEP 740.

Most projects do not have a domain and could not benefit from this, unfairly favoring organizations that have the financial means to acquire one.

Open Issues

None at this time.

Footnotes