PEP: 752 Title: Implicit namespaces for package repositories Author:
Ofek Lev <ofekmeister@gmail.com> Sponsor: Barry Warsaw
<barry@python.org> PEP-Delegate: Dustin Ingram <di@python.org>
Discussions-To: https://discuss.python.org/t/63192 Status: Draft Type:
Standards Track Topic: Packaging Created: 13-Aug-2024 Post-History:
18-Aug-2024, 07-Sep-2024,

Abstract

This PEP specifies a way for organizations to reserve package name
prefixes for future uploads.

  "Namespaces are one honking great idea -- let's do more of those!" -
  PEP 20

Motivation

The current ecosystem lacks a way for projects with many packages to
signal a verified pattern of ownership. Such projects fall into two
categories.

The first category is projects[1] that want complete control over their
namespace. A few examples:

-   Major cloud providers like Amazon, Google and Microsoft have a
    common prefix for each feature's corresponding package[2]. For
    example, most of Google's packages are prefixed by google-cloud-
    e.g. google-cloud-compute for using virtual machines.
-   OpenTelemetry is an open standard for observability with official
    packages for the core APIs and SDK with contrib packages to collect
    data from various sources. All packages are prefixed by
    opentelemetry- with child prefixes in the form
    opentelemetry-<component>-<name>-. The contrib packages live in a
    central repository and they are the only ones with the ability to
    publish.

The second category is projects[3] that want to share their namespace
such that some packages are officially maintained and third-party
developers are encouraged to participate by publishing their own. Some
examples:

-   Project Jupyter is devoted to the development of tooling for sharing
    interactive documents. They support extensions which in most cases
    (and in all cases for officially maintained extensions) are prefixed
    by jupyter-.
-   Django is one of the most widely used web frameworks in existence.
    They have the concept of reusable apps, which are commonly installed
    via third-party packages that implement a subset of functionality to
    extend Django-based websites. These packages are by convention
    prefixed by django- or dj-.

Such projects are uniquely vulnerable to name-squatting attacks which
can ultimately result in dependency confusion.

For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that Datadog would eventually
support it as an official integration. It takes a nontrivial amount of
time to deliver such an integration due to roadmap prioritization and
the time required for implementation. It would be impossible to reserve
the name of every potential package so in the interim an attacker may
create a package that appears legitimate which would execute malicious
code at runtime. Not only are users more likely to install such packages
but doing so taints the perception of the entire project.

Although PEP 708 attempts to address this attack vector, it is
specifically about the case of multiple repositories being considered
during dependency resolution and does not offer any protection to the
aforementioned use cases.

Namespacing also would drastically reduce the incidence of typosquatting
because typos would have to be in the prefix itself which is normalized
and likely to be a short, well-known identifier like aws-. In recent
years, typosquatting has become a popular attack vector [4].

The current protection against typosquatting used by PyPI is to
normalize similar characters but that is insufficient for these use
cases.

Rationale

Other package ecosystems have generally solved this problem by taking
one of two approaches: either minimizing or maximizing backwards
compatibility.

-   NPM has the concept of scoped packages which were introduced
    primarily to combat there being a dearth of available good package
    names (whether a real or perceived phenomenon). When a user or
    organization signs up they are given a scope that matches their
    name. For example, the package for using Google Cloud Storage is
    @google-cloud/storage where @google-cloud/ is the scope. Regular
    user accounts (non-organization) may publish unscoped packages for
    public use. This approach has the lowest amount of backwards
    compatibility because every installer and tool has to be modified to
    account for scopes.
-   NuGet has the concept of package ID prefix reservation which was
    introduced primarily to satisfy users wishing to know where a
    package came from. A package name prefix may be reserved for use by
    one or more owners. Every reserved package has a special indication
    on its page to communicate this. After reservation, any upload with
    a reserved prefix will fail if the user is not an owner of the
    prefix. Existing packages that have a prefix that is owned may
    continue to release as usual. This approach has the highest amount
    of backwards compatibility because only modifications to indices
    like PyPI are required and installers do not need to change.

This PEP specifies the NuGet approach of authorized reservation across a
flat namespace. Any solution that requires new package syntax must be
built atop the existing flat namespace and therefore implicit namespaces
acquired via a reservation mechanism would be a prerequisite to such
explicit namespaces.

Although existing packages matching a reserved namespace would be
untouched, preventing future unauthorized uploads and strategically
applying PEP 541 takedown requests for malicious cases would reduce
risks to users to a negligible level.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in 2119.

Organization

    Organizations are entities that own projects and have various users
    associated with them.

Grant

    A grant is a reservation of a namespace for a package repository.

Open Namespace

    An open namespace allows for uploads from any project owner.

Restricted Namespace

    A restricted namespace only allows uploads from an owner of the
    namespace.

Parent Namespace

    A namespace's parent refers to the namespace without the trailing
    hyphenated component e.g. the parent of foo-bar is foo.

Child Namespace

    A namespace's child refers to the namespace with additional trailing
    hyphenated components e.g. foo-bar is a valid child of foo as is
    foo-bar-baz.

Specification

Organizations

Any package repository that allows for the creation of projects (e.g.
non-mirrors) MAY offer the concept of organizations. Organizations are
entities that own projects and have various users associated with them.

Organizations MAY reserve one or more namespaces. Such reservations
neither confer ownership nor grant special privileges to existing
projects.

Naming

A namespace MUST be a valid project name and normalized internally e.g.
foo.bar would become foo-bar.

Semantics

A namespace grant bestows ownership over the following:

1.  A project matching the namespace itself such as the placeholder
    package microsoft.
2.  Projects that start with the namespace followed by a hyphen. For
    example, the namespace foo would match the normalized project name
    foo-bar but not the project name foobar.

Package name matching acts upon the normalized namespace.

Namespaces are per-package repository and SHALL NOT be shared between
repositories. For example, if PyPI has a namespace microsoft that is
owned by the company Microsoft, packages starting with microsoft- that
come from other non-PyPI mirror repositories do not confer the same
level of trust.

Grants MUST NOT overlap. For example, if there is an existing grant for
foo-bar then a new grant for foo would be forbidden. An overlap is
determined by comparing the normalized proposed namespace with the
normalized namespace of every existing root grant. Every comparison must
append a hyphen to the end of the proposed and existing namespace. An
overlap is detected when any existing namespace starts with the proposed
namespace.

Uploads

If the following criteria are all true for a given upload:

1.  The project does not yet exist.
2.  The name matches a reserved namespace.
3.  The project is not owned by an organization with an active grant for
    the namespace.

Then the upload MUST fail with a 403 HTTP status code.

Open Namespaces

The owner of a grant may choose to allow others the ability to release
new projects with the associated namespace. Doing so MUST allow uploads
for new projects matching the namespace from any user.

It is possible for the owner of a namespace to both make it open and
allow other organizations to use the grant. In this case, the authorized
organizations have no special permissions and are equivalent to an open
grant without ownership.

Hidden Grants

Repositories MAY create hidden grants that are not visible to the public
which prevent their namespaces from being claimed by others. Such grants
MUST NOT be open and SHOULD NOT be exposed in the API.

Hidden grants are useful for repositories that wish to enforce upload
restrictions without the need to expose the namespace to the public.

Repository Metadata

The JSON API <691> version will be incremented from 1.2 to 1.3. The
following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement
these changes so that consumers of the API are able to determine whether
the repository supports this PEP.

Project Detail

The project detail <691#project-detail> response will be modified as
follows.

The namespace key MUST be null if the project does not match an active
namespace grant. If the project does match a namespace grant, the value
MUST be a mapping with the following keys:

-   prefix: This is the associated normalized namespace e.g. foo-bar. If
    the owner of the project owns multiple matching grants then this
    MUST be the namespace with the most number of characters. For
    example, if the project name matched both foo-bar and foo-bar-baz
    then this key would be the latter.
-   authorized: This is a boolean and will be true if the project owner
    is an organization and is one of the current owners of the grant.
    This is useful for tools that wish to make a distinction between
    official and community packages.
-   open: This is a boolean indicating whether the namespace is open.

Namespace Detail

The format of this URL is /namespace/<namespace> where <namespace> is
the normalized namespace. For example, the URL for the namespace foo.bar
would be /namespace/foo-bar.

The response will be a mapping with the following keys:

-   prefix: This is the normalized version of the namespace e.g.
    foo-bar.
-   owner: This is the organization that is responsible for the
    namespace.
-   open: This is a boolean indicating whether the namespace is open.
-   parent: This is the parent namespace if it exists. For example, if
    the namespace is foo-bar and there is an active grant for foo, then
    this would be "foo". If there is no parent then this key will be
    null.
-   children: This is an array of any child namespaces. For example, if
    the namespace is foo and there are active grants for foo-bar and
    foo-bar-baz then this would be ["foo-bar", "foo-bar-baz"].

Grant Removal

When a reserved namespace becomes unclaimed, repositories MUST set the
namespace key to null in the API.

Namespaces that were previously claimed but are now not SHOULD be
eligible for claiming again by any organization.

Community Buy-in

Representatives from the following organizations have expressed support
for this PEP (with a link to the discussion):

-   Apache Airflow
-   Typeshed
-   Project Jupyter (expanded)
-   Microsoft
-   DataDog

Backwards Compatibility

There are no intrinsic concerns because there is still a flat namespace
and installers need no modification. Additionally, many projects have
already chosen to signal a shared purpose with a prefix like typeshed
has done.

Security Implications

-   There is an opportunity to build on top of PEP 740 and PEP 480 so
    that one could prove cryptographically that a specific release came
    from an owner of the associated namespace. This PEP makes no effort
    to describe how this will happen other than that work is planned for
    the future.

How to Teach This

For consumers of packages we will document how metadata is exposed in
the API and potentially in future note tooling that supports utilizing
namespaces to provide extra security guarantees during installation.

Reference Implementation

None at this time.

Rejected Ideas

Organization Scoping

The primary motivation for this PEP is to reduce dependency confusion
attacks and NPM-style scoping with an allowance of the legacy flat
namespace would increase the risk. If documentation instructed a user to
install bar in the namespace foo then the user must be careful to
install @foo/bar and not foo-bar, or vice versa. The Python packaging
ecosystem has normalization rules for names in order to maximize the
ease of communication and this would be a regression.

The runtime environment of Python is also not conducive to scoping.
Whereas multiple versions of the same JavaScript package may coexist,
Python only allows a single global namespace. Barring major changes to
the language itself, this is nearly impossible to change. Additionally,
users have come to expect that the package name is usually the same as
what they would import and eliminating the flat namespace would do away
with that convention.

Scoping would be particularly affected by organization changes which are
bound to happen over time. An organization may change their name due to
internal shuffling, an acquisition, or any other reason. Whenever this
happens every project they own would in effect be renamed which would
cause unnecessary confusion for users, frequently.

Finally, the disruption to the community would be massive because it
would require an update from every package manager, security scanner,
IDE, etc. New packages released with the scoping would be incompatible
with older tools and would cause confusion for users along with
frustration from maintainers having to triage such complaints.

Encourage Dedicated Package Repositories

Critically, this imposes a burden on projects to maintain their own
infra. This is an unrealistic expectation for the vast majority of
companies and a complete non-starter for community projects.

This does not help in most cases because the default behavior of most
package managers is to use PyPI so users attempting to perform a simple
pip install would already be vulnerable to malicious packages.

In this theoretical future every project must document how to add their
repository to dependency resolution, which would be different for each
package manager. Few package managers are able to download specific
dependencies from specific repositories and would require users to use
verbose configuration in the common case.

The ones that do not support this would instead find a given package
using an ordered enumeration of repositories, leading to dependency
confusion. For example, say a user wants two packages from two custom
repositories X and Y. If each repository has both packages but one is
malicious on X and the other is malicious on Y then the user would be
unable to satisfy their requirements without encountering a malicious
package.

Use Fixed Prefixes

The idea here would be to have one or more top-level fixed prefixes that
are used for namespace reservations:

-   com-: Reserved for corporate organizations.
-   org-: Reserved for community organizations.

Organizations would then apply for a namespace prefixed by the type of
their organization.

This would cause perpetual disruption because when projects begin it is
unknown whether a user base will be large enough to warrant a namespace
reservation. Whenever that happens the project would have to be renamed
which would put a high maintenance burden on the project maintainers and
would cause confusion for users who have to learn a new way to reference
the project's packages. The potential for this deterring projects from
reserving namespaces at all is high.

Another issue with this approach is that projects often have branding in
mind (example) and would be reluctant to change their package names.

It's unrealistic to expect every company and project to voluntarily
change their existing and future package names.

Use DNS

The idea here is to add a new metadata field to projects in the API
called domain-authority. Repositories would support a new endpoint for
verifying the domain via HTTPS. Clients would then support options to
allow certain domains.

This does not solve the problem for the target audience who do not check
where their packages are coming from and is more about checking for the
integrity of uploads which is already supported in a more secure way by
PEP 740.

Most projects do not have a domain and could not benefit from this,
unfairly favoring organizations that have the financial means to acquire
one.

Open Issues

None at this time.

Footnotes

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] Additional examples of projects with restricted namespaces:

-   Typeshed is a community effort to maintain type stubs for various
    packages. The stub packages they maintain mirror the package name
    they target and are prefixed by types-. For example, the package
    requests has a stub that users would depend on called
    types-requests. Unofficial stubs are not supposed to use the types-
    prefix and are expected to use a -stubs suffix instead.
-   Sphinx is a documentation framework popular for large technical
    projects such as Swift and Python itself. They have the concept of
    extensions which are prefixed by sphinxcontrib-, many of which are
    maintained within a dedicated organization.
-   Apache Airflow is a platform to programmatically orchestrate tasks
    as directed acyclic graphs (DAGs). They have the concept of plugins,
    and also providers which are prefixed by apache-airflow-providers-.

[2] The following shows the package prefixes for the major cloud
providers:

-   Amazon: aws-cdk-
-   Google: google-cloud- and others based on google-
-   Microsoft: azure-

[3] Additional examples of projects with open namespaces:

-   pytest is Python's most popular testing framework. They have the
    concept of plugins which may be developed by anyone and by
    convention are prefixed by pytest-.
-   MkDocs is a documentation framework based on Markdown files. They
    also have the concept of plugins which may be developed by anyone
    and are usually prefixed by mkdocs-.
-   Datadog offers observability as a service for organizations at any
    scale. The Datadog Agent ships out-of-the-box with official
    integrations for many products, like various databases and web
    servers, which are distributed as Python packages that are prefixed
    by datadog-. There is support for creating third-party integrations
    which customers may run.

[4] Examples of typosquatting attacks targeting Python users:

-   django- namespace was squatted, among other packages, leading to a
    postmortem by PyPI.
-   cupy- namespace was squatted by a malicious actor thousands of
    times.
-   scikit- namespace was squatted, among other packages. Notice how
    packages with a known prefix are much more prone to successful
    attacks.
-   typing- namespace was squatted and this would be useful to prevent
    as a hidden grant.