PEP 752 – Implicit namespaces for package repositories
- Author: Ofek Lev <ofekmeister at gmail.com>
- Sponsor: Barry Warsaw <barry at python.org>
- PEP-Delegate: Dustin Ingram <di at python.org>
- Discussions-To: Discourse thread
- Status: Draft
- Type: Standards Track
- Topic: Packaging
- Created: 13-Aug-2024
- Post-History: 18-Aug-2024, 07-Sep-2024
Abstract
This PEP specifies a way for organizations to reserve package name prefixes for future uploads.
“Namespaces are one honking great idea – let’s do more of those!” - PEP 20
Motivation
The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Such projects fall into two categories.
The first category is projects [1] that want complete control over their namespace. A few examples:
- Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature’s corresponding package [3]. For example, most of Google’s packages are prefixed by google-cloud- e.g. google-cloud-compute for using virtual machines.
- OpenTelemetry is an open standard for observability with official packages for the core APIs and SDK with contrib packages to collect data from various sources. All packages are prefixed by opentelemetry- with child prefixes in the form opentelemetry-<component>-<name>-. The contrib packages live in a central repository and they are the only ones with the ability to publish.
The second category is projects [2] that want to share their namespace such that some packages are officially maintained and third-party developers are encouraged to participate by publishing their own. Some examples:
- Project Jupyter is devoted to the development of tooling for sharing interactive documents. They support extensions which in most cases (and in all cases for officially maintained extensions) are prefixed by jupyter-.
- Django is one of the most widely used web frameworks in existence. They have the concept of reusable apps, which are commonly installed via third-party packages that implement a subset of functionality to extend Django-based websites. These packages are by convention prefixed by django- or dj-.
Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in dependency confusion.
For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package so in the interim an attacker may create a package that appears legitimate which would execute malicious code at runtime. Not only are users more likely to install such packages but doing so taints the perception of the entire project.
Although PEP 708 attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases.
Namespacing also would drastically reduce the incidence of typosquatting because typos would have to be in the prefix itself which is normalized and likely to be a short, well-known identifier like aws-. In recent years, typosquatting has become a popular attack vector [4].
The current protection against typosquatting used by PyPI is to normalize similar characters but that is insufficient for these use cases.
Rationale
Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility.
- NPM has the concept of scoped packages which were introduced primarily to combat there being a dearth of available good package names (whether a real or perceived phenomenon). When a user or organization signs up they are given a scope that matches their name. For example, the package for using Google Cloud Storage is @google-cloud/storage where @google-cloud/ is the scope. Regular user accounts (non-organization) may publish unscoped packages for public use. This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes.
- NuGet has the concept of package ID prefix reservation which was introduced primarily to satisfy users wishing to know where a package came from. A package name prefix may be reserved for use by one or more owners. Every reserved package has a special indication on its page to communicate this. After reservation, any upload with a reserved prefix will fail if the user is not an owner of the prefix. Existing packages that have a prefix that is owned may continue to release as usual. This approach has the highest amount of backwards compatibility because only modifications to indices like PyPI are required and installers do not need to change.
This PEP specifies the NuGet approach of authorized reservation across a flat namespace. Any solution that requires new package syntax must be built atop the existing flat namespace and therefore implicit namespaces acquired via a reservation mechanism would be a prerequisite to such explicit namespaces.
Although existing packages matching a reserved namespace would be untouched, preventing future unauthorized uploads and strategically applying PEP 541 takedown requests for malicious cases would reduce risks to users to a negligible level.
Terminology
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
- Organization
- Organizations are entities that own projects and have various users associated with them.
- Grant
- A grant is a reservation of a namespace for a package repository.
- Open Namespace
- An open namespace allows for uploads from any project owner.
- Restricted Namespace
- A restricted namespace only allows uploads from an owner of the namespace.
- Parent Namespace
- A namespace’s parent refers to the namespace without the trailing hyphenated component e.g. the parent of foo-bar is foo.
- Child Namespace
- A namespace’s child refers to the namespace with additional trailing hyphenated components e.g. foo-bar is a valid child of foo as is foo-bar-baz.
Specification
Organizations
Any package repository that allows for the creation of projects (e.g. non-mirrors) MAY offer the concept of organizations. Organizations are entities that own projects and have various users associated with them.
Organizations MAY reserve one or more namespaces. Such reservations neither confer ownership nor grant special privileges to existing projects.
Naming
A namespace MUST be a valid project name and normalized internally e.g. foo.bar would become foo-bar.
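The naming rule can be sketched as follows. This is a minimal illustration and not part of the PEP: it assumes that "valid project name" and "normalized" refer to the existing PyPA name format and name normalization rules, and the function names `is_valid_name` and `normalize` are hypothetical.

```python
import re

# Pattern for valid project names, assumed here to be the existing PyPA rule.
_VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_name(name: str) -> bool:
    """Return True if the name is acceptable as a project (and namespace) name."""
    return _VALID_NAME.match(name) is not None


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into a single hyphen and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("foo.bar"))
```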
Semantics
A namespace grant bestows ownership over the following:
- A project matching the namespace itself such as the placeholder package microsoft.
- Projects that start with the namespace followed by a hyphen. For example, the namespace foo would match the normalized project name foo-bar but not the project name foobar.
Package name matching acts upon the normalized namespace.
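A minimal sketch of the matching rule above, assuming the usual PyPA name normalization. The helper name `matches_namespace` is hypothetical and only illustrates the two cases listed: the project equal to the namespace, and the project starting with the namespace followed by a hyphen.

```python
import re


def normalize(name: str) -> str:
    """Assumed normalization: collapse '-', '_' and '.' runs, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def matches_namespace(project: str, namespace: str) -> bool:
    """Return True if the project falls under the namespace grant."""
    project = normalize(project)
    namespace = normalize(namespace)
    # Either the placeholder project itself, or "<namespace>-<anything>".
    return project == namespace or project.startswith(namespace + "-")


print(matches_namespace("foo-bar", "foo"))
print(matches_namespace("foobar", "foo"))
```

Requiring the hyphen after the prefix is what keeps foobar outside the foo namespace.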
Namespaces are per-package repository and SHALL NOT be shared between repositories. For example, if PyPI has a namespace microsoft that is owned by the company Microsoft, packages starting with microsoft- that come from other non-PyPI mirror repositories do not confer the same level of trust.
Grants MUST NOT overlap. For example, if there is an existing grant for foo-bar then a new grant for foo would be forbidden. An overlap is determined by comparing the normalized proposed namespace with the normalized namespace of every existing root grant. Every comparison must append a hyphen to the end of the proposed and existing namespace. An overlap is detected when any existing namespace starts with the proposed namespace.
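The overlap check described above might look like the following sketch. The function name `overlaps` and the representation of existing root grants as a plain list of strings are assumptions made for illustration.

```python
import re
from typing import Iterable


def normalize(name: str) -> str:
    """Assumed normalization: collapse '-', '_' and '.' runs, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def overlaps(proposed: str, existing_root_grants: Iterable[str]) -> bool:
    """Return True if the proposed namespace overlaps an existing root grant."""
    # Append a hyphen to both sides so that "foo" does not collide with "foobar".
    candidate = normalize(proposed) + "-"
    return any(
        (normalize(existing) + "-").startswith(candidate)
        for existing in existing_root_grants
    )


print(overlaps("foo", ["foo-bar"]))
print(overlaps("foo", ["foobar"]))
```

The comparison is one-directional: a proposed foo-bar is not reported as overlapping an existing foo, because foo- does not start with foo-bar-.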
Uploads
If the following criteria are all true for a given upload:
- The project does not yet exist.
- The name matches a reserved namespace.
- The project is not owned by an organization with an active grant for the namespace.
Then the upload MUST fail with a 403 HTTP status code.
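As a rough sketch, the three criteria could be evaluated as below. The `Grant` type, the `upload_status` function, and the use of 200 for an accepted upload are hypothetical. Open namespaces are deliberately not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set


def normalize(name: str) -> str:
    """Assumed normalization: collapse '-', '_' and '.' runs, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of an active grant (namespace already normalized)."""
    namespace: str
    owner: str


def upload_status(
    project: str,
    owner_org: Optional[str],
    existing_projects: Set[str],
    grants: Iterable[Grant],
) -> int:
    """Return 403 when all three criteria hold, otherwise 200 (illustrative)."""
    name = normalize(project)
    matching = [
        g for g in grants
        if name == g.namespace or name.startswith(g.namespace + "-")
    ]
    is_new = name not in existing_projects
    is_reserved = bool(matching)
    owner_has_grant = any(g.owner == owner_org for g in matching)
    if is_new and is_reserved and not owner_has_grant:
        return 403
    return 200


grants = [Grant("foo", "acme")]
print(upload_status("foo-bar", None, set(), grants))
```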
Open Namespaces
The owner of a grant may choose to allow others the ability to release new projects with the associated namespace. Doing so MUST allow uploads for new projects matching the namespace from any user.
It is possible for the owner of a namespace to both make it open and allow other organizations to use the grant. In this case, the authorized organizations have no special permissions and are equivalent to an open grant without ownership.
Repository Metadata
The JSON API version will be incremented from 1.2 to 1.3.
The following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement these
changes so that consumers of the API are able to determine whether the
repository supports this PEP.
Project Detail
The project detail response will be modified as follows.
The namespace key MUST be null if the project does not match an active namespace grant. If the project does match a namespace grant, the value MUST be a mapping with the following keys:
- prefix: This is the associated normalized namespace e.g. foo-bar. If the owner of the project owns multiple matching grants then this MUST be the namespace with the greatest number of characters. For example, if the project name matched both foo-bar and foo-bar-baz then this key would be the latter.
- authorized: This is a boolean and will be true if the project owner is an organization and is one of the current owners of the grant. This is useful for tools that wish to make a distinction between official and community packages.
- open: This is a boolean indicating whether the namespace is open.
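A sketch of how a repository might compute the value of the namespace key. The `Grant` type and `namespace_field` function are hypothetical, and as a simplifying assumption the longest matching grant is chosen regardless of who owns it.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


def normalize(name: str) -> str:
    """Assumed normalization: collapse '-', '_' and '.' runs, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of an active grant (prefix already normalized)."""
    prefix: str
    owners: FrozenSet[str]
    is_open: bool


def namespace_field(
    project: str, owner_org: Optional[str], grants: Iterable[Grant]
) -> Optional[dict]:
    """Build the value of the "namespace" key for a project detail response."""
    name = normalize(project)
    matching = [
        g for g in grants
        if name == g.prefix or name.startswith(g.prefix + "-")
    ]
    if not matching:
        return None  # serialized as JSON null
    # Assumption: prefer the matching namespace with the most characters.
    grant = max(matching, key=lambda g: len(g.prefix))
    return {
        "prefix": grant.prefix,
        "authorized": owner_org is not None and owner_org in grant.owners,
        "open": grant.is_open,
    }


grants = [
    Grant("foo-bar", frozenset({"acme"}), False),
    Grant("foo-bar-baz", frozenset({"acme"}), True),
]
print(namespace_field("foo-bar-baz-qux", "acme", grants))
```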
Namespace Detail
The format of this URL is /namespace/<namespace> where <namespace> is the normalized namespace. For example, the URL for the namespace foo.bar would be /namespace/foo-bar.
The response will be a mapping with the following keys:
- prefix: This is the normalized version of the namespace e.g. foo-bar.
- owner: This is the organization that is responsible for the namespace.
- open: This is a boolean indicating whether the namespace is open.
- parent: This is the parent namespace if it exists. For example, if the namespace is foo-bar and there is an active grant for foo, then this would be "foo". If there is no parent then this key will be null.
- children: This is an array of any child namespaces. For example, if the namespace is foo and there are active grants for foo-bar and foo-bar-baz then this would be ["foo-bar", "foo-bar-baz"].
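The namespace detail response could be assembled roughly as follows. This is a sketch under stated assumptions: grants are held in a hypothetical in-memory mapping from normalized prefix to owner and open flag, the parent is reported only when the immediate parent namespace has an active grant (one reading of the text above), and `namespace_url` and `namespace_detail` are illustrative names.

```python
import re


def normalize(name: str) -> str:
    """Assumed normalization: collapse '-', '_' and '.' runs, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_url(namespace: str) -> str:
    """Return the namespace detail path for a namespace."""
    return f"/namespace/{normalize(namespace)}"


def namespace_detail(namespace: str, grants: dict) -> dict:
    """Build the response mapping; `grants` maps prefix -> {"owner", "open"}."""
    name = normalize(namespace)
    grant = grants[name]
    # Drop the trailing hyphenated component to find the parent candidate.
    parent = name.rpartition("-")[0]
    return {
        "prefix": name,
        "owner": grant["owner"],
        "open": grant["open"],
        "parent": parent if parent in grants else None,
        "children": sorted(p for p in grants if p.startswith(name + "-")),
    }


grants = {
    "foo": {"owner": "acme", "open": False},
    "foo-bar": {"owner": "acme", "open": True},
    "foo-bar-baz": {"owner": "acme", "open": False},
}
print(namespace_url("foo.bar"))
print(namespace_detail("foo", grants))
```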
Grant Removal
When a reserved namespace becomes unclaimed, repositories MUST set the namespace key to null in the API.
Namespaces that were previously claimed but are no longer claimed SHOULD be eligible for claiming again by any organization.
Community Buy-in
Representatives from the following organizations have expressed support for this PEP (with a link to the discussion):
Backwards Compatibility
There are no intrinsic concerns because there is still a flat namespace and installers need no modification. Additionally, many projects have already chosen to signal a shared purpose with a prefix like typeshed has done.
Security Implications
How to Teach This
For consumers of packages we will document how metadata is exposed in the API and, potentially in the future, note tooling that uses namespaces to provide extra security guarantees during installation.
Reference Implementation
None at this time.
Rejected Ideas
Organization Scoping
The primary motivation for this PEP is to reduce dependency confusion attacks and NPM-style scoping with an allowance of the legacy flat namespace would increase the risk. If documentation instructed a user to install bar in the namespace foo then the user must be careful to install @foo/bar and not foo-bar, or vice versa. The Python packaging ecosystem has normalization rules for names in order to maximize the ease of communication and this would be a regression.
The runtime environment of Python is also not conducive to scoping. Whereas multiple versions of the same JavaScript package may coexist, Python only allows a single global namespace. Barring major changes to the language itself, this is nearly impossible to change. Additionally, users have come to expect that the package name is usually the same as what they would import and eliminating the flat namespace would do away with that convention.
Scoping would be particularly affected by organization changes which are bound to happen over time. An organization may change their name due to internal shuffling, an acquisition, or any other reason. Whenever this happens every project they own would in effect be renamed which would cause unnecessary confusion for users, frequently.
Finally, the disruption to the community would be massive because it would require an update from every package manager, security scanner, IDE, etc. New packages released with the scoping would be incompatible with older tools and would cause confusion for users along with frustration from maintainers having to triage such complaints.
Encourage Dedicated Package Repositories
Critically, this imposes a burden on projects to maintain their own infra. This is an unrealistic expectation for the vast majority of companies and a complete non-starter for community projects.
This does not help in most cases because the default behavior of most package managers is to use PyPI so users attempting to perform a simple pip install would already be vulnerable to malicious packages.
In this theoretical future every project must document how to add their repository to dependency resolution, which would be different for each package manager. Few package managers are able to download specific dependencies from specific repositories and would require users to use verbose configuration in the common case.
The ones that do not support this would instead find a given package using an ordered enumeration of repositories, leading to dependency confusion.

For example, say a user wants two packages from two custom repositories X and Y. If each repository has both packages but one is malicious on X and the other is malicious on Y then the user would be unable to satisfy their requirements without encountering a malicious package.
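The scenario can be simulated with a toy model. The `resolve` function, the package names and the first-match lookup strategy are hypothetical stand-ins for a package manager that searches repositories in order. They do not describe any specific tool.

```python
def resolve(package, repositories):
    """Return (repository name, artifact) from the first repository that has it."""
    for repo_name, contents in repositories:
        if package in contents:
            return repo_name, contents[package]
    raise LookupError(package)


# Each repository hosts both packages, but a different one is malicious on each.
X = {"pkg-a": "legitimate", "pkg-b": "malicious"}
Y = {"pkg-a": "malicious", "pkg-b": "legitimate"}

for order in ([("X", X), ("Y", Y)], [("Y", Y), ("X", X)]):
    picked = {pkg: resolve(pkg, order) for pkg in ("pkg-a", "pkg-b")}
    print([name for name, _ in order], picked)
```

Whichever repository is listed first wins for both packages, so each ordering installs one malicious artifact.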
Use Fixed Prefixes
The idea here would be to have one or more top-level fixed prefixes that are used for namespace reservations:
- com-: Reserved for corporate organizations.
- org-: Reserved for community organizations.
Organizations would then apply for a namespace prefixed by the type of their organization.
This would cause perpetual disruption because when projects begin it is unknown whether a user base will be large enough to warrant a namespace reservation. Whenever that happens the project would have to be renamed which would put a high maintenance burden on the project maintainers and would cause confusion for users who have to learn a new way to reference the project’s packages. The potential for this deterring projects from reserving namespaces at all is high.
Another issue with this approach is that projects often have branding in mind and would be reluctant to change their package names.
It’s unrealistic to expect every company and project to voluntarily change their existing and future package names.
Use DNS
The idea here is to add a new metadata field to projects in the API called domain-authority. Repositories would support a new endpoint for verifying the domain via HTTPS. Clients would then support options to allow certain domains.
This does not solve the problem for the target audience who do not check where their packages are coming from and is more about checking for the integrity of uploads which is already supported in a more secure way by PEP 740.
Most projects do not have a domain and could not benefit from this, unfairly favoring organizations that have the financial means to acquire one.
Open Issues
None at this time.
Footnotes
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0752.rst
Last modified: 2024-09-16 20:28:38 GMT