PEP: 527 Title: Removing Un(der)used file types/extensions on PyPI
Version: $Revision$ Last-Modified: $Date$ Author: Donald Stufft
<donald@stufft.io> BDFL-Delegate: Alyssa Coghlan <ncoghlan@gmail.com>
Discussions-To: distutils-sig@python.org Status: Final Type: Standards
Track Topic: Packaging Content-Type: text/x-rst Created: 23-Aug-2016
Post-History: 23-Aug-2016 Resolution:
https://mail.python.org/pipermail/distutils-sig/2016-September/029624.html

Abstract

This PEP recommends deprecating, and ultimately removing, support for
uploading certain unused or under used file types and extensions to
PyPI. In particular it recommends disallowing further uploads of any
files of the types bdist_dumb, bdist_rpm, bdist_dmg, bdist_msi, and
bdist_wininst, leaving PyPI to only accept new uploads of the sdist,
bdist_wheel, and bdist_egg file types.

In addition, this PEP proposes removing support for new uploads of
sdists using the .tar, .tar.bz2, .tar.xz, .tar.Z, .tgz, .tbz, and any
other extension besides .tar.gz and .zip.

Finally, this PEP also proposes limiting the number of allowed sdist
uploads for each individual release of a project on PyPI to one instead
of one for each allowed extension.

Rationale

File Formats

Currently PyPI supports the following file types:

-   sdist
-   bdist_wheel
-   bdist_egg
-   bdist_wininst
-   bdist_msi
-   bdist_dmg
-   bdist_rpm
-   bdist_dumb

However, these different types of files have varying amounts of
usefulness or general use in the ecosystem. Continuing to support them
adds a maintenance burden on PyPI as well as tool authors and incurs a
cost in both bandwidth and disk space not only on PyPI itself, but also
on any mirrors of PyPI.

Python packaging is a multi-level ecosystem where PyPI is primarily
suited and used to distribute virtual environment compatible packages
directly from their respective project owners. These packages are then
consumed either directly by end-users, or by downstream distributors
that take these packages and turn them into their respective system
level packages (such as RPM, deb, MSI, etc).

While PyPI itself only directly works with these Python specific but
platform agnostic packages, we encourage community-driven and commercial
conversions of these packages to downstream formats for particular
target environments, like:

-   The conda cross-platform data analysis ecosystem (conda-forge)
-   The deb based Linux ecosystem (Debian, Ubuntu, etc)
-   The RPM based Linux ecosystem (Fedora, openSuSE, Mageia, etc)
-   The homebrew, MacPorts and fink ecosystems for Mac OS X
-   The Windows Package Management ecosystem (NuGet, Chocolatey, etc)
-   3rd party creation of Windows MSIs and installers (e.g. Christoph
    Gohlke's work at http://www.lfd.uci.edu/~gohlke/pythonlibs/ )
-   other commercial redistribution formats (ActiveState's PyPM,
    Enthought Canopy, etc)
-   other open source community redistribution formats (Nix, Gentoo,
    Arch, *BSD, etc)

It is the belief of this PEP that the entire ecosystem is best supported
by keeping PyPI focused on the platform agnostic formats, where the
limited amount of time by volunteers can be best used instead of
spreading the available time out amongst several platforms. Further
more, this PEP believes that the people best positioned to provide well
integrated packages for a particular platform are people focused on that
platform, and not across all possible platforms.

bdist_dumb

As it's name implies, bdist_dumb is not a very complex format, however
it is so simple as to be worthless for actual usage.

For instance, if you're using something like pyenv on macOS and you're
building a library using Python 3.5, then bdist_dumb will produce a
.tar.gz file named something like
exampleproject-1.0.macosx-10.11-x86_64.tar.gz. Right off the bat this
file name is somewhat difficult to differentiate from an sdist since
they both use the same file extension (and with the legacy pre PEP 440
versions, 1.0-macosx-10.11-x86_64 is a valid, although quite silly,
version number). However, once you open up the created .tar.gz, you'd
find that there is no metadata inside that could be used for things like
dependency discovery and in fact, it is quite simply a tarball
containing hardcoded paths to wherever files would have been installed
on the computer creating the bdist_dumb. Going back to our pyenv on
macOS example, this means that if I created it, it would contain files
like:

Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py

bdist_rpm

The bdist_rpm format on PyPI allows people to upload .rpm files for end
users to manually download by hand and then manually install by hand.
However, the common usage of rpm is with a specially designed repository
that allows automatic installation of dependencies, upgrades, etc which
PyPI does not provide. Thus, it is a type of file that is barely being
used on PyPI with only ~460 files of this type having been uploaded to
PyPI (out a total of 662,544).

In addition, services like COPR provide a better supported mechanism for
publishing and using RPM files than we're ever likely to get on PyPI.

bdist_dmg, bdist_msi, and bdist_wininst

The bdist_dmg, bdist_msi, and bdist_winist formats are similar in that
they are an OS specific installer that will only install a library into
an environment and are not designed for real user facing installs of
applications (which would require things like bundling a Python
interpreter and the like).

Out of these three, the usage for bdist_dmg and bdist_msi is very low,
with only ~500 bdist_msi files and ~50 bdist_dmg files having been
uploaded to PyPI. The bdist_wininst format has more use, with ~14,000
files having ever been uploaded to PyPI.

It's quite easy to look at the low usage of bdist_dmg and bdist_msi and
conclude that removing them will be fairly low impact, however
bdist_wininst has several orders of magnitude more usage. This is
somewhat misleading though, because although it has more people
uploading those files the actual usage of those uploaded files is fairly
low. Taking a look at the previous 30 days, we can see that 90% of all
downloads of bdist_winist files from PyPI were generated by the
mirroring infrastructure and 7% of them were generated by setuptools
(which can currently be better covered by bdist_egg files).

Given the small number of files uploaded for bdist_dmg and bdist_msi and
that bdist_wininst is largely existing to either consume bandwidth and
disk space via the mirroring infrastructure or could be trivially
replaced with bdist_egg, this PEP proposes to include these three
formats in the list of those to be disallowed.

File Extensions

Currently sdist supports a wide variety of file extensions like .tar.gz,
.tar, .tar.bz2, .tar.xz, .zip, .tar.Z, .tgz, and .tbz. However, of those
the only extensions which get anything more than negligible usage is
.tar.gz with 444,338 sdists currently, .zip with 58,774 sdists
currently, and .tar.bz2 with 3,265 sdists currently.

Having multiple formats accepted requires tooling both within PyPI and
outside of PyPI to handle all of the various extensions that might be
used (even if nobody is currently using them). This doesn't only affect
PyPI, but ripples out throughout the ecosystem. In addition, the
different formats all have different requirements for what optional C
libraries Python was linked against and different requirements for what
versions of Python they support. In addition, multiple formats also
create a weird situation where there may be two sdist files for a
particular project/release with subtly different content.

It's easy to advocate that anything outside of .tar.gz, .zip, and
.tar.bz2 should be disallowed. Outside of a tiny handful, nobody has
actively been uploading these other types of files in the ~15 years of
PyPI's existence so they've obviously not been particularly useful. In
addition, while .tar.xz is theoretically a nicer format than the other
.tar.* formats due to the better compression ratio achieved by LZMA, it
is only available in Python 3.3+ and has an optional dependency on the
lzma C library.

Looking at the three extensions we do have in current use, it's also
fairly easy to conclude that .tar.bz2 can be disallowed as well. It has
a fairly small number of files ever uploaded with it and it requires an
additional optional C library to handle the bzip2 compression.

Finally we get down to .tar.gz and .zip. Looking at the pure numbers for
these two, we can see that .tar.gz is by far the most uploaded format,
with 444,338 total uploaded compared to .zip's 58,774 and on POSIX
operating systems .tar.gz is also the default produced by all currently
released versions of Python and setuptools. In addition, these two file
types both use the same C library (zlib) which is also required for
bdist_wheel and bdist_egg. The two wrinkles with deciding between
.tar.gz and .zip is that while on POSIX operating systems .tar.gz is the
default, on Windows .zip is the default and the bdist_wheel format also
uses zip.

Instead of trying to standardize on either .tar.gz or .zip, this PEP
proposes that we allow either .tar.gz or .zip for sdists.

Limiting number of sdists per release

A sdist on PyPI should be a single source of truth for a particular
release of software. However, currently PyPI allows you to upload one
sdist for each of the sdist file extensions it allows. Currently this
allows something like 10 different sdists for a project, but even with
this PEP it allows two different sources of truth for a single version.
Having multiple sdists oftentimes can account for strange bugs that only
expose themselves based on which sdist that the person used.

To resolve this, this PEP proposes to allow one, and only one, sdist per
release of a project.

Removal Process

This PEP does NOT propose removing any existing files from PyPI, only
disallowing new ones from being uploaded. This restriction will be
phased in on a per-project basis to allow projects to adjust to the new
restrictions where applicable.

First, any existing projects will be flagged to allow legacy file types
to be uploaded, and any project without that flag (i.e. new projects)
will not be able to upload anything but sdist with a .tar.gz or .zip
extension, bdist_wheel, and bdist_egg. Then, any existing projects that
have never uploaded a file that requires the legacy file type flag will
have that flag removed, also making them fall under the new
restrictions. Finally, an email will be generated to the maintainers of
all projects still given the legacy flag, which will inform them of the
upcoming new restrictions on uploads and tell them that these
restrictions will be applied to future uploads to their projects
starting in 1 month. Finally, after 1 month all projects will have the
legacy file type flag removed, and support for uploading these types of
files will cease to exist on PyPI.

This plan should provide minimal disruption since it does not remove any
existing files, and the types of files it does prevent from being
uploaded are either not particularly useful (or used) types of files or
they can continue to upload a similar type of file with a slight change
to their process.

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End: