PEP: 376 Title: Database of Installed Python Distributions Author: Tarek
Ziadé <tarek@ziade.org> Status: Final Type: Standards Track Topic:
Packaging Content-Type: text/x-rst Created: 22-Feb-2009 Python-Version:
2.7, 3.2 Post-History: 22-Jun-2009

packaging:core-metadata

Abstract

The goal of this PEP is to provide a standard infrastructure to manage
project distributions installed on a system, so all tools that are
installing or removing projects are interoperable.

To achieve this goal, the PEP proposes a new format to describe
installed distributions on a system. It also describes a reference
implementation for the standard library.

In the past an attempt was made to create an installation database (see
PEP 262).

Combined with PEP 345, the current proposal supersedes PEP 262.

Note: the implementation plan didn't go as expected, so it should be
considered informative only for this PEP.

Rationale

There are two problems right now in the way distributions are installed
in Python:

-   There are too many ways to do it and this makes interoperation
    difficult.
-   There is no API to get information on installed distributions.

How distributions are installed

Right now, when a distribution is installed in Python, every element can
be installed in a different directory.

For instance, Distutils installs the pure Python code in the purelib
directory, which is lib/python2.6/site-packages for unix-like systems
and Mac OS X, or Lib\site-packages under Python's installation directory
for Windows.

Additionally, the install_egg_info subcommand of the Distutils install
command adds an .egg-info file for the project into the purelib
directory.

For example, for the docutils distribution, which contains one package
an extra module and executable scripts, three elements are installed in
site-packages:

-   docutils: The docutils package.
-   roman.py: An extra module used by docutils.
-   docutils-0.5-py2.6.egg-info: A file containing the distribution
    metadata as described in PEP 314. This file corresponds to the file
    called PKG-INFO, built by the sdist command.

Some executable scripts, such as rst2html.py, are also added in the bin
directory of the Python installation.

Another project called setuptools[1] has two other formats to install
distributions, called EggFormats[2]:

-   a self-contained .egg directory, that contains all the distribution
    files and the distribution metadata in a file called PKG-INFO in a
    subdirectory called EGG-INFO. setuptools creates other files in that
    directory that can be considered as complementary metadata.
-   an .egg-info directory installed in site-packages, that contains the
    same files EGG-INFO has in the .egg format.

The first format is automatically used when you install a distribution
that uses the setuptools.setup function in its setup.py file, instead of
the distutils.core.setup one.

setuptools also add a reference to the distribution into an
easy-install.pth file.

Last, the setuptools project provides an executable script called
easy_install[3] that installs all distributions, including
distutils-based ones in self-contained .egg directories.

If you want to have standalone .egg-info directories for your
distributions, e.g. the second setuptools format, you have to force it
when you work with a setuptools-based distribution or with the
easy_install script. You can force it by using the
--single-version-externally-managed option or the --root option. This
will make the setuptools project install the project like distutils
does.

This option is used by :

-   the pip[4] installer
-   the Fedora packagers[5].
-   the Debian packagers[6].

Uninstall information

Distutils doesn't provide an uninstall command. If you want to uninstall
a distribution, you have to be a power user and remove the various
elements that were installed, and then look over the .pth file to clean
them if necessary.

And the process differs depending on the tools you have used to install
the distribution and if the distribution's setup.py uses Distutils or
Setuptools.

Under some circumstances, you might not be able to know for sure that
you have removed everything, or that you didn't break another
distribution by removing a file that is shared among several
distributions.

But there's a common behavior: when you install a distribution, files
are copied in your system. And it's possible to keep track of these
files for later removal.

Moreover, the Pip project has gained an uninstall feature lately. It
records all installed files, using the record option of the install
command.

What this PEP proposes

To address those issues, this PEP proposes a few changes:

-   A new .dist-info structure using a directory, inspired on one format
    of the EggFormats standard from setuptools.
-   New APIs in pkgutil to be able to query the information of installed
    distributions.
-   An uninstall function and an uninstall script in Distutils.

One .dist-info directory per installed distribution

This PEP proposes an installation format inspired by one of the options
in the EggFormats standard, the one that uses a distinct directory
located in the site-packages directory.

This distinct directory is named as follows:

    name + '-' + version + '.dist-info'

This .dist-info directory can contain these files:

-   METADATA: contains metadata, as described in PEP 345, PEP 314 and
    PEP 241.
-   RECORD: records the list of installed files
-   INSTALLER: records the name of the tool used to install the project
-   REQUESTED: the presence of this file indicates that the project
    installation was explicitly requested (i.e., not installed as a
    dependency).

The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED
may be missing.

This proposal will not impact Python itself because the metadata files
are not used anywhere yet in the standard library besides Distutils.

It will impact the setuptools and pip projects but, given the fact that
they already work with a directory that contains a PKG-INFO file, the
change will have no deep consequences.

RECORD

A RECORD file is added inside the .dist-info directory at installation
time when installing a source distribution using the install command.
Notice that when installing a binary distribution created with bdist
command or a bdist-based command, the RECORD file will be installed as
well since these commands use the install command to create binary
distributions.

The RECORD file holds the list of installed files. These correspond to
the files listed by the record option of the install command, and will
be generated by default. This allows the implementation of an
uninstallation feature, as explained later in this PEP. The install
command also provides an option to prevent the RECORD file from being
written and this option should be used when creating system packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired from PEP 262 FILES.

The RECORD file is a CSV file, composed of records, one line per
installed file. The csv module is used to read the file, with these
options:

-   field delimiter : ,
-   quoting char : ".
-   line terminator : os.linesep (so \r\n or \n)

When a distribution is installed, files can be installed under:

-   the base location: path defined by the --install-lib option, which
    defaults to the site-packages directory.
-   the installation prefix: path defined by the --prefix option, which
    defaults to sys.prefix.
-   any other path on the system.

Each record is composed of three elements:

-   the file's path

    -   a '/'-separated path, relative to the base location, if the file
        is under the base location.
    -   a '/'-separated path, relative to the base location, if the file
        is under the installation prefix AND if the base location is a
        subpath of the installation prefix.
    -   an absolute path, using the local platform separator

-   a hash of the file's contents. Notice that pyc and pyo generated
    files don't have any hash because they are automatically produced
    from py files. So checking the hash of the corresponding py file is
    enough to decide if the file and its associated pyc or pyo files
    have changed.

    The hash is either the empty string or the hash algorithm as named
    in hashlib.algorithms_guaranteed, followed by the equals character
    =, followed by the urlsafe-base64-nopad encoding of the digest
    (base64.urlsafe_b64encode(digest) with trailing = removed).

-   the file's size in bytes

The csv module is used to generate this file, so the field separator is
",". Any "," character found within a field is escaped automatically by
csv.

When the file is read, the U option is used so the universal newline
support (see PEP 278) is activated, avoiding any trouble reading a file
produced on a platform that uses a different new line terminator.

Here's an example of a RECORD file (extract):

    lib/python2.6/site-packages/docutils/__init__.py,md5=nWt-Dge1eug4iAgqLS_uWg,9544
    lib/python2.6/site-packages/docutils/__init__.pyc,,
    lib/python2.6/site-packages/docutils/core.py,md5=X90C_JLIcC78PL74iuhPnA,66188
    lib/python2.6/site-packages/docutils/core.pyc,,
    lib/python2.6/site-packages/roman.py,md5=7YhfNczihNjOY0FXlupwBg,234
    lib/python2.6/site-packages/roman.pyc,,
    /usr/local/bin/rst2html.py,md5=g22D3amDLJP-FhBzCi7EvA,234
    /usr/local/bin/rst2html.pyc,,
    python2.6/site-packages/docutils-0.5.dist-info/METADATA,md5=ovJyUNzXdArGfmVyb0onyA,195
    lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD,,

Notice that the RECORD file can't contain a hash of itself and is just
mentioned here

A project that installs a config.ini file in /etc/myapp will be added
like this:

    /etc/myapp/config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544

For a windows platform, the drive letter is added for the absolute
paths, so a file that is copied in c:MyAppwill be:

    c:\etc\myapp\config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544

INSTALLER

The install command has a new option called installer. This option is
the name of the tool used to invoke the installation. It's a normalized
lower-case string matching [a-z0-9_\-\.].

  $ python setup.py install --installer=pkg-system

It defaults to distutils if not provided.

When a distribution is installed, the INSTALLER file is generated in the
.dist-info directory with this value, to keep track of who installed the
distribution. The file is a single-line text file.

REQUESTED

Some install tools automatically detect unfulfilled dependencies and
install them. In these cases, it is useful to track which distributions
were installed purely as a dependency, so if their dependent
distribution is later uninstalled, the user can be alerted of the
orphaned dependency.

If a distribution is installed by direct user request (the usual case),
a file REQUESTED is added to the .dist-info directory of the installed
distribution. The REQUESTED file may be empty, or may contain a marker
comment line beginning with the "#" character.

If an install tool installs a distribution automatically, as a
dependency of another distribution, the REQUESTED file should not be
created.

The install command of distutils by default creates the REQUESTED file.
It accepts --requested and --no-requested options to explicitly specify
whether the file is created.

If a distribution that was already installed on the system as a
dependency is later installed by name, the distutils install command
will create the REQUESTED file in the .dist-info directory of the
existing installation.

Implementation details

Note: this section is non-normative. In the end, this PEP was
implemented by third-party libraries and tools, not the standard
library.

New functions and classes in pkgutil

To use the .dist-info directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is pkgutil.

Functions

The new functions added in the pkgutil module are :

-   distinfo_dirname(name, version) -> directory name

      name is converted to a standard distribution name by replacing any
      runs of non-alphanumeric characters with a single '-'.

      version is converted to a standard version string. Spaces become
      dots, and all other non-alphanumeric characters (except dots)
      become dashes, with runs of multiple dashes condensed to a single
      dash.

      Both attributes are then converted into their filename-escaped
      form, i.e. any '-' characters are replaced with '_' other than the
      one in 'dist-info' and the one separating the name from the
      version number.

-   get_distributions() -> iterator of Distribution instances.

    Provides an iterator that looks for .dist-info directories in
    sys.path and returns Distribution instances for each one of them.

-   get_distribution(name) -> Distribution or None.

-   obsoletes_distribution(name, version=None) -> iterator of
    Distribution instances.

    Iterates over all distributions to find which distributions obsolete
    name. If a version is provided, it will be used to filter the
    results.

-   provides_distribution(name, version=None) -> iterator of
    Distribution instances.

    Iterates over all distributions to find which distributions provide
    name. If a version is provided, it will be used to filter the
    results. Scans all elements in sys.path and looks for all
    directories ending with .dist-info. Returns a Distribution
    corresponding to the .dist-info directory that contains a METADATA
    that matches name for the name metadata.

    This function only returns the first result founded, since no more
    than one values are expected. If the directory is not found, returns
    None.

-   get_file_users(path) -> iterator of Distribution instances.

    Iterates over all distributions to find out which distributions uses
    path. path can be a local absolute path or a relative '/'-separated
    path.

    A local absolute path is an absolute path in which occurrences of
    '/' have been replaced by the system separator given by os.sep.

Distribution class

A new class called Distribution is created with the path of the
.dist-info directory provided to the constructor. It reads the metadata
contained in METADATA when it is instantiated.

Distribution(path) -> instance

  Creates a Distribution instance for the given path.

Distribution provides the following attributes:

-   name: The name of the distribution.
-   metadata: A DistributionMetadata instance loaded with the
    distribution's METADATA file.
-   requested: A boolean that indicates whether the REQUESTED metadata
    file is present (in other words, whether the distribution was
    installed by user request).

And following methods:

-   get_installed_files(local=False) -> iterator of (path, hash, size)

    Iterates over the RECORD entries and return a tuple
    (path, hash, size) for each line. If local is True, the path is
    transformed into a local absolute path. Otherwise the raw value from
    RECORD is returned.

    A local absolute path is an absolute path in which occurrences of
    '/' have been replaced by the system separator given by os.sep.

-   uses(path) -> Boolean

    Returns True if path is listed in RECORD. path can be a local
    absolute path or a relative '/'-separated path.

-   get_distinfo_file(path, binary=False) -> file object

      Returns a file located under the .dist-info directory.

      Returns a file instance for the file pointed by path.

      path has to be a '/'-separated path relative to the .dist-info
      directory or an absolute path.

      If path is an absolute path and doesn't start with the .dist-info
      directory path, a DistutilsError is raised.

      If binary is True, opens the file in read-only binary mode (rb),
      otherwise opens it in read-only mode (r).

-   get_distinfo_files(local=False) -> iterator of paths

    Iterates over the RECORD entries and returns paths for each line if
    the path is pointing to a file located in the .dist-info directory
    or one of its subdirectories.

    If local is True, each path is transformed into a local absolute
    path. Otherwise the raw value from RECORD is returned.

Notice that the API is organized in five classes that work with
directories and Zip files (so it works with files included in Zip files,
see PEP 273 for more details). These classes are described in the
documentation of the prototype implementation for interested readers[7].

Examples

Let's use some of the new APIs with our docutils example:

    >>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
    >>> dist = get_distribution('docutils')
    >>> dist.name
    'docutils'
    >>> dist.metadata.version
    '0.5'

    >>> distinfo_dirname('docutils', '0.5')
    'docutils-0.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.dist-info'

    >>> for path, hash, size in dist.get_installed_files()::
    ...     print '%s %s %d' % (path, hash, size)
    ...
    python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
    python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
    python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
    /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
    python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
    python2.6/site-packages/docutils-0.5.dist-info/RECORD

    >>> dist.uses('docutils/core.py')
    True

    >>> dist.uses('/usr/local/bin/rst2html.py')
    True

    >>> dist.get_distinfo_file('METADATA')
    <open file at ...>

    >>> dist.requested
    True

New functions in Distutils

Distutils already provides a very basic way to install a distribution,
which is running the install command over the setup.py script of the
distribution.

Distutils2 <262> will provide a very basic uninstall function, that is
added in distutils2.util and takes the name of the distribution to
uninstall as its argument. uninstall uses the APIs described earlier and
remove all unique files, as long as their hash didn't change. Then it
removes empty directories left behind.

uninstall returns a list of uninstalled files:

    >>> from distutils2.util import uninstall
    >>> uninstall('docutils')
    ['/opt/local/lib/python2.6/site-packages/docutils/core.py',
     ...
     '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']

If the distribution is not found, a DistutilsUninstallError is raised.

Filtering

To make it a reference API for third-party projects that wish to control
how uninstall works, a second callable argument can be used. It's called
for each file that is removed. If the callable returns True, the file is
removed. If it returns False, it's left alone.

Examples:

    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('docutils', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('docutils', _dry_run)

Of course, a third-party tool can use lower-level pkgutil APIs to
implement its own uninstall feature.

Installer marker

As explained earlier in this PEP, the install command adds an INSTALLER
file in the .dist-info directory with the name of the installer.

To avoid removing distributions that were installed by another packaging
system, the uninstall function takes an extra argument installer which
defaults to distutils2.

When called, uninstall controls that the INSTALLER file matches this
argument. If not, it raises a DistutilsUninstallError:

    >>> uninstall('docutils')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'

    >>> uninstall('docutils', installer='cool-pkg-manager')

This allows a third-party application to use the uninstall function and
strongly suggest that no other program remove a distribution it has
previously installed. This is useful when a third-party program that
relies on Distutils APIs does extra steps on the system at installation
time, it has to undo at uninstallation time.

Adding an Uninstall script

An uninstall script is added in Distutils2. and is used like this:

    $ python -m distutils2.uninstall projectname

Notice that script doesn't control if the removal of a distribution
breaks another distribution. Although it makes sure that all the files
it removes are not used by any other distribution, by using the
uninstall function.

Also note that this uninstall script pays no attention to the REQUESTED
metadata; that is provided only for use by external tools to provide
more advanced dependency management.

Backward compatibility and roadmap

These changes don't introduce any compatibility problems since they will
be implemented in:

-   pkgutil in new functions
-   distutils2

The plan is to include the functionality outlined in this PEP in pkgutil
for Python 3.2, and in Distutils2.

Distutils2 will also contain a backport of the new pgkutil, and can be
used for 2.4 onward.

Distributions installed using existing, pre-standardization formats do
not have the necessary metadata available for the new API, and thus will
be ignored. Third-party tools may of course to continue to support
previous formats in addition to the new format, in order to ease the
transition.

References

Acknowledgements

Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many
people at Pycon and Distutils-SIG.

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] http://peak.telecommunity.com/DevCenter/setuptools

[2] http://peak.telecommunity.com/DevCenter/EggFormats

[3] http://peak.telecommunity.com/DevCenter/EasyInstall

[4] http://pypi.python.org/pypi/pip

[5] http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools

[6] http://wiki.debian.org/DebianPython/NewPolicy

[7] http://bitbucket.org/tarek/pep376/