PEP: 382 Title: Namespace Packages Author: Martin von Löwis
<martin@v.loewis.de> Status: Rejected Type: Standards Track
Content-Type: text/x-rst Created: 02-Apr-2009 Python-Version: 3.2
Post-History:

Rejection Notice

On the first day of sprints at US PyCon 2012 we had a long and fruitful
discussion about PEP 382 and PEP 402. We ended up rejecting both but a
new PEP will be written to carry on in the spirit of PEP 402. Martin von
Löwis wrote up a summary:[1].

Abstract

Namespace packages are a mechanism for splitting a single Python package
across multiple directories on disk. In current Python versions, an
algorithm to compute the packages __path__ must be formulated. With the
enhancement proposed here, the import machinery itself will construct
the list of directories that make up the package. An implementation of
this PEP is available at[2].

Terminology

Within this PEP, the term package refers to Python packages as defined
by Python's import statement. The term distribution refers to separately
installable sets of Python modules as stored in the Python package
index, and installed by distutils or setuptools. The term vendor package
refers to groups of files installed by an operating system's packaging
mechanism (e.g. Debian or Redhat packages install on Linux systems).

The term portion refers to a set of files in a single directory
(possibly stored in a zip file) that contribute to a namespace package.

Namespace packages today

Python currently provides the pkgutil.extend_path to denote a package as
a namespace package. The recommended way of using it is to put:

    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

in the package's __init__.py. Every distribution needs to provide the
same contents in its __init__.py, so that extend_path is invoked
independent of which portion of the package gets imported first. As a
consequence, the package's __init__.py cannot practically define any
names as it depends on the order of the package fragments on sys.path
which portion is imported first. As a special feature, extend_path reads
files named <packagename>.pkg which allow to declare additional
portions.

setuptools provides a similar function pkg_resources.declare_namespace
that is used in the form:

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

In the portion's __init__.py, no assignment to __path__ is necessary, as
declare_namespace modifies the package __path__ through sys.modules. As
a special feature, declare_namespace also supports zip files, and
registers the package name internally so that future additions to
sys.path by setuptools can properly add additional portions to each
package.

setuptools allows declaring namespace packages in a distribution's
setup.py, so that distribution developers don't need to put the magic
__path__ modification into __init__.py themselves.

Rationale

The current imperative approach to namespace packages has lead to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports *.pkg files; setuptools doesn't.
Likewise, setuptools supports inspecting zip files, and supports adding
portions to its _namespace_packages variable, whereas pkgutil doesn't.

In addition, the current approach causes problems for system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk will
fail or cause unpredictable behavior. As vendors might chose to package
distributions such that they will end up all in a single directory for
the namespace package, all portions would contribute conflicting
__init__.py files.

Specification

Rather than using an imperative mechanism for importing packages, a
declarative approach is proposed here: A directory whose name ends with
.pyp (for Python package) contains a portion of a package.

The import statement is extended so that computes the package's __path__
attribute for a package named P as consisting of optionally a single
directory name P containing a file __init__.py, plus all directories
named P.pyp, in the order in which they are found in the parent's
package __path__ (or sys.path). If either of these are found, search for
additional portions of the package continues.

A directory may contain both a package in the P/__init__.py and the
P.pyp form.

No other change to the importing mechanism is made; searching modules
(including __init__.py) will continue to stop at the first module
encountered. In summary, the process import a package foo works like
this:

1.  sys.path is searched for directories foo or foo.pyp, or a file
    foo.<ext>. If a file is found and no directory, it is treated as a
    module, and imported.
2.  If a directory foo is found, a check is made whether it contains
    __init__.py. If so, the location of the __init__.py is remembered.
    Otherwise, the directory is skipped. Once an __init__.py is found,
    further directories called foo are skipped.
3.  For both directories foo and foo.pyp, the directories are added to
    the package's __path__.
4.  If an __init__ module was found, it is imported, with __path__ being
    initialized to the path computed all .pyp directories.

Impact on Import Hooks

Both loaders and finders as defined in PEP 302 will need to be changed
to support namespace packages. Failure to conform to the protocol below
might cause a package not being recognized as a namespace package;
loaders and finders not supporting this protocol must raise
AttributeError when the functions below get accessed.

Finders need to support looking for *.pth files in step 1 of above
algorithm. To do so, a finder used as a path hook must support a method:

  finder.find_package_portion(fullname)

This method will be called in the same manner as find_module, and it
must return a string to be added to the package's __path__. If the
finder doesn't find a portion of the package, it shall return None.
Raising AttributeError from above call will be treated as
non-conformance with this PEP, and the exception will be ignored. All
other exceptions are reported.

A finder may report both success from find_module and from
find_package_portion, allowing for both a package containing an
__init__.py and a portion of the same package.

All strings returned from find_package_portion, along with all path
names of .pyp directories are added to the new package's __path__.

Discussion

Original versions of this specification proposed the addition of *.pth
files, similar to the way those files are used on sys.path. With a
wildcard marker (*), a package could indicate that the entire path is
derived by looking at the parent path, searching for properly-named
subdirectories.

People then observed that the support for the full .pth syntax is
inappropriate, and the .pth files were changed to be mere marker files,
indicating that a directories is a package. Peter Tröger suggested that
.pth is an unsuitable file extension, as all file extensions related to
Python should start with .py. Therefore, the marker file was renamed to
be .pyp.

Dinu Gherman then observed that using a marker file is not necessary,
and that a directory extension could well serve as a such as a marker.
This is what this PEP currently proposes.

Phillip Eby designed PEP 402 as an alternative approach to this PEP,
after comparing Python's package syntax with that found in other
languages. PEP 402 proposes not to use a marker file at all. At the
discussion at PyCon DE 2011, people remarked that having an explicit
declaration of a directory as contributing to a package is a desirable
property, rather than an obstacle. In particular, Jython developers
noticed that Jython could easily mistake a directory that is a Java
package as being a Python package, if there is no need to declare Python
packages.

Packages can stop filling out the namespace package's __init__.py. As a
consequence, extend_path and declare_namespace become obsolete.

Namespace packages can start providing non-trivial __init__.py
implementations; to do so, it is recommended that a single distribution
provides a portion with just the namespace package's __init__.py (and
potentially other modules that belong to the namespace package proper).

The mechanism is mostly compatible with the existing namespace
mechanisms. extend_path will be adjusted to this specification; any
other mechanism might cause portions to get added twice to __path__.

References

Copyright

This document has been placed in the public domain.

[1] Namespace Packages resolution
(https://mail.python.org/pipermail/import-sig/2012-March/000421.html)

[2] PEP 382 branch (http://hg.python.org/features/pep-382-2#pep-382)