PEP: 420
Title: Implicit Namespace Packages
Author: Eric V. Smith <eric@trueblade.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Apr-2012
Python-Version: 3.3
Resolution: https://mail.python.org/pipermail/python-dev/2012-May/119651.html

Abstract

Namespace packages are a mechanism for splitting a single Python package
across multiple directories on disk. In current Python versions, an
algorithm to compute the package's __path__ must be formulated. With the
enhancement proposed here, the import machinery itself will construct
the list of directories that make up the package. This PEP builds upon
previous work, documented in PEP 382 and PEP 402. Those PEPs have since
been rejected in favor of this one. An implementation of this PEP is
at[1].

Terminology

Within this PEP:

-   "package" refers to Python packages as defined by Python's import
    statement.
-   "distribution" refers to separately installable sets of Python
    modules as stored in the Python package index, and installed by
    distutils or setuptools.
-   "vendor package" refers to groups of files installed by an operating
    system's packaging mechanism (e.g. Debian or Red Hat packages
    installed on Linux systems).
-   "regular package" refers to packages as they are implemented in
    Python 3.2 and earlier.
-   "portion" refers to a set of files in a single directory (possibly
    stored in a zip file) that contribute to a namespace package.
-   "legacy portion" refers to a portion that uses __path__ manipulation
    in order to implement namespace packages.

This PEP defines a new type of package, the "namespace package".

Namespace packages today

Python currently provides pkgutil.extend_path to denote a package as a
namespace package. The recommended way of using it is to put:

    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

in the package's __init__.py. Every distribution needs to provide the
same contents in its __init__.py, so that extend_path is invoked
independent of which portion of the package gets imported first. As a
consequence, the package's __init__.py cannot practically define any
names as it depends on the order of the package fragments on sys.path to
determine which portion is imported first. As a special feature,
extend_path reads files named <packagename>.pkg which allows declaration
of additional portions.

setuptools provides a similar function named
pkg_resources.declare_namespace that is used in the form:

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

In the portion's __init__.py, no assignment to __path__ is necessary, as
declare_namespace modifies the package __path__ through sys.modules. As
a special feature, declare_namespace also supports zip files, and
registers the package name internally so that future additions to
sys.path by setuptools can properly add additional portions to each
package.

setuptools allows declaring namespace packages in a distribution's
setup.py, so that distribution developers don't need to put the magic
__path__ modification into __init__.py themselves.

See the "The Problem" section of PEP 402 for additional motivations for
namespace packages. Note that PEP 402 has been rejected, but the
motivating use cases are still valid.

Rationale

The current imperative approach to namespace packages has led to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports *.pkg files; setuptools doesn't.
Likewise, setuptools supports inspecting zip files, and supports adding
portions to its _namespace_packages variable, whereas pkgutil doesn't.

Namespace packages are designed to support being split across multiple
directories (and hence found via multiple sys.path entries). In this
configuration, it doesn't matter if multiple portions all provide an
__init__.py file, so long as each portion correctly initializes the
namespace package. However, Linux distribution vendors (amongst others)
prefer to combine the separate portions and install them all into the
same file system directory. This creates a potential for conflict, as
the portions are now attempting to provide the same file on the target
system - something that is not allowed by many package managers.
Allowing implicit namespace packages means that the requirement to
provide an __init__.py file can be dropped completely, and affected
portions can be installed into a common directory or split across
multiple directories as distributions see fit.

A namespace package will not be constrained by a fixed __path__,
computed from the parent path at namespace package creation time.
Consider the standard library encodings package:

1.  Suppose that encodings becomes a namespace package.
2.  It sometimes gets imported during interpreter startup to initialize
    the standard io streams.
3.  An application modifies sys.path after startup and wants to
    contribute additional encodings from new path entries.
4.  An attempt is made to import an encoding from an encodings portion
    that is found on a path entry added in step 3.

If the import system was restricted to only finding portions along the
value of sys.path that existed at the time the encodings namespace
package was created, the additional paths added in step 3 would never be
searched for the additional portions imported in step 4. In addition, if
step 2 were sometimes skipped (due to some runtime flag or other
condition), then the path items added in step 3 would indeed be used the
first time a portion was imported. Thus this PEP requires that the list
of path entries be dynamically computed when each portion is loaded. It
is expected that the import machinery will do this efficiently by
caching __path__ values and only refreshing them when it detects that
the parent path has changed. In the case of a top-level package like
encodings, this parent path would be sys.path.

Specification

Regular packages will continue to have an __init__.py and will reside in
a single directory.

Namespace packages cannot contain an __init__.py. As a consequence,
pkgutil.extend_path and pkg_resources.declare_namespace become obsolete
for purposes of namespace package creation. There will be no marker file
or directory for specifying a namespace package.

During import processing, the import machinery will continue to iterate
over each directory in the parent path as it does in Python 3.2. While
looking for a module or package named "foo", for each directory in the
parent path:

-   If <directory>/foo/__init__.py is found, a regular package is
    imported and returned.
-   If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is
    imported and returned. The exact list of extensions varies by
    platform and whether the -O flag is specified. The list here is
    representative.
-   If not, but <directory>/foo is found and is a directory, it is
    recorded and the scan continues with the next directory in the
    parent path.
-   Otherwise the scan continues with the next directory in the parent
    path.

If the scan completes without returning a module or package, and at
least one directory was recorded, then a namespace package is created.
The new namespace package:

-   Has a __path__ attribute set to an iterable of the path strings that
    were found and recorded during the scan.
-   Does not have a __file__ attribute.
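
The scan rules above can be sketched in a few lines of Python. This is
only an illustration, not the import machinery itself: the function name
scan_parent_path, the MODULE_SUFFIXES tuple, and the directory names in
the demo are assumptions made for the example, and the suffix list is
deliberately incomplete.

```python
import os
import tempfile

# Representative suffixes only; the real list is platform-dependent.
MODULE_SUFFIXES = (".py", ".pyc")


def scan_parent_path(name, parent_path):
    """Classify `name` using the scan rules described above.

    Returns a (kind, location) tuple, or None if nothing was found.
    """
    recorded = []  # directories that may become namespace portions
    for directory in parent_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return ("regular package", candidate)
        for suffix in MODULE_SUFFIXES:
            if os.path.isfile(candidate + suffix):
                return ("module", candidate + suffix)
        if os.path.isdir(candidate):
            recorded.append(candidate)  # record it and keep scanning
    if recorded:
        return ("namespace package", recorded)
    return None


# Demo layout (hypothetical): two entries each hold a bare "foo"
# directory, and a third holds a regular package "bar".
root = tempfile.mkdtemp()
entries = [os.path.join(root, d) for d in ("site1", "site2", "site3")]
os.makedirs(os.path.join(entries[0], "foo"))
os.makedirs(os.path.join(entries[1], "foo"))
os.makedirs(os.path.join(entries[2], "bar"))
open(os.path.join(entries[2], "bar", "__init__.py"), "w").close()

foo_result = scan_parent_path("foo", entries)
bar_result = scan_parent_path("bar", entries)
missing_result = scan_parent_path("nope", entries)
```

In the demo, "foo" is reported as a namespace package with two recorded
directories, "bar" as a regular package, and "nope" is not found.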

Note that if "import foo" is executed and "foo" is found as a namespace
package (using the above rules), then "foo" is immediately created as a
package. The creation of the namespace package is not deferred until a
sub-level import occurs.

A namespace package is not fundamentally different from a regular
package. It is just a different way of creating packages. Once a
namespace package is created, there is no functional difference between
it and a regular package.

Dynamic path computation

The import machinery will behave as if a namespace package's __path__ is
recomputed before each portion is loaded.

For performance reasons, it is expected that this will be achieved by
detecting that the parent path has changed. If no change has taken
place, then no __path__ recomputation is required. The implementation
must ensure that changes to the contents of the parent path are
detected, as well as detecting the replacement of the parent path with a
new path entry list object.
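
One way to meet these requirements is sketched below. It is a toy model
under stated assumptions, not the class used by any implementation: the
name SketchNamespacePath, the find_portions callback, the
recomputations counter, and the fake parent module are all invented for
the example. The sketch caches a snapshot of the parent path and
recomputes only when the snapshot differs, looking the parent path up by
name so that both in-place changes and replacement are noticed.

```python
import sys
import types


class SketchNamespacePath:
    """Toy stand-in for a namespace package's __path__."""

    def __init__(self, parent_module_name, parent_attr, find_portions):
        self._parent_module_name = parent_module_name
        self._parent_attr = parent_attr
        self._find_portions = find_portions  # parent path -> portions
        self._last_parent_path = None
        self._portions = []
        self.recomputations = 0  # instrumentation for the demo only

    def _recalculate(self):
        # Look the parent path up by name each time, so that replacing
        # the list object is noticed as well as in-place mutation.
        parent = sys.modules[self._parent_module_name]
        current = tuple(getattr(parent, self._parent_attr))
        if current != self._last_parent_path:
            self._portions = list(self._find_portions(current))
            self._last_parent_path = current
            self.recomputations += 1
        return self._portions

    def __iter__(self):
        return iter(self._recalculate())


def find_child_portions(parent_path):
    # Hypothetical finder: pretend every parent entry has a "child" dir.
    return [entry + "/child" for entry in parent_path]


# A fake parent package registered under a made-up name.
parent = types.ModuleType("sketch_parent")
parent.__path__ = ["/p1/parent", "/p2/parent"]
sys.modules["sketch_parent"] = parent

child_path = SketchNamespacePath("sketch_parent", "__path__",
                                 find_child_portions)
first = list(child_path)
again = list(child_path)                 # parent unchanged: cache is reused
parent.__path__.append("/p3/parent")     # in-place modification
after_append = list(child_path)
parent.__path__ = parent.__path__ + ["/p4/parent"]   # replacement
after_replace = list(child_path)
```

Only three recomputations happen in the demo: the initial one, one after
the append, and one after the parent path is replaced.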

Impact on import finders and loaders

PEP 302 defines "finders" that are called to search path elements. These
finders' find_module methods return either a "loader" object or None.

For a finder to contribute to namespace packages, it must implement a
new find_loader(fullname) method. fullname has the same meaning as for
find_module. find_loader always returns a 2-tuple of
(loader, <iterable-of-path-entries>). loader may be None, in which case
<iterable-of-path-entries> (which may be empty) is added to the list of
recorded path entries and path searching continues. If loader is not
None, it is immediately used to load a module or regular package.

Even if loader is returned and is not None, <iterable-of-path-entries>
must still contain the path entries for the package. This allows code
such as pkgutil.extend_path() to compute path entries for packages that
it does not load.

Note that multiple path entries per finder are allowed. This is to
support the case where a finder discovers multiple namespace portions
for a given fullname. Many finders will support only a single namespace
package portion per find_loader call, in which case this iterable will
contain only a single string.

The import machinery will call find_loader if it exists, else fall back
to find_module. Legacy finders which implement find_module but not
find_loader will be unable to contribute portions to a namespace
package.
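
The way the import machinery might consume these results can be
sketched as follows. The classes PortionFinder and LegacyFinder, the
search function, and the string "loaders" are hypothetical stand-ins
written for this example; they do not touch the real import system.

```python
class PortionFinder:
    """Hypothetical finder implementing the find_loader protocol."""

    def __init__(self, loaders=None, portions=None):
        self._loaders = loaders or {}      # fullname -> loader object
        self._portions = portions or {}    # fullname -> list of path strings

    def find_loader(self, fullname):
        # Always a 2-tuple: (loader or None, iterable of path entries).
        loader = self._loaders.get(fullname)
        return (loader, list(self._portions.get(fullname, [])))


class LegacyFinder:
    """Hypothetical finder that only implements find_module."""

    def __init__(self, loaders=None):
        self._loaders = loaders or {}

    def find_module(self, fullname):
        return self._loaders.get(fullname)


def search(fullname, finders):
    """Return ("loader", loader), ("namespace", portions), or None."""
    recorded = []
    for finder in finders:
        if hasattr(finder, "find_loader"):
            loader, portions = finder.find_loader(fullname)
        else:
            # Legacy finders cannot contribute namespace portions.
            loader, portions = finder.find_module(fullname), []
        if loader is not None:
            return ("loader", loader)  # used immediately
        recorded.extend(portions)      # remember portions, keep searching
    if recorded:
        return ("namespace", recorded)
    return None


finders = [
    PortionFinder(portions={"foo": ["/site1/foo"]}),
    LegacyFinder(loaders={"legacy_mod": "legacy-loader"}),
    PortionFinder(portions={"foo": ["/site2/foo", "/site2-extra/foo"]}),
    PortionFinder(loaders={"bar": "bar-loader"},
                  portions={"bar": ["/site3/bar"]}),
]

foo_found = search("foo", finders)
legacy_found = search("legacy_mod", finders)
bar_found = search("bar", finders)
nothing_found = search("nope", finders)
```

Here "foo" becomes a namespace package with three recorded portions, two
of them from a single finder, while "bar" and "legacy_mod" are loaded by
the first finder that returns a loader.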

The specification expands PEP 302 loaders to include an optional method
called module_repr() which, if present, is used to generate module object
reprs. See the section below for further details.

Differences between namespace packages and regular packages

Namespace packages and regular packages are very similar. The
differences are:

-   Portions of namespace packages need not all come from the same
    directory structure, or even from the same loader. Regular packages
    are self-contained: all parts live in the same directory hierarchy.
-   Namespace packages have no __file__ attribute.
-   Namespace packages' __path__ attribute is a read-only iterable of
    strings, which is automatically updated when the parent path is
    modified.
-   Namespace packages have no __init__.py module.
-   Namespace packages have a different type of object for their
    __loader__ attribute.
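
These differences can be observed on an interpreter that implements
this PEP. The following is a small sketch: the package names
demo_regular_pkg and demo_namespace_pkg and the temporary directory are
made up for the example. Because some later Python versions set
__file__ to None on namespace packages instead of omitting it, the
sketch reads the attribute with getattr and a default.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# A regular package: a directory containing __init__.py.
os.makedirs(os.path.join(root, "demo_regular_pkg"))
open(os.path.join(root, "demo_regular_pkg", "__init__.py"), "w").close()

# A namespace package: a directory with a module but no __init__.py.
os.makedirs(os.path.join(root, "demo_namespace_pkg"))
with open(os.path.join(root, "demo_namespace_pkg", "mod.py"), "w") as f:
    f.write("X = 1\n")

sys.path.append(root)
importlib.invalidate_caches()

regular = importlib.import_module("demo_regular_pkg")
namespace = importlib.import_module("demo_namespace_pkg")

# Collect the attributes the list above talks about.
regular_file = getattr(regular, "__file__", None)
namespace_file = getattr(namespace, "__file__", None)
regular_path_type = type(regular.__path__)
namespace_path_type = type(namespace.__path__)
regular_loader_type = type(regular.__loader__)
namespace_loader_type = type(namespace.__loader__)
```

The regular package reports the path of its __init__.py and a plain
list for __path__; the namespace package has no usable __file__, a
non-list __path__ object, and a different loader type.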

Namespace packages in the standard library

It is possible, and this PEP explicitly allows, that parts of the
standard library be implemented as namespace packages. When and if any
standard library packages become namespace packages is outside the scope
of this PEP.

Migrating from legacy namespace packages

As described above, prior to this PEP pkgutil.extend_path() was used by
legacy portions to create namespace packages. Because it is likely not
practical for all existing portions of a namespace package to be
migrated to this PEP at once, extend_path() will be modified to also
recognize PEP 420 namespace packages. This will allow some portions of a
namespace to be legacy portions while others are migrated to PEP 420.
These hybrid namespace packages will not have the dynamic path
computation that normal namespace packages have, since extend_path()
never provided this functionality in the past.
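
A hybrid namespace of this kind might look like the sketch below, which
assumes an interpreter whose pkgutil.extend_path() has been updated as
described. The package name demo_hybrid_pkg and the "legacy" and
"modern" directory names are invented for the example.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
legacy_dir = os.path.join(root, "legacy", "demo_hybrid_pkg")
modern_dir = os.path.join(root, "modern", "demo_hybrid_pkg")
os.makedirs(legacy_dir)
os.makedirs(modern_dir)

# Legacy portion: carries an __init__.py that calls extend_path().
with open(os.path.join(legacy_dir, "__init__.py"), "w") as f:
    f.write("from pkgutil import extend_path\n"
            "__path__ = extend_path(__path__, __name__)\n")
with open(os.path.join(legacy_dir, "old.py"), "w") as f:
    f.write("ORIGIN = 'legacy portion'\n")

# Migrated portion: no __init__.py at all.
with open(os.path.join(modern_dir, "new.py"), "w") as f:
    f.write("ORIGIN = 'PEP 420 portion'\n")

sys.path += [os.path.join(root, "legacy"), os.path.join(root, "modern")]
importlib.invalidate_caches()

old = importlib.import_module("demo_hybrid_pkg.old")
new = importlib.import_module("demo_hybrid_pkg.new")
hybrid_path = sys.modules["demo_hybrid_pkg"].__path__
```

The resulting __path__ is an ordinary list containing both directories,
which is why such a package does not get dynamic path computation.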

Packaging Implications

Multiple portions of a namespace package can be installed into the same
directory, or into separate directories. For this section, suppose there
are two portions which define "foo.bar" and "foo.baz". "foo" itself is a
namespace package.

If these are installed in the same location, a single directory "foo"
would be in a directory that is on sys.path. Inside "foo" would be two
directories, "bar" and "baz". If "foo.bar" is removed (perhaps by an OS
package manager), care must be taken not to remove the "foo/baz" or
"foo" directories. Note that in this case "foo" will be a namespace
package (because it lacks an __init__.py), even though all of its
portions are in the same directory.

Note that "foo.bar" and "foo.baz" can be installed into the same "foo"
directory because they will not have any files in common.

If the portions are installed in different locations, two different
"foo" directories would be in directories that are on sys.path.
"foo/bar" would be in one of these sys.path entries, and "foo/baz" would
be in the other. Upon removal of "foo.bar", the "foo/bar" and
corresponding "foo" directories can be completely removed. But "foo/baz"
and its corresponding "foo" directory cannot be removed.

It is also possible to have the "foo.bar" portion installed in a
directory on sys.path, and have the "foo.baz" portion provided in a zip
file, also on sys.path.

Examples

Nested namespace packages

This example uses the following directory structure:

    Lib/test/namespace_pkgs
        project1
            parent
                child
                    one.py
        project2
            parent
                child
                    two.py

Here, both parent and child are namespace packages: portions of them
exist in different directories, and they do not have __init__.py files.

Here we add the parent directories to sys.path, and show that the
portions are correctly found:

    >>> import sys
    >>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
    >>> import parent.child.one
    >>> parent.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
    >>> parent.child.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
    >>> import parent.child.two
    >>>

Dynamic path computation

This example uses a similar directory structure, but adds a third
portion:

    Lib/test/namespace_pkgs
        project1
            parent
                child
                    one.py
        project2
            parent
                child
                    two.py
        project3
            parent
                child
                    three.py

We add project1 and project2 to sys.path, then import parent.child.one
and parent.child.two. Then we add project3 to sys.path; when
parent.child.three is imported, project3/parent is automatically added
to parent.__path__:

    # add the first two parent paths to sys.path
    >>> import sys
    >>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']

    # parent.child.one can be imported, because project1 was added to sys.path:
    >>> import parent.child.one
    >>> parent.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])

    # parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child:
    >>> parent.child.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])

    # parent.child.two can be imported, because project2 was added to sys.path:
    >>> import parent.child.two

    # we cannot import parent.child.three, because project3 is not in the path:
    >>> import parent.child.three
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<frozen importlib._bootstrap>", line 1286, in _find_and_load
      File "<frozen importlib._bootstrap>", line 1250, in _find_and_load_unlocked
    ImportError: No module named 'parent.child.three'

    # now add project3 to sys.path:
    >>> sys.path.append('Lib/test/namespace_pkgs/project3')

    # and now parent.child.three can be imported:
    >>> import parent.child.three

    # project3/parent has been added to parent.__path__:
    >>> parent.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent'])

    # and project3/parent/child has been added to parent.child.__path__
    >>> parent.child.__path__
    _NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child'])
    >>>

Discussion

At PyCon 2012, we had a discussion about namespace packages at which PEP
382 and PEP 402 were rejected, to be replaced by this PEP[2].

There is no intention to remove support of regular packages. If a
developer knows that her package will never be a portion of a namespace
package, then there is a performance advantage to it being a regular
package (with an __init__.py). Creation and loading of a regular package
can take place immediately when it is located along the path. With
namespace packages, all entries in the path must be scanned before the
package is created.

Note that an ImportWarning will no longer be raised for a directory
lacking an __init__.py file. Such a directory will now be imported as a
namespace package, whereas in prior Python versions an ImportWarning
would be raised.

Alyssa (Nick) Coghlan presented a list of her objections to this
proposal[3]. They are:

1.  Implicit package directories go against the Zen of Python.
2.  Implicit package directories pose awkward backwards compatibility
    challenges.
3.  Implicit package directories introduce ambiguity into file system
    layouts.
4.  Implicit package directories will permanently entrench current
    newbie-hostile behavior in __main__.

Alyssa later gave a detailed response to her own objections[4], which is
summarized here:

1.  The practicality of this PEP wins over other proposals and the
    status quo.
2.  Minor backward compatibility issues are okay, as long as they are
    properly documented.
3.  This will be addressed in PEP 395.
4.  This will also be addressed in PEP 395.

The inclusion of namespace packages in the standard library was
motivated by Martin v. Löwis, who wanted the encodings package to become
a namespace package[5]. While this PEP allows for standard library
packages to become namespaces, it defers a decision on encodings.

find_module versus find_loader

An early draft of this PEP specified a change to the find_module method
in order to support namespace packages. It would be modified to return a
string in the case where a namespace package portion was discovered.

However, this caused a problem with existing code outside of the
standard library which calls find_module. Because this code would not be
upgraded in concert with changes required by this PEP, it would fail
when it would receive unexpected return values from find_module. Because
of this incompatibility, this PEP now specifies that finders that want
to provide namespace portions must implement the find_loader method,
described above.

The use case for supporting multiple portions per find_loader call is
given in[6].

Dynamic path computation

Guido raised a concern that automatic dynamic path computation was an
unnecessary feature[7]. Later in that thread, PJ Eby and Alyssa Coghlan
presented arguments as to why dynamic computation would minimize
surprise to Python users. The conclusion of that discussion has been
included in this PEP's Rationale section.

An earlier version of this PEP required that dynamic path computation
could only take effect if the parent path object were modified in-place.
That is, this would work:

    sys.path.append('new-dir')

But this would not:

    sys.path = sys.path + ['new-dir']

In the same thread[8], it was pointed out that this restriction is not
required. If the parent path is looked up by name instead of by holding
a reference to it, then there is no restriction on how the parent path
is modified or replaced. For a top-level namespace package, the lookup
would be the module named "sys" then its attribute "path". For a
namespace package nested inside a package foo, the lookup would be for
the module named "foo" then its attribute "__path__".
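
The effect of looking up the parent path by name can be checked on an
interpreter that implements this PEP. The sketch below uses a made-up
package name, demo_dynamic_pkg, and temporary directories; it replaces
sys.path with a new list object and then imports from a portion found
only on the new entry.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for project, module in (("project1", "one"), ("project2", "two")):
    portion = os.path.join(root, project, "demo_dynamic_pkg")
    os.makedirs(portion)
    with open(os.path.join(portion, module + ".py"), "w") as f:
        f.write("NAME = {!r}\n".format(module))

# Only project1 is visible when the namespace package is created.
sys.path.append(os.path.join(root, "project1"))
importlib.invalidate_caches()
one = importlib.import_module("demo_dynamic_pkg.one")
package = sys.modules["demo_dynamic_pkg"]
portions_before = list(package.__path__)

# Replace sys.path with a brand-new list object instead of mutating it.
sys.path = sys.path + [os.path.join(root, "project2")]
two = importlib.import_module("demo_dynamic_pkg.two")
portions_after = list(package.__path__)
```

After the replacement, the second portion is found and the package's
__path__ lists both directories.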

Module reprs

Previously, module reprs were hard coded based on assumptions about a
module's __file__ attribute. If this attribute existed and was a string,
it was assumed to be a file system path, and the module object's repr
would include this in its value. The only exception was that PEP 302
reserved missing __file__ attributes for built-in modules, and in
CPython, this assumption was baked into the module object's
implementation. Because of this restriction, some modules contained
contrived __file__ values that did not reflect file system paths, and
which could cause unexpected problems later (e.g. os.path.join() on a
non-path __file__ would return gibberish).

This PEP relaxes this constraint, and leaves the setting of __file__ to
the purview of the loader producing the module. Loaders may opt to leave
__file__ unset if no file system path is appropriate. Loaders may also
set additional reserved attributes on the module if useful. This means
that the definitive way to determine the origin of a module is to check
its __loader__ attribute.

For example, namespace packages as described in this PEP will have no
__file__ attribute because no corresponding file exists. In order to
provide flexibility and descriptiveness in the reprs of such modules, a
new optional protocol is added to PEP 302 loaders. Loaders can implement
a module_repr() method which takes a single argument, the module object.
This method should return the string to be used verbatim as the repr of
the module. The rules for producing a module repr are now standardized
as:

-   If the module has an __loader__ and that loader has a module_repr()
    method, call it with a single argument, which is the module object.
    The value returned is used as the module's repr.
-   If an exception occurs in module_repr(), the exception is caught and
    discarded, and the calculation of the module's repr continues as if
    module_repr() did not exist.
-   If the module has an __file__ attribute, this is used as part of the
    module's repr.
-   If the module has no __file__ but does have an __loader__, then the
    loader's repr is used as part of the module's repr.
-   Otherwise, just use the module's __name__ in the repr.
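
The rules above can be expressed as a plain function. This is a sketch
for illustration, not the interpreter's implementation: the function
name sketch_module_repr and the two demo loader classes are
hypothetical, the output strings follow the example reprs shown in this
PEP, and a __file__ or __loader__ of None is treated as missing, which
is an assumption of the sketch.

```python
import types


def sketch_module_repr(module):
    """Compute a module repr following the rules listed above."""
    loader = getattr(module, "__loader__", None)
    if loader is not None and hasattr(loader, "module_repr"):
        try:
            return loader.module_repr(module)
        except Exception:
            pass  # discard the error and continue as if it did not exist
    name = getattr(module, "__name__", "?")
    filename = getattr(module, "__file__", None)
    if filename is not None:
        return "<module {!r} from {!r}>".format(name, filename)
    if loader is not None:
        return "<module {!r} ({!r})>".format(name, loader)
    return "<module {!r}>".format(name)


class NamespaceLikeLoader:
    """Hypothetical loader that supplies its own repr."""

    @classmethod
    def module_repr(cls, module):
        return "<module {!r} (namespace)>".format(module.__name__)


class BrokenLoader:
    """Hypothetical loader whose module_repr() always fails."""

    @classmethod
    def module_repr(cls, module):
        raise RuntimeError("boom")


plain = types.ModuleType("foo")
repr_plain = sketch_module_repr(plain)

with_file = types.ModuleType("foo")
with_file.__file__ = "zippy:/de/do/dah"
repr_with_file = sketch_module_repr(with_file)

with_loader = types.ModuleType("foo")
with_loader.__loader__ = NamespaceLikeLoader
repr_with_loader = sketch_module_repr(with_loader)

broken = types.ModuleType("foo")
broken.__file__ = "zippy:/de/do/dah"
broken.__loader__ = BrokenLoader
repr_broken = sketch_module_repr(broken)
```

The last case shows the fallback: the loader's module_repr() raises, so
the repr is built from __file__ instead.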

Here is a snippet showing how a namespace module's repr is calculated
from its loader:

    class NamespaceLoader:
        @classmethod
        def module_repr(cls, module):
            return "<module '{}' (namespace)>".format(module.__name__)

Built-in module reprs would no longer need to be hard-coded, but instead
would come from their loader as well:

    class BuiltinImporter:
        @classmethod
        def module_repr(cls, module):
            return "<module '{}' (built-in)>".format(module.__name__)

Here are some example reprs of different types of modules with different
sets of the related attributes:

    >>> import email
    >>> email
    <module 'email' from '/home/barry/projects/python/pep-420/Lib/email/__init__.py'>
    >>> m = type(email)('foo')
    >>> m
    <module 'foo'>
    >>> m.__file__ = 'zippy:/de/do/dah'
    >>> m
    <module 'foo' from 'zippy:/de/do/dah'>
    >>> class Loader: pass
    ...
    >>> m.__loader__ = Loader
    >>> del m.__file__
    >>> m
    <module 'foo' (<class '__main__.Loader'>)>
    >>> class NewLoader:
    ...   @classmethod
    ...   def module_repr(cls, module):
    ...      return '<mystery module!>'
    ...
    >>> m.__loader__ = NewLoader
    >>> m
    <mystery module!>
    >>>

References

[1] PEP 420 branch (http://hg.python.org/features/pep-420)

[2] PyCon 2012 Namespace Package discussion outcome
(https://mail.python.org/pipermail/import-sig/2012-March/000421.html)

[3] Alyssa Coghlan's objection to the lack of marker files or
directories
(https://mail.python.org/pipermail/import-sig/2012-March/000423.html)

[4] Alyssa Coghlan's response to her initial objections
(https://mail.python.org/pipermail/import-sig/2012-April/000464.html)

[5] Martin v. Löwis's suggestion to make encodings a namespace package
(https://mail.python.org/pipermail/import-sig/2012-May/000540.html)

[6] Use case for multiple portions per find_loader call
(https://mail.python.org/pipermail/import-sig/2012-May/000585.html)

[7] Discussion about dynamic path computation
(https://mail.python.org/pipermail/python-dev/2012-May/119560.html)

[8] Discussion about dynamic path computation
(https://mail.python.org/pipermail/python-dev/2012-May/119560.html)

Copyright

This document has been placed in the public domain.