PEP: 395
Title: Qualified Names for Modules
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Mar-2011
Python-Version: 3.4
Post-History: 05-Mar-2011, 19-Nov-2011

PEP Withdrawal

This PEP was withdrawn by the author in December 2013, as other
significant changes in the time since it was written have rendered
several aspects obsolete. Most notably, PEP 420 namespace packages
rendered some of the proposals related to package detection unworkable,
while PEP 451 module specifications resolved the multiprocessing issues
and provide a possible means to tackle the pickle compatibility issues.

A future PEP to resolve the remaining issues would still be appropriate,
but it's worth starting any such effort as a fresh PEP restating the
remaining problems in an updated context rather than trying to build on
this one directly.

Abstract

This PEP proposes new mechanisms that eliminate some longstanding traps
for the unwary when dealing with Python's import system, as well as
serialisation and introspection of functions and classes.

It builds on the "Qualified Name" concept defined in PEP 3155.

Relationship with Other PEPs

Most significantly, this PEP is currently deferred as it requires
significant changes in order to be made compatible with the removal of
mandatory __init__.py files in PEP 420 (which has been implemented and
released in Python 3.3).

This PEP builds on the "qualified name" concept introduced by PEP 3155,
and also shares in that PEP's aim of fixing some ugly corner cases when
dealing with serialisation of arbitrary functions and classes.

It also builds on PEP 366, which took initial tentative steps towards
making explicit relative imports from the main module work correctly in
at least some circumstances.

Finally, PEP 328 eliminated implicit relative imports from imported
modules. This PEP proposes that the de facto implicit relative imports
from main modules that are provided by the current initialisation
behaviour for sys.path[0] also be eliminated.

What's in a __name__?

Over time, a module's __name__ attribute has come to be used to handle a
number of different tasks.

The key use cases identified for this module attribute are:

1.  Flagging the main module in a program, using the
    if __name__ == "__main__": convention.
2.  As the starting point for relative imports
3.  To identify the location of function and class definitions within
    the running application
4.  To identify the location of classes for serialisation into pickle
    objects which may be shared with other interpreter instances

Traps for the Unwary

The overloading of the semantics of __name__, along with some
historically associated behaviour in the initialisation of sys.path[0],
has resulted in several traps for the unwary. These traps can be quite
annoying in practice, as they are highly unobvious (especially to
beginners) and can cause quite confusing behaviour.

Why are my imports broken?

There's a general principle that applies when modifying sys.path: never
put a package directory directly on sys.path. The reason this is
problematic is that every module in that directory is now potentially
accessible under two different names: as a top level module (since the
package directory is on sys.path) and as a submodule of the package (if
the higher level directory containing the package itself is also on
sys.path).
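
The duplication is easy to reproduce. The following is a minimal sketch
that builds a throwaway package in a temporary directory, places both the
package directory and its parent on sys.path, and imports the same file
under both names. The names demo_site and demo_app are hypothetical and
chosen only for illustration:

```python
import importlib
import os
import sys
import tempfile


def import_under_two_names():
    """Import one source file as both 'demo_app' and 'demo_site.demo_app'."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_site")
        os.makedirs(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "demo_app.py"), "w") as f:
            f.write("state = []\n")

        # The mistake under discussion: the package directory itself is
        # on sys.path alongside its parent directory
        added = [root, pkg_dir]
        sys.path[:0] = added
        importlib.invalidate_caches()
        try:
            top_level = importlib.import_module("demo_app")
            submodule = importlib.import_module("demo_site.demo_app")
            top_level.state.append("changed")
            return top_level is submodule, top_level.state, submodule.state
        finally:
            # Undo the changes to interpreter-wide import state
            for entry in added:
                sys.path.remove(entry)
            for name in ("demo_app", "demo_site", "demo_site.demo_app"):
                sys.modules.pop(name, None)


print(import_under_two_names())
```

The two imports yield distinct module objects, so a change to module
level state made through one name is not visible through the other.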

As an example, Django (up to and including version 1.3) is guilty of
setting up exactly this situation for site-specific applications - the
application ends up being accessible as both app and site.app in the
module namespace, and these are actually two different copies of the
module. This is a recipe for confusion if there is any meaningful
mutable module level state, so this behaviour is being eliminated from
the default site set up in version 1.4 (site-specific apps will always
be fully qualified with the site name).

However, it's hard to blame Django for this, when the same part of
Python responsible for setting __name__ = "__main__" in the main module
commits the exact same error when determining the value for sys.path[0].

The impact of this can be seen relatively frequently if you follow the
"python" and "import" tags on Stack Overflow. When I had the time to
follow it myself, I regularly encountered people struggling to
understand the behaviour of straightforward package layouts like the
following (I actually use package layouts along these lines in my own
projects):

    project/
        setup.py
        example/
            __init__.py
            foo.py
            tests/
                __init__.py
                test_foo.py

While I would often see it without the __init__.py files first, that's a
trivial fix to explain. What's hard to explain is that all of the
following ways to invoke test_foo.py probably won't work due to broken
imports (either failing to find example for absolute imports,
complaining about relative imports in a non-package or beyond the
toplevel package for explicit relative imports, or issuing even more
obscure errors if some other submodule happens to shadow the name of a
top-level module, such as an example.json module that handled
serialisation or an example.tests.unittest test runner):

    # These commands will most likely *FAIL*, even if the code is correct

    # working directory: project/example/tests
    ./test_foo.py
    python test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/example
    tests/test_foo.py
    python tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project
    example/tests/test_foo.py
    python example/tests/test_foo.py

    # working directory: project/..
    project/example/tests/test_foo.py
    python project/example/tests/test_foo.py
    # The -m and -c approaches don't work from here either, but the failure
    # to find 'example' correctly is easier to explain in this case

That's right, that long list is of all the methods of invocation that
will almost certainly break if you try them, and the error messages
won't make any sense if you're not already intimately familiar not only
with the way Python's import system works, but also with how it gets
initialised.

For a long time, the only way to get sys.path right with that kind of
setup was to either set it manually in test_foo.py itself (hardly
something that novices, or even many veteran Python programmers, are
going to know how to do) or else to make sure to import the module
instead of executing it directly:

    # working directory: project
    python -c "from example.tests.test_foo import main; main()"

Since the implementation of PEP 366 (which defined a mechanism that
allows relative imports to work correctly when a module inside a package
is executed via the -m switch), the following also works properly:

    # working directory: project
    python -m example.tests.test_foo

The fact that most methods of invoking Python code from the command line
break when that code is inside a package, and that the two that do work
are highly sensitive to the current working directory, is thoroughly
confusing for a beginner. I personally believe it is one of the key
factors leading to the perception that Python packages are complicated
and hard to get right.

This problem isn't even limited to the command line - if test_foo.py is
open in Idle and you attempt to run it by pressing F5, or if you try to
run it by clicking on it in a graphical filebrowser, then it will fail
in just the same way it would if run directly from the command line.

There's a reason the general "no package directories on sys.path"
guideline exists, and the fact that the interpreter itself doesn't
follow it when determining sys.path[0] is the root cause of all sorts of
grief.

In the past, this couldn't be fixed due to backwards compatibility
concerns. However, scripts potentially affected by this problem will
already require fixes when porting to Python 3.x (due to the
elimination of implicit relative imports when importing modules
normally). This provides a convenient opportunity to implement a
corresponding change in the initialisation semantics for sys.path[0].

Importing the main module twice

Another venerable trap is the issue of importing __main__ twice. This
occurs when the main module is also imported under its real name,
effectively creating two instances of the same module under different
names.

If the state stored in __main__ is significant to the correct operation
of the program, or if there is top-level code in the main module that
has non-idempotent side effects, then this duplication can cause obscure
and surprising errors.
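
A small experiment makes the duplication visible. The sketch below writes
a hypothetical script, dual_demo.py, to a temporary directory and runs it
in a child interpreter. The script imports itself under its real name, so
its top-level code runs twice:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical script that imports itself under its real name
SCRIPT = '''\
import sys
print("running top-level code as", __name__)
if __name__ == "__main__":
    import dual_demo
    print("same module object:", dual_demo is sys.modules["__main__"])
'''


def run_dual_import_demo():
    """Run the script directly and return the lines it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = os.path.join(tmp, "dual_demo.py")
        with open(script_path, "w") as f:
            f.write(SCRIPT)
        # Direct execution puts the script's directory in sys.path[0],
        # which is what makes 'import dual_demo' succeed
        result = subprocess.run([sys.executable, script_path],
                                capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


for line in run_dual_import_demo():
    print(line)
```

The first line of output comes from the __main__ copy and the second from
the dual_demo copy. The final line reports that the two are different
module objects.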

In a bit of a pickle

Something many users may not realise is that the pickle module sometimes
relies on the __module__ attribute when serialising instances of
arbitrary classes. So instances of classes defined in __main__ are
pickled that way, and won't be unpickled correctly by another Python
instance that only imported that module instead of running it directly.
This behaviour is the underlying reason for the advice from many Python
veterans to do as little as possible in the __main__ module in any
application that involves any form of object serialisation and
persistence.
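
The failure can be reproduced with two child interpreters. In this
sketch, the module name pickle_demo and the class name Config are
hypothetical. The first process runs the module directly and pickles an
instance. The second imports the module normally and tries to load that
pickle:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical module: pickles a Config instance when run as a script
MODULE_SOURCE = '''\
import pickle
import sys

class Config:
    pass

if __name__ == "__main__":
    # Run directly, so Config.__module__ is "__main__"
    with open(sys.argv[1], "wb") as f:
        pickle.dump(Config(), f)
'''

# Second interpreter: imports the module normally, then tries to unpickle
LOADER_SOURCE = '''\
import pickle
import sys
import pickle_demo

try:
    with open(sys.argv[1], "rb") as f:
        pickle.load(f)
except AttributeError:
    print("unpickling failed")
else:
    print("unpickling succeeded")
'''


def run_pickle_demo():
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "pickle_demo.py"), "w") as f:
            f.write(MODULE_SOURCE)
        data_path = os.path.join(tmp, "config.pickle")
        # Writer: executes the module as the main module
        subprocess.run([sys.executable, "pickle_demo.py", data_path],
                       cwd=tmp, check=True)
        # Reader: Config is only reachable as pickle_demo.Config here
        result = subprocess.run(
            [sys.executable, "-c", LOADER_SOURCE, data_path],
            cwd=tmp, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_pickle_demo())
```

The pickle records the class as __main__.Config. The reading process has
a different __main__, so the lookup fails even though pickle_demo.Config
is available.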

Similarly, when creating a pseudo-module (see next paragraph), pickles
rely on the name of the module where a class is actually defined, rather
than the officially documented location for that class in the module
hierarchy.

For the purposes of this PEP, a "pseudo-module" is a package designed
like the Python 3.2 unittest and concurrent.futures packages. These
packages are documented as if they were single modules, but are in fact
internally implemented as a package. This is supposed to be an
implementation detail that users and other implementations don't need to
worry about, but, thanks to pickle (and serialisation in general), the
details are often exposed and can effectively become part of the public
API.

While this PEP focuses specifically on pickle as the principal
serialisation scheme in the standard library, this issue may also affect
other mechanisms that support serialisation of arbitrary class instances
and rely on __module__ attributes to determine how to handle
deserialisation.

Where's the source?

Some sophisticated users of the pseudo-module technique described above
recognise the problem with implementation details leaking out via the
pickle module, and choose to address it by altering __name__ to refer to
the public location for the module before defining any functions or
classes (or else by modifying the __module__ attributes of those objects
after they have been defined).

This approach is effective at eliminating the leakage of information via
pickling, but comes at the cost of breaking introspection for functions
and classes (as their __module__ attribute now points to the wrong
place).

Forkless Windows

To get around the lack of os.fork on Windows, the multiprocessing module
attempts to re-execute Python with the same main module, but skipping
over any code guarded by if __name__ == "__main__": checks. It does the
best it can with the information it has, but is forced to make
assumptions that simply aren't valid whenever the main module isn't an
ordinary directly executed script or top-level module. Packages and
non-top-level modules executed via the -m switch, as well as directly
executed zipfiles or directories, are likely to make multiprocessing on
Windows do the wrong thing (either quietly or noisily, depending on
application details) when spawning a new process.

While this issue currently only affects Windows directly, it also
impacts any proposals to provide Windows-style "clean process"
invocation via the multiprocessing module on other platforms.

Qualified Names for Modules

To make it feasible to fix these problems once and for all, it is
proposed to add a new module level attribute: __qualname__. This
abbreviation of "qualified name" is taken from PEP 3155, where it is
used to store the naming path to a nested class or function definition
relative to the top level module.

For modules, __qualname__ will normally be the same as __name__, just as
it is for top-level functions and classes in PEP 3155. However, it will
differ in some situations so that the above problems can be addressed.

Specifically, whenever __name__ is modified for some other purpose (such
as to denote the main module), then __qualname__ will remain unchanged,
allowing code that needs it to access the original unmodified value.

If a module loader does not initialise __qualname__ itself, then the
import system will add it automatically (setting it to the same value as
__name__).
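
The fallback rule could be sketched as follows. Note that module level
__qualname__ was never implemented, so ensure_module_qualname is a
hypothetical helper written only to illustrate the proposed behaviour:

```python
import types


def ensure_module_qualname(module):
    """Hypothetical helper: default __qualname__ to __name__ when the
    loader has not set it."""
    if not hasattr(module, "__qualname__"):
        module.__qualname__ = module.__name__
    return module


# A loader that knows nothing about __qualname__
plain = ensure_module_qualname(types.ModuleType("example.foo"))
print(plain.__qualname__)

# A main module whose real name was recorded before __name__ was changed
main = types.ModuleType("__main__")
main.__qualname__ = "example.tests.test_foo"
print(ensure_module_qualname(main).__qualname__)
```

In the second case the existing value is preserved, which is the property
the rest of this PEP relies on.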

Alternative Names

Two alternative names were also considered for the new attribute: "full
name" (__fullname__) and "implementation name" (__implname__).

Either of those would actually be valid for the use case in this PEP.
However, as a meta-issue, PEP 3155 is also adding a new attribute (for
functions and classes) that is "like __name__, but different in some
cases where __name__ is missing necessary information" and those terms
aren't accurate for the PEP 3155 function and class use case.

PEP 3155 deliberately omits the module information, so the term "full
name" is simply untrue, and "implementation name" implies that it may
specify an object other than that specified by __name__, and that is
never the case for PEP 3155 (in that PEP, __name__ and __qualname__
always refer to the same function or class, it's just that __name__ is
insufficient to accurately identify nested functions and classes).

Since it seems needlessly inconsistent to add two new terms for
attributes that only exist because backwards compatibility concerns keep
us from changing the behaviour of __name__ itself, this PEP instead
chose to adopt the PEP 3155 terminology.

If the relative inscrutability of "qualified name" and __qualname__
encourages interested developers to look them up at least once rather
than assuming they know what they mean just from the name and guessing
wrong, that's not necessarily a bad outcome.

Besides, 99% of Python developers should never need to even care these
extra attributes exist - they're really an implementation detail to let
us fix a few problematic behaviours exhibited by imports, pickling and
introspection, not something people are going to be dealing with on a
regular basis.

Eliminating the Traps

The following changes are interrelated and make the most sense when
considered together. They collectively either completely eliminate the
traps for the unwary noted above, or else provide straightforward
mechanisms for dealing with them.

A rough draft of some of the concepts presented here was first posted on
the python-ideas list ([1]), but they have evolved considerably since
first being discussed in that thread. Further discussion has
subsequently taken place on the import-sig mailing list ([2], [3]).

Fixing main module imports inside packages

To eliminate this trap, it is proposed that an additional filesystem
check be performed when determining a suitable value for sys.path[0].
This check will look for Python's explicit package directory markers and
use them to find the appropriate directory to add to sys.path.

The current algorithm for setting sys.path[0] in relevant cases is
roughly as follows:

    # Interactive prompt, -m switch, -c switch
    sys.path.insert(0, '')

    # Valid sys.path entry execution (i.e. directory and zip execution)
    sys.path.insert(0, sys.argv[0])

    # Direct script execution
    sys.path.insert(0, os.path.dirname(sys.argv[0]))

It is proposed that this initialisation process be modified to take
package details stored on the filesystem into account:

    # Interactive prompt, -m switch, -c switch
    in_package, path_entry, _ignored = split_path_module(os.getcwd(), '')
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')

    # Start interactive prompt or run -c command as usual
    #   __main__.__qualname__ is set to "__main__"

    # The -m switch uses the same sys.path[0] calculation, but:
    #   modname is the argument to the -m switch
    #   modname is passed to ``runpy._run_module_as_main()`` as usual
    #   __main__.__qualname__ is set to modname

    # Valid sys.path entry execution (i.e. directory and zip execution)
    modname = "__main__"
    _ignored, path_entry, modname = split_path_module(sys.argv[0], modname)
    sys.path.insert(0, path_entry)

    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to modname

    # Direct script execution
    in_package, path_entry, modname = split_path_module(sys.argv[0])
    sys.path.insert(0, path_entry)
    if in_package:
        pass  # Pass modname to ``runpy._run_module_as_main()``
    else:
        pass  # Run script directly
    # __main__.__qualname__ is set to modname

The split_path_module() supporting function used in the above
pseudo-code would have the following semantics:

    import os
    # all_suffixes() replaces imp.get_suffixes(), as the imp module has
    # been removed from recent Python versions
    from importlib.machinery import all_suffixes

    def _splitmodname(fspath):
        path_entry, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
        return path_entry, modname

    def _is_package_dir(fspath):
        # A package directory contains an __init__ file with any of the
        # recognised module suffixes (e.g. __init__.py)
        return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
                   for suffix in all_suffixes())

    def split_path_module(fspath, modname=None):
        """Given a filesystem path and a relative module name, determine an
           appropriate sys.path entry and a fully qualified module name.

           Returns a 3-tuple of (package_depth, fspath, modname). A reported
           package depth of 0 indicates that this would be a top level import.

           If no relative module name is given, it is derived from the final
           component in the supplied path with the extension stripped.
        """
        if modname is None:
            fspath, modname = _splitmodname(fspath)
        package_depth = 0
        # Walk upwards until a directory that is not a package is found
        while _is_package_dir(fspath):
            fspath, pkg = _splitmodname(fspath)
            modname = pkg + '.' + modname
            package_depth += 1
        return package_depth, fspath, modname

This PEP also proposes that the split_path_module() functionality be
exposed directly to Python users via the runpy module.
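
To make the intended result concrete, here is a self-contained sketch
that recreates the example layout in a temporary directory and applies
the algorithm to test_foo.py. It is an illustration rather than a
reference implementation. It assumes importlib.machinery.all_suffixes()
as the source of recognised suffixes, and the demo_split_path_module
wrapper is hypothetical:

```python
import os
import tempfile
from importlib.machinery import all_suffixes


def _is_package_dir(fspath):
    # A package directory contains an __init__ file with a known suffix
    return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
               for suffix in all_suffixes())


def split_path_module(fspath, modname=None):
    """Return (package_depth, sys_path_entry, qualified_module_name)."""
    if modname is None:
        fspath, fname = os.path.split(fspath)
        modname = os.path.splitext(fname)[0]
    package_depth = 0
    while _is_package_dir(fspath):
        fspath, pkg = os.path.split(fspath)
        modname = pkg + "." + modname
        package_depth += 1
    return package_depth, fspath, modname


def demo_split_path_module():
    """Build project/example/tests/test_foo.py and analyse its path."""
    with tempfile.TemporaryDirectory() as root:
        project = os.path.join(root, "project")
        example = os.path.join(project, "example")
        tests = os.path.join(example, "tests")
        os.makedirs(tests)
        for pkg_dir in (example, tests):
            open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        script = os.path.join(tests, "test_foo.py")
        open(script, "w").close()
        depth, path_entry, modname = split_path_module(script)
        # Report only the final path component, as the temporary
        # directory name differs on every run
        return depth, os.path.basename(path_entry), modname


print(demo_split_path_module())
```

For this layout the function reports a package depth of 2, selects the
project directory as the sys.path entry, and derives
example.tests.test_foo as the module name.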

With this fix in place, and the same simple package layout described
earlier, all of the following commands would invoke the test suite
correctly:

    # working directory: project/example/tests
    ./test_foo.py
    python test_foo.py
    python -m example.tests.test_foo
    python -c "from .test_foo import main; main()"
    python -c "from ..tests.test_foo import main; main()"
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/example
    tests/test_foo.py
    python tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from .tests.test_foo import main; main()"
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project
    example/tests/test_foo.py
    python example/tests/test_foo.py
    python -m example.tests.test_foo
    python -c "from example.tests.test_foo import main; main()"

    # working directory: project/..
    project/example/tests/test_foo.py
    python project/example/tests/test_foo.py
    # The -m and -c approaches still don't work from here, but the failure
    # to find 'example' correctly is pretty easy to explain in this case

With these changes, clicking Python modules in a graphical file browser
should always execute them correctly, even if they live inside a
package. Depending on the details of how it invokes the script, Idle
would likely also be able to run test_foo.py correctly with F5, without
needing any Idle specific fixes.

Optional addition: command line relative imports

With the above changes in place, it would be a fairly minor addition to
allow explicit relative imports as arguments to the -m switch:

    # working directory: project/example/tests
    python -m .test_foo
    python -m ..tests.test_foo

    # working directory: project/example/
    python -m .tests.test_foo

With this addition, system initialisation for the -m switch would change
as follows:

    # -m switch (permitting explicit relative imports)
    in_package, path_entry, pkg_name = split_path_module(os.getcwd(), '')
    qualname = <<argument to -m switch>>
    if qualname.startswith('.'):
        modname = qualname
        while modname.startswith('.'):
            modname = modname[1:]
            pkg_name, sep, _ignored = pkg_name.rpartition('.')
            if not sep:
                raise ImportError("Attempted relative import beyond top level package")
        qualname = pkg_name + '.' + modname
    if in_package:
        sys.path.insert(0, path_entry)
    else:
        sys.path.insert(0, '')

    # qualname is passed to ``runpy._run_module_as_main()``
    # __main__.__qualname__ is set to qualname
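
The name resolution step in that pseudo-code can be isolated as a pure
string function. The sketch below uses a hypothetical helper,
resolve_relative_modname. It assumes pkg_name has the form produced by
split_path_module(os.getcwd(), ''), that is, the dotted package name of
the working directory followed by a trailing dot, or an empty string
outside any package:

```python
def resolve_relative_modname(arg, pkg_name):
    """Resolve a (possibly relative) -m argument against pkg_name.

    Hypothetical helper: pkg_name is expected to end with '.', for
    example 'example.tests.' when run from project/example/tests.
    """
    if not arg.startswith("."):
        return arg  # Absolute names are used unchanged
    modname = arg
    while modname.startswith("."):
        # Each leading dot strips one trailing component from pkg_name;
        # the first dot only consumes the trailing separator
        modname = modname[1:]
        pkg_name, sep, _ignored = pkg_name.rpartition(".")
        if not sep:
            raise ImportError(
                "Attempted relative import beyond top level package")
    return pkg_name + "." + modname


print(resolve_relative_modname(".test_foo", "example.tests."))
print(resolve_relative_modname("..tests.test_foo", "example.tests."))
print(resolve_relative_modname(".tests.test_foo", "example."))
```

All three calls resolve to example.tests.test_foo, matching the three
command lines shown earlier in this section.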

Compatibility with PEP 382

Making this proposal compatible with the PEP 382 namespace packaging PEP
is trivial. The semantics of _is_package_dir() are merely changed to be:

    import os
    from importlib.machinery import all_suffixes

    def _is_package_dir(fspath):
        return (fspath.endswith(".pyp") or
                any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
                    for suffix in all_suffixes()))

Incompatibility with PEP 402

PEP 402 proposes the elimination of explicit markers in the file system
for Python packages. This fundamentally breaks the proposed concept of
being able to take a filesystem path and a Python module name and work
out an unambiguous mapping to the Python module namespace. Instead, the
appropriate mapping would depend on the current values in sys.path,
rendering it impossible to ever fix the problems described above with
the calculation of sys.path[0] when the interpreter is initialised.

While some aspects of this PEP could probably be salvaged if PEP 402
were adopted, the core concept of making import semantics from main and
other modules more consistent would no longer be feasible.

This incompatibility is discussed in more detail in the relevant
import-sig threads ([4],[5]).

Potential incompatibilities with scripts stored in packages

The proposed change to sys.path[0] initialisation may break some
existing code. Specifically, it will break scripts stored in package
directories that rely on the implicit relative imports from __main__ in
order to run correctly under Python 3.

While such scripts could be imported in Python 2 (due to implicit
relative imports) it is already the case that they cannot be imported in
Python 3, as implicit relative imports are no longer permitted when a
module is imported.

By disallowing implicit relative imports from the main module as well,
such modules won't even work as scripts with this PEP. Switching them
over to explicit relative imports will then get them working again as
both executable scripts and as importable modules.

To support earlier versions of Python, a script could be written to use
different forms of import based on the Python version:

    import sys

    if __name__ == "__main__" and sys.version_info < (3, 3):
        import peer # Implicit relative import
    else:
        from . import peer # explicit relative import

Fixing dual imports of the main module

Given the above proposal to get __qualname__ consistently set correctly
in the main module, one simple change is proposed to eliminate the
problem of dual imports of the main module: the addition of a
sys.meta_path hook that detects attempts to import __main__ under its
real name and returns the original main module instead:

    class AliasImporter:
        # Uses the PEP 302 finder/loader methods (find_module/load_module)
        def __init__(self, module, alias):
            self.module = module
            self.alias = alias

        def __repr__(self):
            fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
            return fmt.format(self)

        def find_module(self, fullname, path=None):
            if path is None and fullname == self.alias:
                return self
            return None

        def load_module(self, fullname):
            if fullname != self.alias:
                raise ImportError("{!r} cannot load {!r}".format(self, fullname))
            # Hand back the existing module rather than loading a copy
            return self.module

This meta path hook would be added automatically during import system
initialisation based on the following logic:

    main = sys.modules["__main__"]
    if main.__name__ != main.__qualname__:
        sys.meta_path.append(AliasImporter(main, main.__qualname__))

This is probably the least important proposal in the PEP - it just
closes off the last mechanism that is likely to lead to module
duplication after the configuration of sys.path[0] at interpreter
startup is addressed.
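
On current Python versions the same idea would need to be expressed with
the find_spec() protocol, as the find_module() and load_module() methods
are no longer supported by the import system. The following is a sketch
under that assumption, not part of the original proposal. The AliasFinder
and install_alias names are hypothetical. The finder is placed at the
front of sys.meta_path because, since Python 3.3, the standard finders
also live on that list and would otherwise locate the source file first:

```python
import importlib
import importlib.util
import sys
import types


class AliasFinder:
    """Meta path finder and loader that maps an alias to an existing module."""

    def __init__(self, module, alias):
        self.module = module
        self.alias = alias

    def find_spec(self, fullname, path=None, target=None):
        # Only top-level requests for the alias are handled
        if path is None and fullname == self.alias:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        # Reuse the module that already exists instead of making a copy
        return self.module

    def exec_module(self, module):
        # The module has already been executed; nothing to do
        pass


def install_alias(module, alias):
    finder = AliasFinder(module, alias)
    sys.meta_path.insert(0, finder)
    return finder


# Usage with a stand-in for the main module (names are illustrative)
stand_in = types.ModuleType("__demo_main__")
finder = install_alias(stand_in, "demo_real_name")
try:
    print(importlib.import_module("demo_real_name") is stand_in)
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("demo_real_name", None)
```

One side effect to be aware of is that the import machinery may attach
the alias spec to the existing module as its __spec__ attribute, so a
production version would likely need to account for that.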

Fixing pickling without breaking introspection

To fix this problem, it is proposed to make use of the new module level
__qualname__ attributes to determine the real module location when
__name__ has been modified for any reason.

In the main module, __qualname__ will automatically be set to the main
module's "real" name (as described above) by the interpreter.

Pseudo-modules that adjust __name__ to point to the public namespace
will leave __qualname__ untouched, so the implementation location
remains readily accessible for introspection.

If __name__ is adjusted at the top of a module, then this will
automatically adjust the __module__ attribute for all functions and
classes subsequently defined in that module.
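
This existing behaviour can be observed without any of the changes
proposed here, since functions and classes pick up __module__ from the
__name__ global at definition time. In the sketch below, the names
public_api and public_api._impl are hypothetical:

```python
# Source for a hypothetical implementation module that presents itself
# under its public name
SOURCE = '''
__name__ = "public_api"   # adjusted before any definitions

def func():
    pass

class Cls:
    pass
'''


def define_with_adjusted_name():
    # The namespace starts out with the implementation module's name
    namespace = {"__name__": "public_api._impl"}
    exec(SOURCE, namespace)
    return namespace["func"].__module__, namespace["Cls"].__module__


print(define_with_adjusted_name())
```

Both objects report public_api as their module, and nothing on them
records that they were defined in public_api._impl.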

Since multiple submodules may be set to use the same "public" namespace,
functions and classes will be given a new __qualmodule__ attribute that
refers to the __qualname__ of their module.

This isn't strictly necessary for functions (you could find out their
module's qualified name by looking in their globals dictionary), but it
is needed for classes, since they don't hold a reference to the globals
of their defining module. Once a new attribute is added to classes, it
is more convenient to keep the API consistent and add a new attribute to
functions as well.
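
For functions, the globals-based lookup mentioned above might look like
the following sketch. The helper qualified_module_name is hypothetical,
and it assumes the module namespace carries the proposed __qualname__
entry, falling back to __name__ when it does not:

```python
def qualified_module_name(func):
    """Hypothetical helper: read the defining module's qualified name
    from the function's globals."""
    module_globals = func.__globals__
    name = module_globals.get("__qualname__")
    if name is None:
        name = module_globals.get("__name__")
    return name


# Simulate a module that exposes a public __name__ while retaining its
# implementation location in __qualname__
namespace = {"__name__": "public_api", "__qualname__": "public_api._impl"}
exec("def func(): pass", namespace)
print(namespace["func"].__module__)
print(qualified_module_name(namespace["func"]))
```

No equivalent lookup exists for classes, which is why the proposal adds
an explicit attribute instead.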

These changes mean that adjusting __name__ (and, either directly or
indirectly, the corresponding function and class __module__ attributes)
becomes the officially sanctioned way to implement a namespace as a
package, while exposing the API as if it were still a single module.

All serialisation code that currently uses __name__ and __module__
attributes will then avoid exposing implementation details by default.

To correctly handle serialisation of items from the main module, the
class and function definition logic will be updated to also use
__qualname__ for the __module__ attribute in the case where
__name__ == "__main__".

With __name__ and __module__ being officially blessed as being used for
the public names of things, the introspection tools in the standard
library will be updated to use __qualname__ and __qualmodule__ where
appropriate. For example:

-   pydoc will report both public and qualified names for modules
-   inspect.getsource() (and similar tools) will use the qualified names
    that point to the implementation of the code
-   additional pydoc and/or inspect APIs may be provided that report all
    modules with a given public __name__.

Fixing multiprocessing on Windows

With __qualname__ now available to tell multiprocessing the real name of
the main module, it will be able to simply include it in the serialised
information passed to the child process, eliminating the need for the
current dubious introspection of the __file__ attribute.

For older Python versions, multiprocessing could be improved by applying
the split_path_module() algorithm described above when attempting to
work out how to execute the main module based on its __file__ attribute.

Explicit relative imports

This PEP proposes that __package__ be unconditionally defined in the
main module as __qualname__.rpartition('.')[0]. Aside from that, it
proposes that the behaviour of explicit relative imports be left alone.
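
As a quick illustration of that expression, the sketch below wraps it in
a hypothetical helper, package_for_main, and applies it to a few module
names:

```python
def package_for_main(qualname):
    """Derive __package__ from a module's qualified name."""
    # rpartition returns ('', '', qualname) when there is no dot, so
    # top-level modules get an empty package name
    return qualname.rpartition(".")[0]


print(repr(package_for_main("example.tests.test_foo")))
print(repr(package_for_main("__main__")))
```

A module inside a package gets its parent package name, while a top-level
script or the interactive prompt gets an empty string.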

In particular, if __package__ is not set in a module when an explicit
relative import occurs, the automatically cached value will continue to
be derived from __name__ rather than __qualname__. This minimises any
backwards incompatibilities with existing code that deliberately
manipulates relative imports by adjusting __name__ rather than setting
__package__ directly.

This PEP does not propose that __package__ be deprecated. While it is
technically redundant following the introduction of __qualname__, it
just isn't worth the hassle of deprecating it within the lifetime of
Python 3.x.

Reference Implementation

None as yet.

References

-   Elaboration of compatibility problems between this PEP and PEP 402

Copyright

This document has been placed in the public domain.

[1] Module aliases and/or "real names"

[2] PEP 395 (Module aliasing) and the namespace PEPs

[3] Updated PEP 395 (aka "Implicit Relative Imports Must Die!")

[4] PEP 395 (Module aliasing) and the namespace PEPs

[5] Updated PEP 395 (aka "Implicit Relative Imports Must Die!")