PEP: 3122 Title: Delineation of the main module Version: $Revision$
Last-Modified: $Date$ Author: Brett Cannon Status: Rejected Type:
Standards Track Content-Type: text/x-rst Created: 27-Apr-2007
Post-History:

Attention

This PEP has been rejected. Guido views running scripts within a package
as an anti-pattern[1].

Abstract

Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, the ability to execute modules within a
package ceases being possible. This failing stems from the fact that the
module being executed as the "main" module replaces its __name__
attribute with "__main__" instead of leaving it as the absolute name of
the module. This breaks import's ability to resolve relative imports
from the main module into absolute names.

In order to resolve this issue, this PEP proposes to change how the main
module is delineated. By leaving the __name__ attribute in a module
alone and setting sys.main to the name of the main module this will
allow at least some instances of executing a module within a package
that uses relative imports.

This PEP does not address the idea of introducing a module-level
function that is automatically executed like PEP 299 proposes.

The Problem

With the introduction of PEP 328, relative imports became dependent on
the __name__ attribute of the module performing the import. This is
because the use of dots in a relative import are used to strip away
parts of the calling module's name to calculate where in the package
hierarchy an import should fall (prior to PEP 328 relative imports could
fail and would fall back on absolute imports which had a chance of
succeeding).

For instance, consider the import from .. import spam made from the
bacon.ham.beans module (bacon.ham.beans is not a package itself, i.e.,
does not define __path__). Name resolution of the relative import takes
the caller's name (bacon.ham.beans), splits on dots, and then slices off
the last n parts based on the level (which is 2). In this example both
ham and beans are dropped and spam is joined with what is left (bacon).
This leads to the proper import of the module bacon.spam.

This reliance on the __name__ attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the executing script has its name set to '__main__',
import cannot resolve any relative imports, leading to an ImportError.

For example, assume we have a package named bacon with an __init__.py
file containing:

    from . import spam

Also create a module named spam within the bacon package (it can be an
empty file). Now if you try to execute the bacon package (either through
python bacon/__init__.py or python -m bacon) you will get an ImportError
about trying to do a relative import from within a non-package.
Obviously the import is valid, but because of the setting of __name__ to
'__main__' import thinks that bacon/__init__.py is not in a package
since no dots exist in __name__. To see how the algorithm works in more
detail, see importlib.Import._resolve_name() in the sandbox [2].

Currently a work-around is to remove all relative imports in the module
being executed and make them absolute. This is unfortunate, though, as
one should not be required to use a specific type of resource in order
to make a module in a package be able to be executed.

The Solution

The solution to the problem is to not change the value of __name__ in
modules. But there still needs to be a way to let executing code know it
is being executed as a script. This is handled with a new attribute in
the sys module named main.

When a module is being executed as a script, sys.main will be set to the
name of the module. This changes the current idiom of:

    if __name__ == '__main__':
        ...

to:

    import sys
    if __name__ == sys.main:
        ...

The newly proposed solution does introduce an added line of boilerplate
which is a module import. But as the solution does not introduce a new
built-in or module attribute (as discussed in Rejected Ideas) it has
been deemed worth the extra line.

Another issue with the proposed solution (which also applies to all
rejected ideas as well) is that it does not directly solve the problem
of discovering the name of a file. Consider python bacon/spam.py. By the
file name alone it is not obvious whether bacon is a package. In order
to properly find this out both the current direction must exist on
sys.path as well as bacon/__init__.py existing.

But this is the simple example. Consider python ../spam.py. From the
file name alone it is not at all clear if spam.py is in a package or
not. One possible solution is to find out what the absolute name of ..,
check if a file named __init__.py exists, and then look if the directory
is on sys.path. If it is not, then continue to walk up the directory
until no more __init__.py files are found or the directory is found on
sys.path.

This could potentially be an expensive process. If the package depth
happens to be deep then it could require a large amount of disk access
to discover where the package is anchored on sys.path, if at all. The
stat calls alone can be expensive if the file system the executed script
is on is something like NFS.

Because of these issues, only when the -m command-line argument
(introduced by PEP 338) is used will __name__ be set. Otherwise the
fallback semantics of setting __name__ to "__main__" will occur.
sys.main will still be set to the proper value, regardless of what
__name__ is set to.

Implementation

When the -m option is used, sys.main will be set to the argument passed
in. sys.argv will be adjusted as it is currently. Then the equivalent of
__import__(self.main) will occur. This differs from current semantics as
the runpy module fetches the code object for the file specified by the
module name in order to explicitly set __name__ and other attributes.
This is no longer needed as import can perform its normal operation in
this situation.

If a file name is specified, then sys.main will be set to "__main__".
The specified file will then be read and have a code object created and
then be executed with __name__ set to "__main__". This mirrors current
semantics.

Transition Plan

In order for Python 2.6 to be able to support both the current semantics
and the proposed semantics, sys.main will always be set to "__main__".
Otherwise no change will occur for Python 2.6. This unfortunately means
that no benefit from this change will occur in Python 2.6, but it
maximizes compatibility for code that is to work as much as possible
with 2.6 and 3.0.

To help transition to the new idiom, 2to3[3] will gain a rule to
transform the current if __name__ == '__main__': ... idiom to the new
one. This will not help with code that checks __name__ outside of the
idiom, though.

Rejected Ideas

__main__ built-in

A counter-proposal to introduce a built-in named __main__. The value of
the built-in would be the name of the module being executed (just like
the proposed sys.main). This would lead to a new idiom of:

    if __name__ == __main__:
        ...

A drawback is that the syntactic difference is subtle; the dropping of
quotes around "__main". Some believe that for existing Python
programmers bugs will be introduced where the quotation marks will be
put on by accident. But one could argue that the bug would be discovered
quickly through testing as it is a very shallow bug.

While the name of built-in could obviously be different (e.g., main) the
other drawback is that it introduces a new built-in. With a simple
solution such as sys.main being possible without adding another built-in
to Python, this proposal was rejected.

__main__ module attribute

Another proposal was to add a __main__ attribute to every module. For
the one that was executing as the main module, the attribute would have
a true value while all other modules had a false value. This has a nice
consequence of simplify the main module idiom to:

    if __main__:
        ...

The drawback was the introduction of a new module attribute. It also
required more integration with the import machinery than the proposed
solution.

Use __file__ instead of __name__

Any of the proposals could be changed to use the __file__ attribute on
modules instead of __name__, including the current semantics. The
problem with this is that with the proposed solutions there is the issue
of modules having no __file__ attribute defined or having the same value
as other modules.

The problem that comes up with the current semantics is you still have
to try to resolve the file path to a module name for the import to work.

Special string subclass for __name__ that overrides __eq__

One proposal was to define a subclass of str that overrode the __eq__
method so that it would compare equal to "__main__" as well as the
actual name of the module. In all other respects the subclass would be
the same as str.

This was rejected as it seemed like too much of a hack.

References

Copyright

This document has been placed in the public domain.

[1] Python-Dev email: "PEP to change how the main module is delineated"
(https://mail.python.org/pipermail/python-3000/2007-April/006793.html)

[2] importlib
(http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup)
[ViewVC]

[3] 2to3 tool (http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC]