PEP 3122 – Delineation of the main module
- Author:
- Brett Cannon
- Status:
- Rejected
- Type:
- Standards Track
- Created:
- 27-Apr-2007
- Post-History:
Attention
This PEP has been rejected. Guido views running scripts within a package as an anti-pattern [3].
Abstract
Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, the ability to execute modules within a
package ceases being possible. This failing stems from the fact that
the module being executed as the “main” module replaces its
__name__
attribute with "__main__"
instead of leaving it as
the absolute name of the module. This breaks import’s ability
to resolve relative imports from the main module into absolute names.
In order to resolve this issue, this PEP proposes to change how the
main module is delineated. By leaving the __name__
attribute in
a module alone and setting sys.main
to the name of the main
module this will allow at least some instances of executing a module
within a package that uses relative imports.
This PEP does not address the idea of introducing a module-level function that is automatically executed like PEP 299 proposes.
The Problem
With the introduction of PEP 328, relative imports became dependent on
the __name__
attribute of the module performing the import. This
is because the use of dots in a relative import are used to strip away
parts of the calling module’s name to calculate where in the package
hierarchy an import should fall (prior to PEP 328 relative
imports could fail and would fall back on absolute imports which had a
chance of succeeding).
For instance, consider the import from .. import spam
made from the
bacon.ham.beans
module (bacon.ham.beans
is not a package
itself, i.e., does not define __path__
). Name resolution of the
relative import takes the caller’s name (bacon.ham.beans
), splits
on dots, and then slices off the last n parts based on the level
(which is 2). In this example both ham
and beans
are dropped
and spam
is joined with what is left (bacon
). This leads to
the proper import of the module bacon.spam
.
This reliance on the __name__
attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the executing script has its name set to
'__main__'
, import cannot resolve any relative imports, leading to
an ImportError
.
For example, assume we have a package named bacon
with an
__init__.py
file containing:
from . import spam
Also create a module named spam
within the bacon
package (it
can be an empty file). Now if you try to execute the bacon
package (either through python bacon/__init__.py
or
python -m bacon
) you will get an ImportError
about trying to
do a relative import from within a non-package. Obviously the import
is valid, but because of the setting of __name__
to '__main__'
import thinks that bacon/__init__.py
is not in a package since no
dots exist in __name__
. To see how the algorithm works in more
detail, see importlib.Import._resolve_name()
in the sandbox
[2].
Currently a work-around is to remove all relative imports in the module being executed and make them absolute. This is unfortunate, though, as one should not be required to use a specific type of resource in order to make a module in a package be able to be executed.
The Solution
The solution to the problem is to not change the value of __name__
in modules. But there still needs to be a way to let executing code
know it is being executed as a script. This is handled with a new
attribute in the sys
module named main
.
When a module is being executed as a script, sys.main
will be set
to the name of the module. This changes the current idiom of:
if __name__ == '__main__':
...
to:
import sys
if __name__ == sys.main:
...
The newly proposed solution does introduce an added line of boilerplate which is a module import. But as the solution does not introduce a new built-in or module attribute (as discussed in Rejected Ideas) it has been deemed worth the extra line.
Another issue with the proposed solution (which also applies to all
rejected ideas as well) is that it does not directly solve the problem
of discovering the name of a file. Consider python bacon/spam.py
.
By the file name alone it is not obvious whether bacon
is a
package. In order to properly find this out both the current
direction must exist on sys.path
as well as bacon/__init__.py
existing.
But this is the simple example. Consider python ../spam.py
. From
the file name alone it is not at all clear if spam.py
is in a
package or not. One possible solution is to find out what the
absolute name of ..
, check if a file named __init__.py
exists,
and then look if the directory is on sys.path
. If it is not, then
continue to walk up the directory until no more __init__.py
files
are found or the directory is found on sys.path
.
This could potentially be an expensive process. If the package depth
happens to be deep then it could require a large amount of disk access
to discover where the package is anchored on sys.path
, if at all.
The stat calls alone can be expensive if the file system the executed
script is on is something like NFS.
Because of these issues, only when the -m
command-line argument
(introduced by PEP 338) is used will __name__
be set. Otherwise
the fallback semantics of setting __name__
to "__main__"
will
occur. sys.main
will still be set to the proper value,
regardless of what __name__
is set to.
Implementation
When the -m
option is used, sys.main
will be set to the
argument passed in. sys.argv
will be adjusted as it is currently.
Then the equivalent of __import__(self.main)
will occur. This
differs from current semantics as the runpy
module fetches the
code object for the file specified by the module name in order to
explicitly set __name__
and other attributes. This is no longer
needed as import can perform its normal operation in this situation.
If a file name is specified, then sys.main
will be set to
"__main__"
. The specified file will then be read and have a code
object created and then be executed with __name__
set to
"__main__"
. This mirrors current semantics.
Transition Plan
In order for Python 2.6 to be able to support both the current
semantics and the proposed semantics, sys.main
will always be set
to "__main__"
. Otherwise no change will occur for Python 2.6.
This unfortunately means that no benefit from this change will occur
in Python 2.6, but it maximizes compatibility for code that is to
work as much as possible with 2.6 and 3.0.
To help transition to the new idiom, 2to3 [1] will gain a rule to
transform the current if __name__ == '__main__': ...
idiom to the
new one. This will not help with code that checks __name__
outside of the idiom, though.
Rejected Ideas
__main__
built-in
A counter-proposal to introduce a built-in named __main__
.
The value of the built-in would be the name of the module being
executed (just like the proposed sys.main
). This would lead to a
new idiom of:
if __name__ == __main__:
...
A drawback is that the syntactic difference is subtle; the dropping of quotes around “__main__”. Some believe that for existing Python programmers bugs will be introduced where the quotation marks will be put on by accident. But one could argue that the bug would be discovered quickly through testing as it is a very shallow bug.
While the name of built-in could obviously be different (e.g.,
main
) the other drawback is that it introduces a new built-in.
With a simple solution such as sys.main
being possible without
adding another built-in to Python, this proposal was rejected.
__main__
module attribute
Another proposal was to add a __main__
attribute to every module.
For the one that was executing as the main module, the attribute would
have a true value while all other modules had a false value. This has
a nice consequence of simplify the main module idiom to:
if __main__:
...
The drawback was the introduction of a new module attribute. It also required more integration with the import machinery than the proposed solution.
Use __file__
instead of __name__
Any of the proposals could be changed to use the __file__
attribute on modules instead of __name__
, including the current
semantics. The problem with this is that with the proposed solutions
there is the issue of modules having no __file__
attribute defined
or having the same value as other modules.
The problem that comes up with the current semantics is you still have to try to resolve the file path to a module name for the import to work.
Special string subclass for __name__
that overrides __eq__
One proposal was to define a subclass of str
that overrode the
__eq__
method so that it would compare equal to "__main__"
as
well as the actual name of the module. In all other respects the
subclass would be the same as str
.
This was rejected as it seemed like too much of a hack.
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-3122.rst
Last modified: 2023-09-09 17:39:29 GMT