PEP 302 – New Import Hooks
- Just van Rossum <just at letterror.com>, Paul Moore <p.f.moore at gmail.com>
- Standards Track
Table of Contents
- Use cases
- Specification part 1: The Importer Protocol
- Specification part 2: Registering Hooks
- Packages and the role of __path__
- Optional Extensions to the Importer Protocol
- Integration with the ‘imp’ module
- Forward Compatibility
- Open Issues
- References and Footnotes
This PEP proposes to add a new set of import hooks that offer better
customization of the Python import mechanism. Contrary to the current
__import__ hook, a new-style hook can be injected into the existing
scheme, allowing for a finer grained control of how modules are found and how
they are loaded.
The only way to customize the import mechanism is currently to override the
__import__ function. However, overriding __import__ has many problems. To
begin with:
- An __import__ replacement needs to fully reimplement the entire import mechanism, or call the original __import__ before or after the custom code.
- It has very complex semantics and responsibilities.
- __import__ gets called even for modules that are already in sys.modules, which is almost never what you want, unless you’re writing some sort of monitoring tool.
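The last point is easy to observe. The following is a minimal sketch for current Python 3, where the hook lives in the builtins module and takes a fifth "level" argument (both details go beyond what this PEP describes). The names logging_import and calls are purely illustrative.

```python
import builtins

calls = []  # names requested through the hook (illustrative)
original_import = builtins.__import__


def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Record every request, then defer to the real implementation."""
    calls.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = logging_import
try:
    import sys  # sys is always present in sys.modules already ...
    import sys  # ... yet each statement still goes through the hook
finally:
    builtins.__import__ = original_import  # always restore the original
```

Both import statements end up in calls, even though neither of them loads anything.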
The situation gets worse when you need to extend the import mechanism from C:
it’s currently impossible, apart from hacking Python’s import.c or
reimplementing much of import.c from scratch.
There is a fairly long history of tools written in Python that allow extending
the import mechanism in various ways, based on the __import__ hook. The
Standard Library includes two such tools: ihooks.py (by GvR) and imputil.py
(Greg Stein), but perhaps the most famous is iu.py by Gordon McMillan,
available as part of his Installer package. Their usefulness is somewhat
limited because they are written in Python; bootstrapping issues need to be
worked around, as you can’t load the module containing the hook with the hook
itself. So if you want the entire Standard Library to be loadable from an
import hook, the hook must be written in C.
Use cases
This section lists several existing applications that depend on import hooks. Among these, a lot of duplicate work was done that could have been saved if there had been a more flexible import hook at the time. This PEP should make life a lot easier for similar projects in the future.
Extending the import mechanism is needed when you want to load modules that
are stored in a non-standard way. Examples include modules that are bundled
together in an archive; byte code that is not stored in a .pyc file; modules
that are loaded from a database over a network.
The work on this PEP was partly triggered by the implementation of PEP 273,
which adds imports from Zip archives as a built-in feature to Python. While
the PEP itself was widely accepted as a must-have feature, the implementation
left a few things to be desired. For one thing, it went to great lengths to
integrate itself with import.c, adding lots of code that was either specific
to Zip file imports or not specific to Zip imports, yet not generally useful
(or even desirable) either. Yet the PEP 273 implementation can hardly be
blamed for this: it is simply extremely hard to do, given the current state of
import.c.
Packaging applications for end users is a typical use case for import hooks,
if not the typical use case. Distributing lots of source or .pyc files around
is not always appropriate (let alone a separate Python installation), so there
is a frequent desire to package all needed modules in a single file. So
frequent, in fact, that multiple solutions have been implemented over the
years.
The oldest one is included with the Python source code: Freeze. It puts
marshalled byte code into static objects in C source code. Freeze’s “import
hook” is hard wired into import.c, and has a couple of issues. Later solutions
include Fredrik Lundh’s Squeeze, Gordon McMillan’s Installer, and Thomas
Heller’s py2exe. MacPython ships with a tool called BuildApplication.
Squeeze, Installer and py2exe use an __import__ based scheme (py2exe currently
uses Installer’s iu.py, Squeeze used ihooks.py); MacPython has two
Mac-specific import hooks hard wired into import.c that are similar to the
Freeze hook. The hooks proposed in this PEP enable us (at least in theory;
it’s not a short-term goal) to get rid of the hard coded hooks in import.c,
and would allow the __import__-based tools to get rid of most of their
import.c emulation code.
Before work on the design and implementation of this PEP was started, a new
BuildApplication-like tool for Mac OS X prompted one of the authors of this
PEP (JvR) to expose the table of frozen modules to Python, in the imp module.
The main reason was to be able to use the freeze import hook (avoiding fancy
__import__ support), yet to also be able to supply a set of modules at
runtime. This resulted in issue #642578, which was mysteriously accepted
(mostly because nobody seemed to care either way ;-). Yet it is completely
superfluous when this PEP gets accepted, as it offers a much nicer and general
way to do the same thing.
While experimenting with alternative implementation ideas to get built-in Zip
import, it was discovered that achieving this is possible with only a fairly
small number of changes to import.c. This made it possible to factor out the
Zip-specific stuff into a new source file, while at the same time creating a
general new import hook scheme: the one you’re reading about now.
An earlier design allowed non-string objects on sys.path. Such an object would
have the necessary methods to handle an import. This has two disadvantages:
1) it breaks code that assumes all items on sys.path are strings; 2) it is not
compatible with the PYTHONPATH environment variable. The latter is directly
needed for Zip imports. A compromise came from Jython: allow string subclasses
on sys.path, which would then act as importer objects. This avoids some
breakage, and seems to work well for Jython (where it is used to load modules
from .jar files), but it was perceived as an “ugly hack”.
This led to a more elaborate scheme (mostly copied from McMillan’s iu.py), in
which each item in a list of candidates is asked whether it can handle the
sys.path item, until one is found that can. This list of candidates is a new
object in the sys module: sys.path_hooks. Traversing sys.path_hooks for each
path item for each new import can be expensive, so the results are cached in
another new object in the sys module: sys.path_importer_cache. It maps
sys.path entries to importer objects.
To minimize the impact on import.c as well as to avoid adding extra overhead,
it was chosen to not add an explicit hook and importer object for the existing
file system import logic (as iu.py has), but to simply fall back to the
built-in logic if no hook on sys.path_hooks could handle the path item. If
this is the case, a None value is stored in sys.path_importer_cache, again to
avoid repeated lookups. (Later we can go further and add a real importer
object for the built-in mechanism; for now, the None fallback scheme should
suffice.)
A question was raised: what about importers that don’t need any entry on
sys.path? (Built-in and frozen modules fall into that category.) Again, Gordon
McMillan to the rescue: iu.py contains a thing he calls the metapath. In this
PEP’s implementation, it’s a list of importer objects that is traversed before
sys.path. This list is yet another new object in the sys module:
sys.meta_path. Currently, this list is empty by default, and frozen and
built-in module imports are done after traversing sys.meta_path, but still
before sys.path.
Specification part 1: The Importer Protocol
This PEP introduces a new protocol: the “Importer Protocol”. It is important to understand the context in which the protocol operates, so here is a brief overview of the outer shells of the import mechanism.
When an import statement is encountered, the interpreter looks up the
__import__ function in the built-in name space. __import__ is then called with
four arguments, amongst which are the name of the module being imported (may
be a dotted name) and a reference to the current global namespace.
The built-in __import__ function (implemented in import.c) will then check to
see whether the module doing the import is a package or a submodule of a
package. If it is indeed a (submodule of a) package, it first tries to do the
import relative to the package (the parent package for a submodule). For
example, if a package named “spam” does “import eggs”, it will first look for
a module named “spam.eggs”. If that fails, the import continues as an absolute
import: it will look for a module named “eggs”. Dotted name imports work
pretty much the same: if package “spam” does “import eggs.bacon” (and
“spam.eggs” exists and is itself a package), “spam.eggs.bacon” is tried. If
that fails, “eggs.bacon” is tried. (There are more subtleties that are not
described here, but these are not relevant for implementers of the Importer
Protocol.)
Deeper down in the mechanism, a dotted name import is split up by its components. For “import spam.ham”, first an “import spam” is done, and only when that succeeds is “ham” imported as a submodule of “spam”.
The Importer Protocol operates at this level of individual imports. By the time an importer gets a request for “spam.ham”, module “spam” has already been imported.
The protocol involves two objects: a finder and a loader. A finder object has a single method:
    finder.find_module(fullname, path=None)
This method will be called with the fully qualified name of the module. If
the finder is installed on sys.meta_path, it will receive a second argument,
which is None for a top-level module, or package.__path__ for submodules or
subpackages. It should return a loader object if the module was found, or None
if it wasn’t. If find_module() raises an exception, it will be propagated to
the caller, aborting the import.
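The arguments a meta path finder receives can be observed directly. The sketch below assumes a current Python 3, where finders implement find_spec() (introduced by PEP 451) in place of the find_module() method described here; the fullname and path arguments keep the meaning given above. RecordingFinder and the throwaway package demo_pkg_b are illustrative names only.

```python
import importlib
import importlib.abc
import os
import sys
import tempfile


class RecordingFinder(importlib.abc.MetaPathFinder):
    """Meta hook that records each request and never claims a module."""

    def __init__(self):
        self.requests = []

    def find_spec(self, fullname, path, target=None):
        self.requests.append((fullname, path))
        return None  # "not found here": the next finder gets a chance


# Build a throwaway package with one submodule on disk.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "demo_pkg_b"))
for filename in ("__init__.py", "leaf.py"):
    with open(os.path.join(root, "demo_pkg_b", filename), "w") as f:
        f.write("")

finder = RecordingFinder()
sys.meta_path.insert(0, finder)
sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    import demo_pkg_b.leaf
finally:
    sys.meta_path.remove(finder)
    sys.path.remove(root)

for fullname, path in finder.requests:
    print(fullname, "->", path)
```

The package is requested first with a path of None; the submodule is requested afterwards with the package's __path__.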
A loader object also has one method:
    loader.load_module(fullname)
This method returns the loaded module or raises an exception, preferably
ImportError if an existing exception is not being propagated. If load_module()
is asked to load a module that it cannot, then ImportError is to be raised.
In many cases the finder and loader can be one and the same object:
finder.find_module() would just return self.
The fullname argument of both methods is the fully qualified module name, for
example “spam.eggs.ham”. As explained above, when
finder.find_module("spam.eggs.ham") is called, “spam.eggs” has already been
imported and added to sys.modules. However, the find_module() method isn’t
necessarily always called during an actual import: meta tools that analyze
import dependencies (such as freeze, Installer or py2exe) don’t actually load
modules, so a finder shouldn’t depend on the parent package being available in
sys.modules.
The load_module() method has a few responsibilities that it must fulfill
before it runs any code:
- If there is an existing module object named ‘fullname’ in sys.modules, the loader must use that existing module. (Otherwise, the reload() builtin will not work correctly.) If a module named ‘fullname’ does not exist in sys.modules, the loader must create a new module object and add it to sys.modules.
  Note that the module object must be in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.
  If the load fails, the loader needs to remove any module it may have inserted into sys.modules. If the module was already in sys.modules then the loader should leave it alone.
- The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example “<frozen>”. The privilege of not having a __file__ attribute at all is reserved for built-in modules.
- The __name__ attribute must be set. If one uses imp.new_module() then the attribute is set automatically.
- If it’s a package, the __path__ variable must be set. This must be a list, but may be empty if __path__ has no further significance to the importer (more on this later).
- The __loader__ attribute must be set to the loader object. This is mostly for introspection and reloading, but can be used for importer-specific extras, for example getting data associated with an importer.
- The __package__ attribute must be set (PEP 366).
If the module is a Python module (as opposed to a built-in module or a dynamically loaded extension), it should execute the module’s code in the module’s global name space (module.__dict__).
Here is a minimal pattern for a load_module() method:
    # Consider using importlib.util.module_for_loader() to handle
    # most of these details for you.
    import imp
    import sys

    def load_module(self, fullname):
        code = self.get_code(fullname)
        ispkg = self.is_package(fullname)
        # Reuse an existing module object so that reload() keeps working.
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if ispkg:
            mod.__path__ = []
            mod.__package__ = fullname
        else:
            mod.__package__ = fullname.rpartition('.')[0]
        exec(code, mod.__dict__)
        return mod
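For comparison, here is a hedged sketch of a complete importer in which the finder and the loader are the same object, written against the importlib API of current Python 3 rather than the find_module()/load_module() pair above. In that API the import machinery itself takes care of the sys.modules bookkeeping, so the loader only executes code. DictImporter, its "has children means package" rule and the module names are hypothetical.

```python
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, serving source kept in a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps fully qualified name -> source text

    def is_package(self, fullname):
        # Hypothetical convention: a name is a package if it has children.
        prefix = fullname + "."
        return any(name.startswith(prefix) for name in self.sources)

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # not ours
        return importlib.util.spec_from_loader(
            fullname, self, origin="<DictImporter>",
            is_package=self.is_package(fullname))

    def exec_module(self, module):
        # Run the module's code in the module's own namespace.
        source = self.sources[module.__name__]
        code = compile(source, "<DictImporter>", "exec")
        exec(code, module.__dict__)


importer = DictImporter({
    "dict_demo": "GREETING = 'hello'",
    "dict_demo.child": "from dict_demo import GREETING\n"
                       "VALUE = GREETING.upper()",
})
sys.meta_path.insert(0, importer)
try:
    import dict_demo.child
finally:
    sys.meta_path.remove(importer)

print(dict_demo.child.VALUE)  # prints HELLO
```

The package ends up with an empty __path__ list, which matches the advice above for importers that attach no further meaning to __path__.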
Specification part 2: Registering Hooks
There are two types of import hooks: Meta hooks and Path hooks. Meta
hooks are called at the start of import processing, before any other import
processing (so that meta hooks can override
sys.path processing, frozen
modules, or even built-in modules). To register a meta hook, simply add the
finder object to
sys.meta_path (the list of registered meta hooks).
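As an illustration of a meta hook overriding all later processing, the following sketch registers a finder that refuses certain names. It assumes current Python 3, where a meta path finder implements find_spec(); BlockingFinder and blocked_demo_module are made-up names.

```python
import importlib.abc
import sys


class BlockingFinder(importlib.abc.MetaPathFinder):
    """Meta hook that refuses to import the names it was given."""

    def __init__(self, blocked):
        self.blocked = frozenset(blocked)

    def find_spec(self, fullname, path, target=None):
        if fullname in self.blocked:
            # An exception raised by a finder aborts the import.
            raise ImportError(f"{fullname} is blocked", name=fullname)
        return None  # not ours: fall through to the regular machinery


finder = BlockingFinder({"blocked_demo_module"})
sys.meta_path.insert(0, finder)  # position 0: consulted before the rest
try:
    try:
        import blocked_demo_module
    except ImportError as exc:
        outcome = str(exc)
finally:
    sys.meta_path.remove(finder)
```

Because the finder sits at the front of sys.meta_path, its verdict is reached before sys.path is looked at.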
Path hooks are called as part of sys.path (or package.__path__) processing, at
the point where their associated path item is encountered. A path hook is
registered by adding an importer factory to sys.path_hooks.
sys.path_hooks is a list of callables, which will be checked in sequence to
determine if they can handle a given path item. The callable is called with
one argument, the path item. The callable must raise ImportError if it is
unable to handle the path item, and return an importer object if it can handle
the path item. Note that if the callable returns an importer object for a
specific sys.path entry, the builtin import machinery will not be invoked to
handle that entry any longer, even if the importer object later fails to find
a specific module. The callable is typically the class of the import hook, and
hence the class __init__() method is called. (This is also the reason why it
should raise ImportError: an __init__() method can’t return anything. This
would be possible with a __new__() method in a new style class, but we don’t
want to require anything about how a hook is implemented.)
The results of path hook checks are cached in sys.path_importer_cache, which
is a dictionary mapping path entries to importer objects. The cache is checked
before sys.path_hooks is scanned. If it is necessary to force a rescan of
sys.path_hooks, it is possible to manually clear all or part of
sys.path_importer_cache.
Just like sys.path itself, the new sys variables must have specific types:
- sys.meta_path and sys.path_hooks must be Python lists.
- sys.path_importer_cache must be a Python dict.
Modifying these variables in place is allowed, as is replacing them with new objects.
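The lookup rules for path hooks can be summarized in a few lines. The sketch below is not the interpreter's code; it is a stand-alone model that takes the hook list and the cache as parameters so that it can be run without touching the real sys variables. importer_for and ZipLikeImporter are hypothetical names.

```python
def importer_for(path_item, path_hooks, importer_cache):
    """Return the importer for path_item, or None for the built-in fallback."""
    if path_item in importer_cache:       # the cache is checked first
        return importer_cache[path_item]
    importer = None
    for hook in path_hooks:               # hooks are tried in sequence
        try:
            importer = hook(path_item)
        except ImportError:               # "I cannot handle this item"
            continue
        break
    importer_cache[path_item] = importer  # None records "no hook matched"
    return importer


class ZipLikeImporter:
    """Hypothetical importer factory that only accepts '.zip' path items."""

    def __init__(self, path_item):
        if not path_item.endswith(".zip"):
            raise ImportError("not a zip archive: %r" % path_item)
        self.path_item = path_item


hooks = [ZipLikeImporter]
cache = {}
zip_importer = importer_for("/tmp/lib.zip", hooks, cache)
dir_importer = importer_for("/usr/lib/python", hooks, cache)
```

Using the class itself as the hook shows why refusal has to be signalled with ImportError: __init__() has no way to return a "no" value.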
Packages and the role of __path__
If a module has a __path__ attribute, the import mechanism will treat it as a
package. The __path__ variable is used instead of sys.path when importing
submodules of the package. The rules for sys.path therefore also apply to
pkg.__path__. So sys.path_hooks is also consulted when pkg.__path__ is
traversed. Meta importers don’t necessarily use sys.path at all to do their
work and may therefore ignore the value of pkg.__path__. In this case it is
still advised to set it to a list, which can be empty.
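That __path__ takes the place of sys.path for submodules can be demonstrated with two temporary directories. This is a sketch for current Python 3; the package name path_demo and the module name plugin are made up for the example.

```python
import importlib
import os
import sys
import tempfile

# Two unrelated directories: one holds the package, the other a plugin.
pkg_root = tempfile.mkdtemp()
extra_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(pkg_root, "path_demo"))
with open(os.path.join(pkg_root, "path_demo", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(extra_dir, "plugin.py"), "w") as f:
    f.write("NAME = 'plugin'\n")

sys.path.insert(0, pkg_root)
importlib.invalidate_caches()
try:
    import path_demo
    # extra_dir is not on sys.path, but submodule lookups use __path__.
    path_demo.__path__.append(extra_dir)
    import path_demo.plugin
finally:
    sys.path.remove(pkg_root)

print(path_demo.plugin.NAME)  # prints plugin
```

After the import, extra_dir also shows up as a key in sys.path_importer_cache, which reflects that the path hooks were consulted for the __path__ item.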
Optional Extensions to the Importer Protocol
The Importer Protocol defines three optional extensions. One is to retrieve data files, the second is to support module packaging tools and/or tools that analyze module dependencies (for example Freeze), while the last is to support execution of modules as scripts. The latter two categories of tools usually don’t actually load modules, they only need to know if and where they are available. All three extensions are highly recommended for general purpose importers, but may safely be left out if those features aren’t needed.
To retrieve the data for arbitrary “files” from the underlying storage
backend, loader objects may supply a method named get_data():
    loader.get_data(path)
This method returns the data as a string, or raises IOError if the “file”
wasn’t found. The data is always returned as if “binary” mode was used - there
is no CRLF translation of text files, for example. It is meant for importers
that have some file-system-like properties. The ‘path’ argument is a path that
can be constructed by munging module.__file__ (or pkg.__path__ items) with the
os.path.* functions, for example:
    d = os.path.dirname(__file__)
    data = __loader__.get_data(os.path.join(d, "logo.gif"))
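A runnable variant of this pattern, as a sketch for current Python 3: there, the standard file-based loader returns a bytes object from get_data(). The package data_demo is a temporary, made-up example, and the loader is reached through the module spec.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "data_demo"))
with open(os.path.join(root, "data_demo", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "data_demo", "logo.gif"), "wb") as f:
    f.write(b"GIF89a\r\n")  # contains CRLF to show nothing is translated

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    import data_demo
finally:
    sys.path.remove(root)

loader = data_demo.__spec__.loader  # the object this PEP calls __loader__
directory = os.path.dirname(data_demo.__file__)
data = loader.get_data(os.path.join(directory, "logo.gif"))
```

The bytes come back exactly as written, including the CRLF pair.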
The following set of methods may be implemented if support for (for example) Freeze-like tools is desirable. It consists of three additional methods; to make things easier for the caller, either all three should be implemented, or none at all:
    loader.is_package(fullname)
    loader.get_code(fullname)
    loader.get_source(fullname)
All three methods should raise ImportError if the module wasn’t found.
The loader.is_package(fullname) method should return True if the module
specified by ‘fullname’ is a package and False if it isn’t.
The loader.get_code(fullname) method should return the code object associated
with the module, or None if it’s a built-in or extension module. If the loader
doesn’t have the code object but it does have the source code, it should
return the compiled source code. (This is so that our caller doesn’t also need
to check get_source() if all it needs is the code object.)
The loader.get_source(fullname) method should return the source code for the
module as a string (using newline characters for line endings) or None if the
source is not available (yet it should still raise ImportError if the module
can’t be found by the importer at all).
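The three methods can be tried out on a loader from the standard library. The sketch below assumes current Python 3 and uses importlib.util.find_spec() to obtain a loader without importing the module; the choice of the json package is arbitrary.

```python
import importlib.util
import types

spec = importlib.util.find_spec("json")  # locate the package
loader = spec.loader

is_pkg = loader.is_package("json")       # json is a package
code = loader.get_code("json")           # code object, compiled if needed
source = loader.get_source("json")       # source text, or None if absent

print(is_pkg, isinstance(code, types.CodeType))
```

A dependency analysis tool can work from these results alone, without ever executing the module.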
To support execution of modules as scripts (PEP 338), the above three methods
for finding the code associated with a module must be implemented. In addition
to those methods, the following method may be provided in order to allow the
runpy module to correctly set the __file__ attribute:
    loader.get_filename(fullname)
This method should return the value that __file__ would be set to if the named
module was loaded. If the module is not found, then ImportError should be
raised.
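As a sketch of how this fits together in current Python 3: runpy.run_module() locates a module through the import system, runs its code and returns the resulting globals, and the standard file-based loader offers get_filename(). The module runpy_demo is a temporary, made-up example.

```python
import importlib
import importlib.util
import os
import runpy
import sys
import tempfile

root = tempfile.mkdtemp()
with open(os.path.join(root, "runpy_demo.py"), "w") as f:
    f.write("ANSWER = 6 * 7\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
try:
    loader = importlib.util.find_spec("runpy_demo").loader
    filename = loader.get_filename("runpy_demo")  # what __file__ would be
    namespace = runpy.run_module("runpy_demo")    # execute as a script
finally:
    sys.path.remove(root)

print(namespace["ANSWER"], os.path.basename(namespace["__file__"]))
```

The __file__ entry of the returned namespace should name the same file that get_filename() reported.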
Integration with the ‘imp’ module
The new import hooks are not easily integrated in the existing
imp.find_module() and imp.load_module() calls. It’s questionable whether it’s
possible at all without breaking code; it is better to simply add a new
function to the imp module. The meaning of the existing imp.find_module() and
imp.load_module() calls changes from: “they expose the built-in import
mechanism” to “they expose the basic unhooked built-in import mechanism”. They
simply won’t invoke any import hooks. A new imp module function is proposed
(but not yet implemented) under the name get_loader(), which is used as in the
following pattern:
    loader = imp.get_loader(fullname, path)
    if loader is not None:
        loader.load_module(fullname)
In the case of a “basic” import, one the imp.find_module() function would
handle, the loader object would be a wrapper for the current output of
imp.find_module(), and loader.load_module() would call imp.load_module() with
that output.
Note that this wrapper is currently not yet implemented, although a Python
prototype exists in the test_importhooks.py script (the ImpWrapper class)
included with the patch.
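In current Python 3 a comparable "look up, then load" pattern is available through importlib. The sketch below uses importlib.util.find_spec() in the role this section gives to get_loader(); this is an analogy chosen for illustration, not the function proposed here. The colorsys module is an arbitrary pure-Python example.

```python
import importlib.util

spec = importlib.util.find_spec("colorsys")  # consults the import hooks
if spec is not None and spec.loader is not None:
    # Create a fresh module object and let the loader populate it.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    hue, saturation, value = module.rgb_to_hsv(1.0, 0.0, 0.0)
```

A module loaded this way is a private copy: it is not registered in sys.modules unless the caller does so explicitly.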
Forward Compatibility
Existing __import__ hooks will not invoke new-style hooks by magic, unless
they call the original __import__ function as a fallback. For example,
ihooks.py, iu.py and imputil.py are in this sense not forward compatible with
this PEP.
Open Issues
Modules often need supporting data files to do their job, particularly in the
case of complex packages or full applications. Current practice is generally
to locate such files via sys.path (or a package.__path__ attribute). This
approach will not work, in general, for modules loaded via an import hook.
There are a number of possible ways to address this problem:
- “Don’t do that”. If a package needs to locate data files via its __path__, it is not suitable for loading via an import hook. The package can still be located on a directory in sys.path, as at present, so this should not be seen as a major issue.
- Locate data files from a standard location, rather than relative to the module file. A relatively simple approach (which is supported by distutils) would be to locate data files based on sys.prefix (or sys.exec_prefix). For example, looking in os.path.join(sys.prefix, "data", package_name).
- Import hooks could offer a standard way of getting at data files relative to the module file. The standard zipimport object provides a method get_data(name) which returns the content of the “file” called name, as a string. To allow modules to get at the importer object, zipimport also adds an attribute __loader__ to the module, containing the zipimport object used to load the module. If such an approach is used, it is important that client code takes care not to break if the get_data() method is not available, so it is not clear that this approach offers a general answer to the problem.
It was suggested on python-dev that it would be useful to be able to receive a
list of available modules from an importer and/or a list of available data
files for use with the get_data() method. The protocol could grow two
additional extensions, say list_modules() and list_files(). The latter makes
sense on loader objects with a get_data() method. However, it’s a bit unclear
which object should implement list_modules(): the importer or the loader or
both?
This PEP is biased towards loading modules from alternative places: it
currently doesn’t offer dedicated solutions for loading modules from
alternative file formats or with alternative compilers. In contrast, the
ihooks module from the standard library does have a fairly straightforward
way to do this. The Quixote project uses this technique to import PTL files as
if they are ordinary Python modules. To do the same with the new hooks would
mean either adding a new module that implements a subset of ihooks as a
new-style importer, or adding a hookable built-in path importer object.
There is no specific support within this PEP for “stacking” hooks. For
example, it is not obvious how to write a hook to load modules from tar.gz
files by combining separate hooks to load modules from .tar and .gz files.
However, there is no support for such stacking in the existing hook mechanisms
(either the basic “replace __import__” method, or any of the existing import
hook modules) and so this functionality is not an obvious requirement of the
new mechanism. It may be worth considering as a future enhancement, however.
It is possible (via sys.meta_path) to add hooks which run before sys.path is
processed. However, there is no equivalent way of adding hooks to run after
sys.path is processed. For now, if a hook is required after sys.path has been
processed, it can be simulated by adding an arbitrary “cookie” string at the
end of sys.path, and having the required hook associated with this cookie, via
the normal sys.path_hooks processing. In the longer term, the path handling
code will become a “real” hook on sys.meta_path, and at that stage it will be
possible to insert user-defined hooks either before or after it.
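The cookie technique can be sketched as follows for current Python 3, where an object returned by a path hook implements find_spec(fullname, target=None). The cookie string, FallbackImporter and the module name fallback_demo are all invented for the example.

```python
import importlib.abc
import importlib.util
import sys

COOKIE = "<fallback-hook-demo>"  # arbitrary marker string (illustrative)


class FallbackImporter(importlib.abc.PathEntryFinder, importlib.abc.Loader):
    """Path hook bound to COOKIE; serves modules nothing else provided."""

    def __init__(self, path_item):
        if path_item != COOKIE:
            raise ImportError("not the fallback cookie")

    def find_spec(self, fullname, target=None):
        if fullname != "fallback_demo":  # hypothetical module name
            return None
        return importlib.util.spec_from_loader(fullname, self, origin=COOKIE)

    def exec_module(self, module):
        module.SOURCE = "fallback hook"


sys.path_hooks.append(FallbackImporter)
sys.path.append(COOKIE)  # last entry: reached only after all the others
try:
    import fallback_demo
finally:
    sys.path.remove(COOKIE)
    sys.path_hooks.remove(FallbackImporter)
    sys.path_importer_cache.pop(COOKIE, None)  # drop the cached importer

print(fallback_demo.SOURCE)  # prints fallback hook
```

Since the cookie is the final sys.path entry, the hook only sees names that no earlier entry could supply.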
The PEP 302 implementation has been integrated with Python as of 2.3a1. An earlier version is available as patch #652586, but more interestingly, the issue contains a fairly detailed history of the development and design.
PEP 273 has been implemented using PEP 302’s import hooks.
References and Footnotes
This document has been placed in the public domain.
Last modified: 2022-01-21 11:03:51 GMT