PEP 690 – Lazy Imports
- Germán Méndez Bravo <german.mb at gmail.com>, Carl Meyer <carl at oddbird.net>
- Barry Warsaw <barry at python.org>
- Discourse thread
- Standards Track
- 03-May-2022, 03-May-2022
Table of Contents
- Backwards Compatibility
- Security Implications
- Performance Impact
- How to Teach This
- Reference Implementation
- Rejected Ideas
This PEP proposes a feature to transparently defer the execution of imported modules until the moment when an imported object is used. Since Python programs commonly import many more modules than a single invocation of the program is likely to use in practice, lazy imports can greatly reduce the overall number of modules loaded, improving startup time and memory usage. Lazy imports also mostly eliminate the risk of import cycles.
Common Python code style prefers imports at module level, so they don’t have to be repeated within each scope the imported object is used in, and to avoid the inefficiency of repeated execution of the import system at runtime. This means that importing the main module of a program typically results in an immediate cascade of imports of most or all of the modules that may ever be needed by the program.
Consider the example of a Python command line program with a number of
subcommands. Each subcommand may perform different tasks, requiring the import
of different dependencies. But a given invocation of the program will only
execute a single subcommand, or possibly none (i.e. if just
info is requested). Top-level eager imports in such a program will result in
the import of many modules that will never be used at all; the time spent
(possibly compiling and) executing these modules is pure waste.
In an effort to improve startup time, some large Python CLIs tools make imports lazy by manually placing imports inline into functions to delay imports of expensive subsystems. This manual approach is labor-intensive and fragile; one misplaced import or refactor can easily undo painstaking optimization work.
Existing import-hook-based solutions such as demandimport or importlib.util.LazyLoader
are limited in that only certain styles of import can be made truly lazy
(imports such as
from foo import a, b will still eagerly import the module
foo) and they impose additional runtime overhead on every module attribute
This PEP proposes a more comprehensive solution for lazy imports that does not impose detectable overhead in real-world use. The implementation in this PEP has already demonstrated startup time improvements up to 70% and memory-use reductions up to 40% on real-world Python CLIs.
Lazy imports also eliminate most import cycles. With eager imports, “false
cycles” can easily occur which are fixed by simply moving an import to the
bottom of a module or inline into a function, or switching from
import bar to
import foo. With lazy imports, these “cycles” just work.
The only cycles which will remain are those where two modules actually each use
a name from the other at module level; these “true” cycles are only fixable by
refactoring the classes or functions involved.
The aim of this feature is to make imports transparently lazy. “Lazy” means
that the import of a module (execution of the module body and addition of the
module object to
sys.modules) should not occur until the module (or a name
imported from it) is actually referenced during execution. “Transparent” means
that besides the delayed import (and necessarily observable effects of that,
such as delayed import side effects and changes to
sys.modules), there is
no other observable change in behavior: the imported object is present in the
module namespace as normal and is transparently loaded whenever first used: its
status as a “lazy imported object” is not directly observable from Python or
from C extension code.
The requirement that the imported object be present in the module namespace as usual, even before the import has actually occurred, means that we need some kind of “lazy object” placeholder to represent the not-yet-imported object. The transparency requirement dictates that this placeholder must never be visible to Python code; any reference to it must trigger the import and replace it with the real imported object.
Given the possibility that Python (or C extension) code may pull objects
directly out of a module
__dict__, the only way to reliably prevent
accidental leakage of lazy objects is to have the dictionary itself be
responsible to ensure resolution of lazy objects on lookup.
To avoid a performance penalty on the vast majority of dictionaries which never
contain any lazy objects, we install a specialized lookup function
lookdict_unicode_lazy) for module namespace dictionaries when they first
gain a lazy-object value. When this lookup function finds that the key
references a lazy object, it resolves the lazy object immediately before
Some operations on dictionaries (e.g. iterating all values) don’t go through
the lookup function; in these cases we have to add a check if the lookup
lookdict_unicode_lazy and if so, resolve all lazy values first.
This implementation comprehensively prevents leakage of lazy objects, ensuring they are always resolved to the real imported object before anyone can get hold of them for any use, while avoiding any significant performance impact on dictionaries in general.
Lazy imports are opt-in, and globally enabled via a new
-L flag to the
Python interpreter, or a
PYTHONLAZYIMPORTS environment variable.
When enabled, the loading and execution of all (and only) top level imports is deferred until the imported name is used. This could happen immediately (e.g. on the very next line after the import statement) or much later (e.g. while using the name inside a function being called by some other code at some later time.)
For these top level imports, there are two exceptions which will make them
eager (not lazy): imports inside
blocks, and star imports (
from foo import *.) Imports inside
exception-handling blocks (this includes
with blocks, since those can also
“catch” and handle exceptions) remain eager so that any exceptions arising from
the import can be handled. Star imports must remain eager since performing the
import is the only way to know which names should be added to the namespace.
Imports inside class definitions or inside functions/methods are not “top level” and are never lazy.
Dynamic imports using
also never lazy.
Say we have a module
# simulate some work import time time.sleep(10) print("spam loaded")
And a module
eggs.py which imports it:
import spam print("imports done")
If we run
python -L eggs.py, the
spam module will never be imported
(because it is never referenced after the import),
"spam loaded" will never
be printed, and there will be no 10 second delay.
eggs.py simply references the name
spam after importing it, that
will be enough to trigger the import of
import spam print("imports done") spam
Now if we run
python -L eggs.py, we will see the output
printed first, then a 10 second delay, and then
"spam loaded" printed after
Of course, in real use cases (especially with lazy imports), it’s not recommended to rely on import side effects like this to trigger real work. This example is just to clarify the behavior of lazy imports.
The implementation will ensure that exceptions resulting from a deferred import have metadata attached pointing the user to the original import statement, to ease debuggability of errors from lazy imports.
Additionally, debug logging from
python -v will include logging when an
import statement has been encountered but execution of the import will be
-X importtime feature for profiling import costs adapts naturally
to lazy imports; the profiled time is the time spent actually importing.
Per-module opt out
Due to the backwards compatibility issues mentioned below, it may be necessary to force some imports to be eager.
In first-party code, since imports inside a
with block are never
lazy, this can be easily accomplished:
try: # force these imports to be eager import foo import bar finally: pass
This PEP proposes to add a new
importlib.eager_imports() context manager,
so the above technique can be less verbose and doesn’t require comments to
clarify its intent:
with eager_imports(): import foo import bar
Since imports within context managers are always eager, the
context manager can just be an alias to a null context manager. The context
manager does not force all imports to be recursively eager:
will be imported eagerly, but imports within those modules will still follow
the usual laziness rules.
The more difficult case can occur if an import in third-party code that can’t
easily be modified must be forced to be eager. For this purpose, we propose to
add an API to
importlib that can be called early in the process to specify
a list of module names within which all imports will be eager:
from importlib import set_eager_imports set_eager_imports(["one.mod", "another"])
The effect of this is also shallow: all imports within
one.mod will be
eager, but not imports in all modules imported by
set_eager_imports() can also take a callback which receives a module name and returns
whether imports within this module should be eager:
import re from importlib import set_eager_imports def eager_imports(name): return re.match(r"foo\.[^.]+\.logger", name) set_eager_imports(eager_imports)
This proposal preserves full backwards compatibility when the feature is disabled, which is the default.
Even when enabled, most code will continue to work normally without any observable change (other than improved startup time and memory usage.) Namespace packages are not affected: they work just as they do currently, except lazily.
In some existing code, lazy imports could produce currently unexpected results and behaviors. The problems that we may see when enabling lazy imports in an existing codebase are related to:
Import Side Effects
Import side effects that would otherwise be produced by the execution of imported modules during the execution of import statements will be deferred at least until the imported objects are used.
These import side effects may include:
- code executing any side-effecting logic during import;
- relying on imported submodules being set as attributes in the parent module.
A relevant and typical affected case is the click library for building Python command-line
interfaces. If e.g.
cli = click.group() is defined in
main and adds subcommands to it via
@cli.command(...)), but the actual
cli() call is in
main.py, then lazy imports may prevent the subcommands from being
registered, since in this case Click is depending on side effects of the import
sub.py. In this case the fix is to ensure the import of
eager, e.g. by using the
importlib.eager_imports() context manager.
There could be issues related to dynamic Python import paths; particularly,
adding (and then removing after the import) paths from
sys.path.insert(0, "/path/to/foo/module") import foo del sys.path foo.Bar()
In this case, with lazy imports enabled, the import of
foo will not
actually occur while the addition to
sys.path is present.
All exceptions arising from import (including
deferred from import time to first-use time, which could complicate debugging.
Accessing an object in the middle of any code could trigger a deferred import
ImportError or any other exception resulting from the
resolution of the deferred object, while loading and executing the related
imported module. The implementation will provide debugging assistance in
lazy-import-triggered tracebacks to mitigate this issue.
Deferred execution of code could produce security concerns if process owner,
sys.path, or other sensitive environment or contextual states change
between the time the
import statement is executed and the time where the
imported object is used.
The reference implementation has shown that the feature has negligible performance impact on existing real-world codebases (Instagram Server and other several CLI programs at Meta), while providing substantial improvements to startup time and memory usage.
The reference implementation shows small performance regressions in a few pyperformance benchmarks, but improvements in others. (TODO update with detailed data from 3.11 port of implementation.)
How to Teach This
Since the feature is opt-in, beginners should not encounter it by default.
Documentation of the
-L flag and
PYTHONLAZYIMPORTS environment variable
can clarify the behavior of lazy imports.
Some best practices to deal with some of the issues that could arise and to better take advantage of lazy imports are:
- Avoid relying on import side effects. Perhaps the most common reliance on import side effects is the registry pattern, where population of some external registry happens implicitly during the importing of modules, often via decorators. Instead, the registry should be built via an explicit call that perhaps does a discovery process to find decorated functions or classes.
- Always import needed submodules explicitly, don’t rely on some other import
to ensure a module has its submodules as attributes. That is, do
import foo.bar; foo.bar.Baz, not
import foo; foo.bar.Baz. The latter only works (unreliably) because the attribute
foo.baris added as a side effect of
foo.barbeing imported somewhere else. With lazy imports this may not always happen on time.
- Avoid using star imports, as those are always eager.
- When possible, do not import whole submodules. Import specific names instead;
from foo.bar import Baz, not
import foo.barand then
foo.bar.Baz. If you import submodules (such as
foo.fred), with lazy imports enabled, when you access the parent module’s name (
fooin this case), that will trigger loading all of the sibling submodules of the parent module (
foo.fred), not only the one being accessed, because the parent module
foois the actual deferred object name.
The current reference implementation is available as part of Cinder. Reference implementation is in use within Meta Platforms and has proven to achieve improvements in startup time (and total runtime for some applications) in the range of 40%-70%, as well as significant reduction in memory footprint (up to 40%), thanks to not needing to execute imports that end up being unused in the common flow.
A per-module opt-in using e.g.
from __future__ import lazy_imports has a
couple of disadvantages:
- It is less practical to achieve robust and significant startup-time or
memory-use wins by piecemeal application of lazy imports. Generally it would
require blanket application of the
__future__import to most of the codebase, as well as to third-party dependencies (which may be hard or impossible.)
__future__imports are not feature flags, they are for transition to behaviors which will become default in the future. It is not clear if lazy imports will ever make sense as the default behavior, so we should not promise this with a
__future__import. Thus, a per-module opt-in would require a new
from __optional_features__ import lazy_importsor similar mechanism.
Experience with the reference implementation suggests that the most practical adoption path for lazy imports is for a specific deployed application to opt-in globally, observe whether anything breaks, and opt-out specific modules as needed to account for e.g. reliance on import side effects.
Explicit syntax for lazy imports
If the primary objective of lazy imports were solely to work around import cycles and forward references, an explicitly-marked syntax for particular targeted imports to be lazy would make a lot of sense. But in practice it would be very hard to get robust startup time or memory use benefits from this approach, since it would require converting most imports within your code base (and in third-party dependencies) to use the lazy import syntax.
It would be possible to aim for a “shallow” laziness where only the top-level imports of subsystems from the main module are made explicitly lazy, but then imports within the subsystems are all eager. This is extremely fragile, though – it only takes one mis-placed import to undo the carefully constructed shallow laziness. Globally enabling lazy imports, on the other hand, provides in-depth robust laziness where you always pay only for the imports you use.
It would be possible to eagerly run the import loader to the point of finding the module source, but then defer the actual execution of the module and creation of the module object. The advantage of this would be that certain classes of import errors (e.g. a simple typo in the module name) would be caught eagerly instead of being deferred to the use of an imported name.
The disadvantage would be that the startup time benefits of lazy imports would
be significantly reduced, since unused imports would still require a filesystem
stat() call, at least. It would also introduce a possibly non-obvious split
between which import errors are raised eagerly and which are delayed, when
lazy imports are enabled.
This idea is rejected for now on the basis that in practice, confusion about import typos has not been an observed problem with the reference implementation. Generally delayed imports are not delayed forever, and errors show up soon enough to be caught and fixed (unless the import is truly unused.)
Lazy dynamic imports
It would be possible to add a
lazy=True or similar option to
importlib.import_module(), to enable them to
perform lazy imports. That idea is rejected in this PEP for lack of a clear
use case. Dynamic imports are already far outside the PEP 8 code style
recommendations for imports, and can easily be made precisely as lazy as
desired by placing them at the desired point in the code flow. These aren’t
commonly used at module top level, which is where lazy imports applies.
Deep eager-imports override
importlib.eager_imports() context manager and
importlib.set_eager_imports() override both have shallow effects: they only
force eagerness for the location where they are applied, not transitively. It
would be possible (although not simple) to provide a deep/transitive version of
one or both. That idea is rejected in this PEP because experience with the
reference implementation has not shown it to be necessary, and because it
prevents local reasoning about laziness of imports.
A deep override can lead to confusing behavior because the transitively-imported modules may be imported from multiple locations, some of which use the “deep eager override” and some of which don’t. Thus those modules may still be imported lazily initially, if they are first imported from a location that doesn’t have the override.
With deep overrides it is not possible to locally reason about whether a given import will be lazy or eager. With the behavior specified in this PEP, such local reasoning is possible.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Last modified: 2022-05-21 20:04:08 GMT