PEP: 273
Title: Import Modules from Zip Archives
Author: James C. Ahlstrom <jim@interet.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Oct-2001
Python-Version: 2.3
Post-History: 26-Oct-2001

Abstract

This PEP adds the ability to import Python modules (*.py, *.py[co]) and
packages from zip archives. The same code is used to speed up normal
directory imports, provided os.listdir is available.

Note

Zip imports were added to Python 2.3, but the final implementation uses
an approach different from the one described in this PEP. The 2.3
implementation is SourceForge patch #652586[1], which adds new import
hooks described in PEP 302.

The rest of this PEP is therefore only of historical interest.

Specification

Currently, sys.path is a list of directory names as strings. If this PEP
is implemented, an item of sys.path can be a string naming a zip file
archive. The zip archive can contain a subdirectory structure to support
package imports. The zip archive satisfies imports exactly as a
subdirectory would.
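
As an illustration, the behavior described above can be tried with the zip
import support that ships in current Python (the PEP 302 based
implementation, not the C code proposed in this PEP). This is a minimal
sketch under that assumption; the archive name demo_archive.zip, the
package qpkg, and the module modfoo are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a small archive holding a package with one module.
# All names here (demo_archive.zip, qpkg, modfoo) are made up for illustration.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo_archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("qpkg/__init__.py", "")
    zf.writestr("qpkg/modfoo.py", "ANSWER = 42\n")

# An item of sys.path may name the zip archive itself.
sys.path.insert(0, archive)
importlib.invalidate_caches()

# The subdirectory structure inside the archive supports the package import.
modfoo = importlib.import_module("qpkg.modfoo")
print(modfoo.ANSWER)
print(modfoo.__file__)  # a path that begins with the archive name
```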

The implementation is in C code in the Python core and works on all
supported Python platforms.

Any files may be present in the zip archive, but only files *.py and
*.py[co] are available for import. Zip import of dynamic modules (*.pyd,
*.so) is disallowed.

Just as sys.path currently has default directory names, a default zip
archive name is added too. Otherwise there is no way to import all
Python library files from an archive.

Subdirectory Equivalence

The zip archive must be treated exactly as a subdirectory tree so we can
support package imports based on current and future rules. All zip data
is taken from the Central Directory, so that data must be correct;
malformed ("brain dead") zip files are not accommodated.

Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip", and we
are trying to import modfoo from the Q.R package. Then import.c will
generate a list of paths and extensions and will look for the file. The
list of generated paths does not change for zip imports. Suppose
import.c generates the path "/A/B/SubDir/Q/R/modfoo.pyc". Then it will
also generate the path "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the
SubDir path is exactly equivalent to finding "Q/R/modfoo.pyc" in the
archive.
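
The path generation described above can be sketched in Python. This is
only an illustration, not the logic of import.c: the helper names
candidate_paths and split_archive_path are hypothetical, and the order of
the extensions is an assumption.

```python
import posixpath


def candidate_paths(sys_path, dotted_name, extensions=(".pyc", ".pyo", ".py")):
    """Return candidate file names for a dotted module name.

    Hypothetical helper: every sys.path entry is joined with the package
    path and each extension, whether the entry is a directory or an archive.
    """
    relative = dotted_name.replace(".", "/")
    return [posixpath.join(entry, relative + ext)
            for entry in sys_path
            for ext in extensions]


def split_archive_path(path, archive):
    """Split a generated path into (archive, name inside the archive).

    Returns None when the path does not lie under the archive.
    """
    prefix = archive + "/"
    if not path.startswith(prefix):
        return None
    return archive, path[len(prefix):]


paths = candidate_paths(["/A/B/SubDir", "/C/D/E/Archive.zip"], "Q.R.modfoo")
inside = split_archive_path("/C/D/E/Archive.zip/Q/R/modfoo.pyc",
                            "/C/D/E/Archive.zip")
```

The same list of names is produced for both kinds of sys.path entry; only
the final lookup differs.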

Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then your
zip file will satisfy imports just as your subdirectory did.

Well, not quite. You can't satisfy dynamic modules from a zip file.
Dynamic modules have extensions like .dll, .pyd, and .so. They are
operating system dependent, and probably can't be loaded except from a
file. It might be possible to extract the dynamic module from the zip
file, write it to a plain file and load it. But that would mean creating
temporary files, and dealing with all the dynload_*.c, and that's
probably not a good idea.

When trying to import *.pyc, if it is not available then *.pyo will be
used instead, and vice versa when looking for *.pyo. If neither *.pyc
nor *.pyo is available, or if the magic numbers are invalid, then *.py
will be compiled and used to satisfy the import, but the compiled file
will not be saved. Python would normally write it to the same directory
as *.py, but we do not want to write to the zip file. We could write to
the directory containing the zip archive, but that would clutter it up,
which is undesirable if that directory is /usr/bin, for example.

Because the compiled files are not written, an archive containing only
*.py files is recompiled on every import. This will make zip imports very
slow, and the user will probably not figure out what is wrong. So it is
best to put *.pyc and *.pyo in the archive with the *.py.
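
The fallback between *.pyc, *.pyo and *.py can be sketched as follows.
This is a simplified illustration: the function name choose_member and
its return convention are hypothetical, and the magic number check is
left out.

```python
def choose_member(names, base, prefer=".pyc"):
    """Pick which archive member satisfies an import of `base`.

    `names` is the set of file names in the archive. `prefer` is ".pyc"
    or ".pyo", whichever is being looked for. Returns a tuple
    (member_name, needs_compile). Magic number validation is omitted.
    """
    other = ".pyo" if prefer == ".pyc" else ".pyc"
    for ext in (prefer, other):
        if base + ext in names:
            return base + ext, False
    if base + ".py" in names:
        # Source is compiled in memory; the result is never written back.
        return base + ".py", True
    return None, False


members = {"Q/R/modfoo.pyo", "Q/R/modfoo.py", "Q/R/other.py"}
compiled_choice = choose_member(members, "Q/R/modfoo")
source_choice = choose_member(members, "Q/R/other")
```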

Efficiency

The only way to find files in a zip archive is linear search. So for
each zip file in sys.path, we search for its names once, and put the
names plus other relevant data into a static Python dictionary. The key
is the archive name from sys.path joined with the file name (including
any subdirectories) within the archive. This is exactly the name
generated by import.c, and makes lookup easy.
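
A rough Python sketch of such a dictionary follows. It uses the standard
zipfile module as a stand-in for the C code, and the layout of the stored
values is an assumption; the PEP only says "other relevant data". The
function name index_archive is hypothetical.

```python
import io
import zipfile


def index_archive(archive_name, fileobj, table):
    """Add every member of one archive to `table`.

    Keys are the sys.path entry joined with the member name. Values hold
    data needed to read the member later (this layout is an assumption).
    """
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():  # entries come from the central directory
            key = archive_name + "/" + info.filename
            table[key] = (info.header_offset, info.compress_type,
                          info.compress_size, info.file_size)
    return table


# Build a tiny in-memory archive so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Q/R/modfoo.py", "x = 1\n")

zip_table = index_archive("/C/D/E/Archive.zip", buf, {})
```

With the table built once per archive, checking whether a generated path
exists becomes a single dictionary lookup.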

This same mechanism is used to speed up directory (non-zip) imports. See
"Directory Imports" below.

zlib

Compressed zip archives require zlib for decompression. Prior to any
other imports, we attempt an import of zlib. Import of compressed files
will fail with a message "missing zlib" unless zlib is available.
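
The check might look like the sketch below. The function name
check_readable and its have_zlib parameter are hypothetical; only the
"missing zlib" message comes from the text above.

```python
import zipfile

# Attempt the zlib import once, up front.
try:
    import zlib
    HAVE_ZLIB = True
except ImportError:
    HAVE_ZLIB = False


def check_readable(compress_type, have_zlib=HAVE_ZLIB):
    """Return True if a member can be read, else raise ImportError.

    Stored (uncompressed) members never need zlib; deflated members do.
    """
    if compress_type == zipfile.ZIP_DEFLATED and not have_zlib:
        raise ImportError("missing zlib")
    return True
```

Uncompressed archives therefore remain importable even when zlib cannot
be imported.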

Booting

Python imports site.py itself, and this imports os, nt, ntpath, stat,
and UserDict. It also imports sitecustomize.py which may import more
modules. Zip imports must be available before site.py is imported.

Just as there are default directories in sys.path, there must be one or
more default zip archives too.

The problem is what the name should be. The name should be linked with
the Python version, so the Python executable can correctly find its
corresponding libraries even when there are multiple Python versions on
the same machine.

We add one name to sys.path. On Unix, the directory is
sys.prefix + "/lib", and the file name is
"python%s%s.zip" % (sys.version[0], sys.version[2]). So for Python 2.2
and prefix /usr/local, the path /usr/local/lib/python2.2/ is already on
sys.path, and /usr/local/lib/python22.zip would be added. On Windows,
the file is the full path to python22.dll, with "dll" replaced by "zip".
The zip archive name is always inserted as the second item in sys.path.
The first is the directory of the main.py (thanks Tim).
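
The Unix naming rule can be written out as a small sketch. The function
names default_zip_path and add_default_zip are hypothetical; the format
string is the one given above.

```python
def default_zip_path(prefix, version):
    """Return the default archive path for a Unix installation.

    Uses characters 0 and 2 of the version string, as in the text, so it
    assumes a single-digit major and minor version such as "2.2".
    """
    name = "python%s%s.zip" % (version[0], version[2])
    return prefix + "/lib/" + name


def add_default_zip(path_list, zip_path):
    """Return a copy of path_list with zip_path as the second item."""
    return path_list[:1] + [zip_path] + path_list[1:]


zip_path = default_zip_path("/usr/local", "2.2")
new_path = add_default_zip(["/home/user/scripts", "/usr/local/lib/python2.2"],
                           zip_path)
```

Here /home/user/scripts stands in for the directory of the main script;
it is an illustrative value only.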

Directory Imports

The static Python dictionary used to speed up zip imports can be used to
speed up normal directory imports too. For each item in sys.path that is
not a zip archive, we call os.listdir, and add the directory contents to
the dictionary. Then instead of calling fopen() in a double loop, we
just check the dictionary. This greatly speeds up imports. If os.listdir
doesn't exist, the dictionary is not used.
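
A minimal sketch of the directory case, assuming the same key scheme as
for archives (sys.path entry joined with the file name). The function
name index_directory and the stored value True are hypothetical.

```python
import os
import tempfile


def index_directory(directory, table, listdir=getattr(os, "listdir", None)):
    """Add the contents of one sys.path directory to `table`.

    Returns False when os.listdir is unavailable, in which case the
    caller falls back to probing the file system directly.
    """
    if listdir is None:
        return False
    try:
        names = listdir(directory)
    except OSError:
        return True  # a missing or unreadable directory contributes nothing
    for name in names:
        table[os.path.join(directory, name)] = True
    return True


# Self-contained demonstration using a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "modfoo.py"), "w") as f:
    f.write("x = 1\n")

dir_table = {}
used = index_directory(demo_dir, dir_table)
found = os.path.join(demo_dir, "modfoo.py") in dir_table
```

Each candidate name is then checked against the dictionary instead of
being opened on disk.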

Benchmarks

  Case   Original 2.2a3      Using os.listdir    Zip Uncomp   Zip Compr
  ------ ------------------- ------------------- ------------ ------------
  1      3.2 2.5 3.2->1.02   2.3 2.5 2.3->0.87   1.66->0.93   1.5->1.07
  2      2.8 3.9 3.0->1.32   Same as Case 1.                  
  3      5.7 5.7 5.7->5.7    2.1 2.1 2.1->1.8    1.25->0.99   1.19->1.13
  4      9.4 9.4 9.3->9.35   Same as Case 3.                  

Case 1: Local drive C:, sys.path has its default value.

Case 2: Local drive C:, directory with files is at the end of sys.path.

Case 3: Network drive, sys.path has its default value.

Case 4: Network drive, directory with files is at the end of sys.path.

Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, with 256 MB of
memory. The machine was running Windows 2000 with a Linux/Samba network
server. Times are in seconds, and are the time to import about 100 Lib
modules. Cases 2 and 4 have the "correct" directory moved to the end of
sys.path. "Uncomp" means uncompressed zip archive, "Compr" means
compressed.

Initial times are after a re-boot of the system; the time after "->" is
the time after repeated runs. Times to import from C: after a re-boot
are highly variable for the "Original" case, but are more realistic.

Custom Imports

The logic demonstrates the ability to import using default searching
until a needed Python module (in this case, os) becomes available. This
can be used to bootstrap custom importers. For example, if "importer()"
in __init__.py exists, then it could be used for imports. The
"importer()" can freely import os and other modules, and these will be
satisfied from the default mechanism. This PEP does not define any
custom importers, and this note is for information only.

Implementation

A C implementation is available as SourceForge patch 492105. Superseded
by patch 652586 and current CVS.[2]

A newer version (updated for recent CVS by Paul Moore) is 645650.
Superseded by patch 652586 and current CVS.[3]

A competing implementation by Just van Rossum is 652586, which is the
basis for the final implementation of PEP 302. PEP 273 has been
implemented using PEP 302's import hooks.[4]

References

[1] Just van Rossum, New import hooks + Import from Zip files
https://bugs.python.org/issue652586

[2] James C. Ahlstrom, Import from Zip archive
https://bugs.python.org/issue492105

[3] Paul Moore, Import from Zip Archive
https://bugs.python.org/issue645650

[4] Just van Rossum, New import hooks + Import from Zip files
https://bugs.python.org/issue652586

Copyright

This document has been placed in the public domain.