PEP: 304
Title: Controlling Generation of Bytecode Files
Version: $Revision$
Last-Modified: $Date$
Author: Skip Montanaro
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Jan-2003
Post-History: 27-Jan-2003, 31-Jan-2003, 17-Jun-2005

Historical Note

While this original PEP was withdrawn, a variant of this feature was
eventually implemented in Python 3.8 as the PYTHONPYCACHEPREFIX
environment variable (https://bugs.python.org/issue33499).

Several of the issues and concerns originally raised in this PEP were
resolved by other changes in the intervening years:

-   the introduction of isolated mode to handle potential security
    concerns
-   the switch to importlib, a fully import-hook based import system
    implementation
-   PEP 3147's change in the bytecode cache layout to use __pycache__
    subdirectories, including the cache_from_source(path) and
    source_from_cache(path) APIs that allow the interpreter to
    automatically handle the redirection to a separate cache directory
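For comparison, the Python 3.8 mechanism exposes its prefix at runtime
as sys.pycache_prefix (populated from PYTHONPYCACHEPREFIX or
-X pycache_prefix), and importlib.util.cache_from_source() consults it
on every call. The sketch below assumes CPython 3.8+ on a POSIX system;
the helper cache_path() and the source path are hypothetical, and
sys.pycache_prefix is reassigned temporarily only to show both layouts
side by side. The exact file name depends on the interpreter's cache tag
and optimization level.

```python
import importlib.util
import sys


def cache_path(source, prefix):
    """Return the .pyc path CPython would use for *source* under *prefix*.

    prefix=None gives the default PEP 3147 __pycache__ layout.
    """
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source)
    finally:
        sys.pycache_prefix = saved  # restore the interpreter-wide setting


src = "/usr/lib/python3/urllib/parse.py"  # hypothetical; need not exist
print(cache_path(src, None))
# e.g. /usr/lib/python3/urllib/__pycache__/parse.cpython-312.pyc
print(cache_path(src, "/tmp/pycache"))
# e.g. /tmp/pycache/usr/lib/python3/urllib/parse.cpython-312.pyc
```

Unlike this PEP, the redirected cache mirrors the full absolute path of
the source directory under the prefix, much like the augmented
directories described below.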

Abstract

This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally arose
as a patch request [1] and evolved into a discussion thread on the
python-dev mailing list [2]. The introduction of an environment variable
will allow people installing Python or Python-based third-party packages
to control whether or not bytecode files should be generated at
installation time, and if so, where they should be written. It will also
allow users to control whether or not bytecode files should be generated
at application run-time, and if so, where they should be written.

Proposal

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. PYTHONBYTECODEBASE is
interpreted as follows:

-   If not defined, Python bytecode is generated in exactly the same way
    as is currently done. sys.bytecodebase is set to the root directory:
    / on Unix and Mac OS X, or on Windows the root directory of the
    startup drive, typically C:\ (whether the installation drive would
    be more appropriate is still an open question).

-   If defined and it refers to an existing directory to which the user
    has write permission, sys.bytecodebase is set to that directory and
    bytecode files are written into a directory structure rooted at that
    location.

-   If defined but empty, sys.bytecodebase is set to None and generation
    of bytecode files is suppressed altogether.

-   If defined and one of the following is true:

    -   it does not refer to a directory,
    -   it refers to a directory, but not one for which the user has
        write permission

    a warning is displayed, sys.bytecodebase is set to None and
    generation of bytecode files is suppressed altogether.
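The four cases above can be summarized as a small function. This is a
minimal sketch, not the code from the patch: resolve_bytecode_base() is a
hypothetical name, it takes the environment as a mapping so it can be
exercised without touching os.environ, the warning text is invented,
and it uses os.access() as an approximation of the write-permission
check (which can disagree with an actual write attempt, for example for
root or on network file systems).

```python
import os
import tempfile
import warnings


def resolve_bytecode_base(environ):
    """Map PYTHONBYTECODEBASE in *environ* to a value for sys.bytecodebase.

    Returns a directory path, or None when bytecode writing is disabled.
    """
    if "PYTHONBYTECODEBASE" not in environ:
        # Undefined: the base is the filesystem root ("/" on Unix,
        # the current drive's root on Windows).
        return os.path.abspath(os.sep)
    value = environ["PYTHONBYTECODEBASE"]
    if value == "":
        return None  # defined but empty: suppress bytecode files silently
    base = os.path.abspath(value)
    # os.access() stands in for actually trying to create a file in base.
    if os.path.isdir(base) and os.access(base, os.W_OK):
        return base
    warnings.warn(
        f"PYTHONBYTECODEBASE={value!r} is not a writable directory; "
        "bytecode files will not be written"
    )
    return None


with tempfile.TemporaryDirectory() as scratch:
    base = resolve_bytecode_base({"PYTHONBYTECODEBASE": scratch})
    print(base == os.path.abspath(scratch))  # True
print(resolve_bytecode_base({"PYTHONBYTECODEBASE": ""}))  # None
```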

After startup initialization, all runtime references are to
sys.bytecodebase, not the PYTHONBYTECODEBASE environment variable.
sys.path is not modified.

From the above, we see sys.bytecodebase can only take on two valid types
of values: None or a string referring to a valid directory on the
system.

During import, this extension works as follows:

-   The normal search for a module is conducted. The search order is
    roughly: dynamically loaded extension module, Python source file,
    Python bytecode file. The only time this mechanism comes into play
    is if a Python source file is found.
-   Once we've found a source module, an attempt to read a byte-compiled
    file in the same directory is made. (This is the same as before.)
-   If no valid byte-compiled file is found there (it is missing or out
    of date) and the bytecode base is not None, an attempt to read a
    byte-compiled file from the augmented directory is made.
-   If bytecode generation is required, the generated bytecode is
    written to the augmented directory if possible.

Note that this PEP is explicitly not about providing module-by-module or
directory-by-directory control over the disposition of bytecode files.

Glossary

-   "bytecode base" refers to the current setting of sys.bytecodebase.
-   "augmented directory" refers to the directory formed from the
    bytecode base and the directory name of the source file.
-   PYTHONBYTECODEBASE refers to the environment variable when necessary
    to distinguish it from "bytecode base".

Locating bytecode files

When the interpreter is searching for a module, it will use sys.path as
usual. However, when a possible bytecode file is considered, an extra
probe for a bytecode file may be made. First, a check is made for the
bytecode file in the sys.path directory that holds the source file (the
current behavior). If a valid bytecode file is not found there (either
one does not exist or it exists but is out of date) and the bytecode
base is not None, a second probe is made in the augmented directory,
that is, the source file's directory prefixed by the bytecode base.
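A minimal sketch of this two-probe lookup, assuming Unix paths and the
PEP-era naming in which foo.py is cached as foo.pyc (not the PEP 3147
__pycache__ layout). find_bytecode() and _is_current() are hypothetical
helpers. "Valid" is approximated by the header current CPython writes
for timestamp-based .pyc files (PEP 552): a 4-byte magic number, a
4-byte flags word equal to zero, then the source's modification time
and size, each as a little-endian 32-bit value.

```python
import importlib.util
import os
import py_compile
import tempfile


def _is_current(pyc_path, source_path):
    """True if pyc_path is a timestamp-based .pyc matching source_path."""
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(16)
    except OSError:
        return False  # missing or unreadable
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    flags, mtime, size = (
        int.from_bytes(header[i:i + 4], "little") for i in (4, 8, 12)
    )
    st = os.stat(source_path)
    return (flags == 0
            and mtime == int(st.st_mtime) & 0xFFFFFFFF
            and size == st.st_size & 0xFFFFFFFF)


def find_bytecode(source_path, bytecode_base):
    """Return the path of a valid bytecode file for source_path, or None."""
    source_path = os.path.abspath(source_path)
    srcdir = os.path.dirname(source_path)
    name = os.path.basename(source_path) + "c"  # foo.py -> foo.pyc
    candidates = [os.path.join(srcdir, name)]  # first probe: beside source
    if bytecode_base is not None:
        # Second probe: the augmented directory under the bytecode base.
        augdir = os.path.join(bytecode_base, srcdir.lstrip(os.sep))
        candidates.append(os.path.join(augdir, name))
    for path in candidates:
        if _is_current(path, source_path):
            return path
    return None


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "lib", "mod.py")
    os.makedirs(os.path.dirname(src))
    with open(src, "w") as f:
        f.write("x = 1\n")
    base = os.path.join(tmp, "pcb")
    print(find_bytecode(src, base))  # None: nothing compiled yet
    aug = os.path.join(base, os.path.dirname(src).lstrip(os.sep), "mod.pyc")
    py_compile.compile(
        src, cfile=aug, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    print(find_bytecode(src, base) == aug)  # True
```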

Writing bytecode files

When the bytecode base is not None, a new bytecode file is written to
the appropriate augmented directory, never directly to a directory in
sys.path.
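Writing can be sketched with the standard py_compile module, whose
compile() function accepts an explicit cfile destination and returns the
path it wrote. write_bytecode() is a hypothetical helper assuming Unix
paths and foo.py -> foo.pyc naming; it only ever writes under the
bytecode base, never next to the source.

```python
import os
import py_compile
import tempfile


def write_bytecode(source_path, bytecode_base):
    """Compile source_path into its augmented directory; return the .pyc path.

    Returns None when bytecode_base is None (generation suppressed).
    """
    if bytecode_base is None:
        return None
    source_path = os.path.abspath(source_path)
    srcdir = os.path.dirname(source_path)
    augdir = os.path.join(bytecode_base, srcdir.lstrip(os.sep))
    os.makedirs(augdir, exist_ok=True)  # create intermediate directories
    cfile = os.path.join(augdir, os.path.basename(source_path) + "c")
    return py_compile.compile(source_path, cfile=cfile, doraise=True)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "urllib.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    out = write_bytecode(src, os.path.join(tmp, "pcb"))
    print(out.startswith(os.path.join(tmp, "pcb")), os.path.exists(out))
    # True True
```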

Defining augmented directories

Conceptually, the augmented directory for a bytecode file is the
directory in which the source file exists prefixed by the bytecode base.
In a Unix environment this would be:

    import os
    import sys

    pcb = os.path.abspath(sys.bytecodebase)
    # Strip the leading "/" so os.path.join() keeps pcb as the prefix.
    augdir = os.path.join(pcb, os.path.dirname(os.path.abspath(sourcefile)).lstrip(os.sep))

On Windows, which does not have a single-rooted directory tree, the
drive letter of the directory containing the source file is treated as a
directory component after removing the trailing colon. The augmented
directory is thus derived as follows:

    import os
    import sys

    pcb = os.path.abspath(sys.bytecodebase)
    drive, base = os.path.splitdrive(os.path.dirname(os.path.abspath(sourcefile)))
    drive = drive.rstrip(":")   # "C:" -> "C"
    base = base.lstrip("\\/")   # keep os.path.join() from discarding pcb
    augdir = os.path.join(pcb, drive, base)
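Both derivations can be folded into one hypothetical helper that takes
the path module as a parameter, so the Unix and Windows rules (the
standard posixpath and ntpath modules) can be checked on any host. It is
a sketch that reproduces the augmented directories used in the urllib
examples of this PEP; UNC paths such as \\server\share are not handled,
and the PEP does not specify them.

```python
import ntpath
import posixpath


def augmented_dir(bytecode_base, sourcefile, pathmod):
    """Hypothetical helper: the augmented directory for sourcefile.

    pathmod is posixpath (Unix rules) or ntpath (Windows rules).
    """
    pcb = pathmod.abspath(bytecode_base)
    srcdir = pathmod.dirname(pathmod.abspath(sourcefile))
    drive, rest = pathmod.splitdrive(srcdir)
    drive = drive.rstrip(":")   # "C:" -> "C"; always "" under posixpath
    rest = rest.lstrip("\\/")   # keep join() from discarding pcb
    return pathmod.join(pcb, *[part for part in (drive, rest) if part])


print(augmented_dir("/tmp", "/usr/lib/python2.3/urllib.py", posixpath))
# /tmp/usr/lib/python2.3
print(augmented_dir("C:\\TEMP", "C:\\PYTHON22\\urllib.py", ntpath))
# C:\TEMP\C\PYTHON22
```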

Fixing the location of the bytecode base

During program startup, the value of the PYTHONBYTECODEBASE environment
variable is made absolute, checked for validity and added to the sys
module, effectively:

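    import os
    import sys
    import tempfile

    # The empty-string and warning cases described above are omitted here.
    pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
    try:
        # Creating a temporary file (deleted automatically on close) proves
        # pcb is a writable directory without risking an existing file.
        with tempfile.TemporaryFile(dir=pcb):
            pass
    except OSError:
        sys.bytecodebase = None
    else:
        sys.bytecodebase = pcb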

This allows the user to specify the bytecode base as a relative path,
but not have it subject to changes to the current working directory
during program execution. (I can't imagine you'd want it to move around
during program execution.)

There is nothing special about sys.bytecodebase. The user may change it
at runtime if desired, but normally it will not be modified.

Rationale

In many environments it is not possible for non-root users to write into
directories containing Python source files. Most of the time, this is
not a problem as Python source is generally byte-compiled during
installation. However, there are situations where bytecode files are
either missing or need to be updated. If the directory containing the
source file is not writable by the current user, the module must be
recompiled, with a corresponding performance penalty, each time a
program importing it is run [3]. Warning messages may also be generated
in certain circumstances. If the directory is writable, nearly
simultaneous attempts to write the bytecode file by two separate
processes may occur, resulting in file corruption [4].

In environments with RAM disks available, it may be desirable for
performance reasons to write bytecode files to a directory on such a
disk. Similarly, in environments where Python source code resides on
network file systems, it may be desirable to cache bytecode files on
local disks.

Alternatives

The only other alternative proposed so far [5] seems to be to add a -R
flag to the interpreter to disable writing bytecode files altogether.
This proposal subsumes that. Adding a command-line option is certainly
possible, but is probably not sufficient, as the interpreter's command
line is not readily available during installation (nor, perhaps, early
during program startup).

Issues

-   Interpretation of a module's __file__ attribute. I believe the
    __file__ attribute of a module should reflect the true location of
    the bytecode file. If people want to locate a module's source code,
    they should use imp.find_module(module).
-   Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
    present a security risk, but so can many other things the root user
    does. The root user should probably not set PYTHONBYTECODEBASE
    except possibly during installation. Still, perhaps this problem can
    be minimized. When running as root the interpreter should check to
    see if PYTHONBYTECODEBASE refers to a directory which is writable by
    anyone other than root. If so, it could raise an exception or
    warning and set sys.bytecodebase to None. Or, see the next item.
-   More security - What if PYTHONBYTECODEBASE refers to a general
    directory (say, /tmp)? In this case, perhaps loading of a
    preexisting bytecode file should occur only if the file is owned by
    the current user or root. (Does this matter on Windows?)
-   The interaction of this PEP with import hooks has not been
    considered yet. In fact, the best way to implement this idea might
    be as an import hook. See PEP 302.
-   In the current (pre-PEP 304) environment, it is safe to delete a
    source file after the corresponding bytecode file has been created,
    since they reside in the same directory. With PEP 304 as currently
    defined, this is not the case: a bytecode file in the augmented
    directory is only considered when the source file is present, and it
    is thus never considered when looking for module files ending in
    ".pyc". I think this behavior may have to change.

Examples

In the examples which follow, the urllib source code resides in
/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
is not writable by the current user.

-   The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
    is valid. When urllib is imported, the contents of
    /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
    not consulted. No other bytecode file is generated.
-   The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists, but
    is out-of-date. When urllib is imported, the generated bytecode file
    is written to urllib.pyc in the augmented directory which has the
    value /tmp/usr/lib/python2.3. Intermediate directories will be
    created as needed.
-   The bytecode base is None. No urllib.pyc file is found. When urllib
    is imported, no bytecode file is written.
-   The bytecode base is /tmp. No urllib.pyc file is found. When urllib
    is imported, the generated bytecode file is written to the augmented
    directory which has the value /tmp/usr/lib/python2.3. Intermediate
    directories will be created as needed.
-   At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
    A warning is emitted, sys.bytecodebase is set to None and no
    bytecode files are written during program execution unless
    sys.bytecodebase is later changed to refer to a valid, writable
    directory.
-   At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
    writable by the current user. A warning is emitted, sys.bytecodebase
    is set to None and no bytecode files are written during program
    execution unless sys.bytecodebase is later changed to refer to a
    valid, writable directory. Note that even though the augmented
    directory constructed for a particular bytecode file may be writable
    by the current user, what counts is that the bytecode base directory
    itself is writable.
-   At startup PYTHONBYTECODEBASE is set to the empty string.
    sys.bytecodebase is set to None. No warning is generated, however.
    If no urllib.pyc file is found when urllib is imported, no bytecode
    file is written.

In the Windows examples which follow, the urllib source code resides in
C:\PYTHON22\urllib.py. C:\PYTHON22 is in sys.path but is not writable by
the current user.

-   The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc exists
    and is valid. When urllib is imported, the contents of
    C:\PYTHON22\urllib.pyc are used. The augmented directory is not
    consulted.
-   The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc exists,
    but is out-of-date. When urllib is imported, a new bytecode file is
    written to the augmented directory which has the value
    C:\TEMP\C\PYTHON22. Intermediate directories will be created as
    needed.
-   At startup PYTHONBYTECODEBASE is set to TEMP and the current working
    directory at application startup is H:\NET. The potential bytecode
    base is thus H:\NET\TEMP. If this directory exists and is writable
    by the current user, sys.bytecodebase will be set to that value. If
    not, a warning will be emitted and sys.bytecodebase will be set to
    None.
-   The bytecode base is C:\TEMP. No urllib.pyc file is found. When
    urllib is imported, the generated bytecode file is written to the
    augmented directory which has the value C:\TEMP\C\PYTHON22.
    Intermediate directories will be created as needed.

Implementation

See the patch on SourceForge [6].

References

[1] patch 602345, Option for not writing py.[co] files, Klose
(https://bugs.python.org/issue602345)

[2] python-dev thread, Disable writing .py[co], Norwitz
(https://mail.python.org/pipermail/python-dev/2003-January/032270.html)

[3] Debian bug report, Mailman is writing to /usr in cron, Wegner
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)

[4] python-dev thread, Parallel pyc construction, Dubois
(https://mail.python.org/pipermail/python-dev/2003-January/032060.html)

[5] patch 602345, Option for not writing py.[co] files, Klose
(https://bugs.python.org/issue602345)

[6] patch 677103, PYTHONBYTECODEBASE patch (PEP 304), Montanaro
(https://bugs.python.org/issue677103)

Copyright

This document has been placed in the public domain.