PEP: 329 Title: Treating Builtins as Constants in the Standard Library
Version: $Revision$ Last-Modified: $Date$ Author: Raymond Hettinger
<python@rcn.com> Status: Rejected Type: Standards Track Content-Type:
text/x-rst Created: 18-Apr-2004 Python-Version: 2.4 Post-History:
18-Apr-2004

Abstract

The proposal is to add a function for treating builtin references as
constants and to apply that function throughout the standard library.

Status

The PEP is self rejected by the author. Though the ASPN recipe was well
received, there was less willingness to consider this for inclusion in
the core distribution.

The Jython implementation does not use byte codes, so its performance
would suffer if the current _len=len optimizations were removed.

Also, altering byte codes is one of the least clean ways to improve
performance and enable cleaner coding. A more robust solution would
likely involve compiler pragma directives or metavariables indicating
what can be optimized (similar to const/volatile declarations).

Motivation

The library contains code such as _len=len which is intended to create
fast local references instead of slower global lookups. Though necessary
for performance, these constructs clutter the code and are usually
incomplete (missing many opportunities).

If the proposal is adopted, those constructs could be eliminated from
the code base and at the same time improve upon their results in terms
of performance.

There are currently over a hundred instances of while 1 in the library.
They were not replaced with the more readable while True because of
performance reasons (the compiler cannot eliminate the test because True
is not known to always be a constant). Conversion of True to a constant
will clarify the code while retaining performance.

Many other basic Python operations run much slower because of global
lookups. In try/except statements, the trapped exceptions are
dynamically looked up before testing whether they match. Similarly,
simple identity tests such as while x is not None require the None
variable to be re-looked up on every pass. Builtin lookups are
especially egregious because the enclosing global scope must be checked
first. These lookup chains devour cache space that is best used
elsewhere.

In short, if the proposal is adopted, the code will become cleaner and
performance will improve across the board.

Proposal

Add a module called codetweaks.py which contains two functions,
bind_constants() and bind_all(). The first function performs constant
binding and the second recursively applies it to every function and
class in a target module.

For most modules in the standard library, add a pair of lines near the
end of the script:

    import codetweaks, sys
    codetweaks.bind_all(sys.modules[__name__])

In addition to binding builtins, there are some modules (like
sre_compile) where it also makes sense to bind module variables as well
as builtins into constants.

Questions and Answers

1.  Will this make everyone divert their attention to optimization
    issues?

    Because it is done automatically, it reduces the need to think about
    optimizations.

2.  In a nutshell, how does it work?

    Every function has attributes with its bytecodes (the language of
    the Python virtual machine) and a table of constants. The bind
    function scans the bytecodes for a LOAD_GLOBAL instruction and
    checks to see whether the value is already known. If so, it adds
    that value to the constants table and replaces the opcode with
    LOAD_CONSTANT.

3.  When does it work?

    When a module is imported for the first time, python compiles the
    bytecode and runs the binding optimization. Subsequent imports just
    re-use the previous work. Each session repeats this process (the
    results are not saved in pyc files).

4.  How do you know this works?

    I implemented it, applied it to every module in library, and the
    test suite ran without exception.

5.  What if the module defines a variable shadowing a builtin?

    This does happen. For instance, True can be redefined at the module
    level as True = (1==1). The sample implementation below detects the
    shadowing and leaves the global lookup unchanged.

6.  Are you the first person to recognize that most global lookups are
    for values that never change?

    No, this has long been known. Skip Montanaro provides an eloquent
    explanation in PEP 266.

7.  What if I want to replace the builtins module and supply my own
    implementations?

    Either do this before importing a module, or just reload the module,
    or disable codetweaks.py (it will have a disable flag).

8.  How susceptible is this module to changes in Python's byte coding?

    It imports opcode.py to protect against renumbering. Also, it uses
    LOAD_CONST and LOAD_GLOBAL which are fundamental and have been
    around forever. That notwithstanding, the coding scheme could change
    and this implementation would have to change along with modules like
    dis which also rely on the current coding scheme.

9.  What is the effect on startup time?

    I could not measure a difference. None of the startup modules are
    bound except for warnings.py. Also, the binding function is very
    fast, making just a single pass over the code string in search of
    the LOAD_GLOBAL opcode.

Sample Implementation

Here is a sample implementation for codetweaks.py:

    from types import ClassType, FunctionType
    from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
    LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
    ABORT_CODES = (EXTENDED_ARG, opmap['STORE_GLOBAL'])

    def bind_constants(f, builtin_only=False, stoplist=[], verbose=False):
        """ Return a new function with optimized global references.

        Replaces global references with their currently defined values.
        If not defined, the dynamic (runtime) global lookup is left undisturbed.
        If builtin_only is True, then only builtins are optimized.
        Variable names in the stoplist are also left undisturbed.
        If verbose is True, prints each substitution as is occurs.

        """
        import __builtin__
        env = vars(__builtin__).copy()
        stoplist = dict.fromkeys(stoplist)
        if builtin_only:
            stoplist.update(f.func_globals)
        else:
            env.update(f.func_globals)

        co = f.func_code
        newcode = map(ord, co.co_code)
        newconsts = list(co.co_consts)
        codelen = len(newcode)

        i = 0
        while i < codelen:
            opcode = newcode[i]
            if opcode in ABORT_CODES:
                return f    # for simplicity, only optimize common cases
            if opcode == LOAD_GLOBAL:
                oparg = newcode[i+1] + (newcode[i+2] << 8)
                name = co.co_names[oparg]
                if name in env and name not in stoplist:
                    value = env[name]
                    try:
                        pos = newconsts.index(value)
                    except ValueError:
                        pos = len(newconsts)
                        newconsts.append(value)
                    newcode[i] = LOAD_CONST
                    newcode[i+1] = pos & 0xFF
                    newcode[i+2] = pos >> 8
                    if verbose:
                        print name, '-->', value
            i += 1
            if opcode >= HAVE_ARGUMENT:
                i += 2

        codestr = ''.join(map(chr, newcode))
        codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
                        co.co_flags, codestr, tuple(newconsts), co.co_names,
                        co.co_varnames, co.co_filename, co.co_name,
                        co.co_firstlineno, co.co_lnotab, co.co_freevars,
                        co.co_cellvars)
        return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
                        f.func_closure)


    def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
        """Recursively apply bind_constants() to functions in a module or class.

        Use as the last line of the module (after everything is defined, but
        before test code).

        In modules that need modifiable globals, set builtin_only to True.

        """
        for k, v in vars(mc).items():
            if type(v) is FunctionType:
                newv = bind_constants(v, builtin_only, stoplist, verbose)
                setattr(mc, k, newv)
            elif type(v) in (type, ClassType):
                bind_all(v, builtin_only, stoplist, verbose)


    def f(): pass
    try:
        f.func_code.code
    except AttributeError:                  # detect non-CPython environments
        bind_all = lambda *args, **kwds: 0
    del f

    import sys
    bind_all(sys.modules[__name__])         # Optimizer, optimize thyself!

Note the automatic detection of a non-CPython environment that does not
have bytecodes[1]. In that situation, the bind functions would simply
return the original function unchanged. This assures that the two line
additions to library modules do not impact other implementations.

The final code should add a flag to make it easy to disable binding.

References

[1] ASPN Recipe for a non-private implementation
https://code.activestate.com/recipes/277940/

Copyright

This document has been placed in the public domain.

[1] Differences between CPython and Jython
https://web.archive.org/web/20031018014238/http://www.jython.org/cgi-bin/faqw.py?req=show&file=faq01.003.htp