PEP 329 – Treating Builtins as Constants in the Standard Library
- Author:
- Raymond Hettinger <python at rcn.com>
- Status:
- Rejected
- Type:
- Standards Track
- Created:
- 18-Apr-2004
- Python-Version:
- 2.4
- Post-History:
- 18-Apr-2004
Table of Contents
Abstract
The proposal is to add a function for treating builtin references as constants and to apply that function throughout the standard library.
Status
The PEP is self rejected by the author. Though the ASPN recipe was well received, there was less willingness to consider this for inclusion in the core distribution.
The Jython implementation does not use byte codes, so its performance
would suffer if the current _len=len
optimizations were removed.
Also, altering byte codes is one of the least clean ways to improve performance and enable cleaner coding. A more robust solution would likely involve compiler pragma directives or metavariables indicating what can be optimized (similar to const/volatile declarations).
Motivation
The library contains code such as _len=len
which is intended to
create fast local references instead of slower global lookups. Though
necessary for performance, these constructs clutter the code and are
usually incomplete (missing many opportunities).
If the proposal is adopted, those constructs could be eliminated from the code base and at the same time improve upon their results in terms of performance.
There are currently over a hundred instances of while 1
in the
library. They were not replaced with the more readable while True
because of performance reasons (the compiler cannot eliminate the test
because True
is not known to always be a constant). Conversion of
True to a constant will clarify the code while retaining performance.
Many other basic Python operations run much slower because of global
lookups. In try/except statements, the trapped exceptions are
dynamically looked up before testing whether they match.
Similarly, simple identity tests such as while x is not None
require the None
variable to be re-looked up on every pass.
Builtin lookups are especially egregious because the enclosing global
scope must be checked first. These lookup chains devour cache space
that is best used elsewhere.
In short, if the proposal is adopted, the code will become cleaner and performance will improve across the board.
Proposal
Add a module called codetweaks.py which contains two functions,
bind_constants()
and bind_all()
. The first function performs
constant binding and the second recursively applies it to every
function and class in a target module.
For most modules in the standard library, add a pair of lines near the end of the script:
import codetweaks, sys
codetweaks.bind_all(sys.modules[__name__])
In addition to binding builtins, there are some modules (like
sre_compile
) where it also makes sense to bind module variables
as well as builtins into constants.
Questions and Answers
- Will this make everyone divert their attention to optimization
issues?
Because it is done automatically, it reduces the need to think about optimizations.
- In a nutshell, how does it work?
Every function has attributes with its bytecodes (the language of the Python virtual machine) and a table of constants. The bind function scans the bytecodes for a
LOAD_GLOBAL
instruction and checks to see whether the value is already known. If so, it adds that value to the constants table and replaces the opcode withLOAD_CONSTANT
. - When does it work?
When a module is imported for the first time, python compiles the bytecode and runs the binding optimization. Subsequent imports just re-use the previous work. Each session repeats this process (the results are not saved in
pyc
files). - How do you know this works?
I implemented it, applied it to every module in library, and the test suite ran without exception.
- What if the module defines a variable shadowing a builtin?
This does happen. For instance, True can be redefined at the module level as
True = (1==1)
. The sample implementation below detects the shadowing and leaves the global lookup unchanged. - Are you the first person to recognize that most global lookups are for
values that never change?
No, this has long been known. Skip Montanaro provides an eloquent explanation in PEP 266.
- What if I want to replace the builtins module and supply my own
implementations?
Either do this before importing a module, or just reload the module, or disable
codetweaks.py
(it will have a disable flag). - How susceptible is this module to changes in Python’s byte coding?
It imports
opcode.py
to protect against renumbering. Also, it usesLOAD_CONST
andLOAD_GLOBAL
which are fundamental and have been around forever. That notwithstanding, the coding scheme could change and this implementation would have to change along with modules likedis
which also rely on the current coding scheme. - What is the effect on startup time?
I could not measure a difference. None of the startup modules are bound except for warnings.py. Also, the binding function is very fast, making just a single pass over the code string in search of the
LOAD_GLOBAL
opcode.
Sample Implementation
Here is a sample implementation for codetweaks.py:
from types import ClassType, FunctionType
from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
ABORT_CODES = (EXTENDED_ARG, opmap['STORE_GLOBAL'])
def bind_constants(f, builtin_only=False, stoplist=[], verbose=False):
""" Return a new function with optimized global references.
Replaces global references with their currently defined values.
If not defined, the dynamic (runtime) global lookup is left undisturbed.
If builtin_only is True, then only builtins are optimized.
Variable names in the stoplist are also left undisturbed.
If verbose is True, prints each substitution as is occurs.
"""
import __builtin__
env = vars(__builtin__).copy()
stoplist = dict.fromkeys(stoplist)
if builtin_only:
stoplist.update(f.func_globals)
else:
env.update(f.func_globals)
co = f.func_code
newcode = map(ord, co.co_code)
newconsts = list(co.co_consts)
codelen = len(newcode)
i = 0
while i < codelen:
opcode = newcode[i]
if opcode in ABORT_CODES:
return f # for simplicity, only optimize common cases
if opcode == LOAD_GLOBAL:
oparg = newcode[i+1] + (newcode[i+2] << 8)
name = co.co_names[oparg]
if name in env and name not in stoplist:
value = env[name]
try:
pos = newconsts.index(value)
except ValueError:
pos = len(newconsts)
newconsts.append(value)
newcode[i] = LOAD_CONST
newcode[i+1] = pos & 0xFF
newcode[i+2] = pos >> 8
if verbose:
print name, '-->', value
i += 1
if opcode >= HAVE_ARGUMENT:
i += 2
codestr = ''.join(map(chr, newcode))
codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
co.co_flags, codestr, tuple(newconsts), co.co_names,
co.co_varnames, co.co_filename, co.co_name,
co.co_firstlineno, co.co_lnotab, co.co_freevars,
co.co_cellvars)
return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
f.func_closure)
def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
"""Recursively apply bind_constants() to functions in a module or class.
Use as the last line of the module (after everything is defined, but
before test code).
In modules that need modifiable globals, set builtin_only to True.
"""
for k, v in vars(mc).items():
if type(v) is FunctionType:
newv = bind_constants(v, builtin_only, stoplist, verbose)
setattr(mc, k, newv)
elif type(v) in (type, ClassType):
bind_all(v, builtin_only, stoplist, verbose)
def f(): pass
try:
f.func_code.code
except AttributeError: # detect non-CPython environments
bind_all = lambda *args, **kwds: 0
del f
import sys
bind_all(sys.modules[__name__]) # Optimizer, optimize thyself!
Note the automatic detection of a non-CPython environment that does not have bytecodes [2]. In that situation, the bind functions would simply return the original function unchanged. This assures that the two line additions to library modules do not impact other implementations.
The final code should add a flag to make it easy to disable binding.
References
[1] ASPN Recipe for a non-private implementation https://code.activestate.com/recipes/277940/
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0329.rst
Last modified: 2023-09-09 17:39:29 GMT