PEP 266 – Optimizing Global Variable/Attribute Access
- Author:
- Skip Montanaro <skip at pobox.com>
- Status:
- Withdrawn
- Type:
- Standards Track
- Created:
- 13-Aug-2001
- Python-Version:
- 2.3
- Post-History:
Table of Contents
Abstract
The bindings for most global variables and attributes of other modules typically never change during the execution of a Python program, but because of Python’s dynamic nature, code which accesses such global objects must run through a full lookup each time the object is needed. This PEP proposes a mechanism that allows code that accesses most global objects to treat them as local objects and places the burden of updating references on the code that changes the name bindings of such objects.
Introduction
Consider the workhorse function sre_compile._compile
. It is the internal
compilation function for the sre
module. It consists almost entirely of a
loop over the elements of the pattern being compiled, comparing opcodes with
known constant values and appending tokens to an output list. Most of the
comparisons are with constants imported from the sre_constants
module.
This means there are lots of LOAD_GLOBAL
bytecodes in the compiled output
of this module. Just by reading the code it’s apparent that the author
intended LITERAL
, NOT_LITERAL
, OPCODES
and many other symbols to
be constants. Still, each time they are involved in an expression, they must
be looked up anew.
Most global accesses are actually to objects that are “almost constants”.
This includes global variables in the current module as well as the attributes
of other imported modules. Since they rarely change, it seems reasonable to
place the burden of updating references to such objects on the code that
changes the name bindings. If sre_constants.LITERAL
is changed to refer
to another object, perhaps it would be worthwhile for the code that modifies
the sre_constants
module dict to correct any active references to that
object. By doing so, in many cases global variables and the attributes of
many objects could be cached as local variables. If the bindings between the
names given to the objects and the objects themselves changes rarely, the cost
of keeping track of such objects should be low and the potential payoff fairly
large.
In an attempt to gauge the effect of this proposal, I modified the Pystone
benchmark program included in the Python distribution to cache global
functions. Its main function, Proc0
, makes calls to ten different
functions inside its for
loop. In addition, Func2
calls Func1
repeatedly inside a loop. If local copies of these 11 global identifiers are
made before the functions’ loops are entered, performance on this particular
benchmark improves by about two percent (from 5561 pystones to 5685 on my
laptop). It gives some indication that performance would be improved by
caching most global variable access. Note also that the pystone benchmark
makes essentially no accesses of global module attributes, an anticipated area
of improvement for this PEP.
Proposed Change
I propose that the Python virtual machine be modified to include
TRACK_OBJECT
and UNTRACK_OBJECT
opcodes. TRACK_OBJECT
would
associate a global name or attribute of a global name with a slot in the local
variable array and perform an initial lookup of the associated object to fill
in the slot with a valid value. The association it creates would be noted by
the code responsible for changing the name-to-object binding to cause the
associated local variable to be updated. The UNTRACK_OBJECT
opcode would
delete any association between the name and the local variable slot.
Threads
Operation of this code in threaded programs will be no different than in
unthreaded programs. If you need to lock an object to access it, you would
have had to do that before TRACK_OBJECT
would have been executed and
retain that lock until after you stop using it.
FIXME: I suspect I need more here.
Rationale
Global variables and attributes rarely change. For example, once a function
imports the math module, the binding between the name math and the
module it refers to aren’t likely to change. Similarly, if the function that
uses the math
module refers to its sin attribute, it’s unlikely to
change. Still, every time the module wants to call the math.sin
function,
it must first execute a pair of instructions:
LOAD_GLOBAL math
LOAD_ATTR sin
If the client module always assumed that math.sin
was a local constant and
it was the responsibility of “external forces” outside the function to keep
the reference correct, we might have code like this:
TRACK_OBJECT math.sin
...
LOAD_FAST math.sin
...
UNTRACK_OBJECT math.sin
If the LOAD_FAST
was in a loop the payoff in reduced global loads and
attribute lookups could be significant.
This technique could, in theory, be applied to any global variable access or attribute lookup. Consider this code:
l = []
for i in range(10):
l.append(math.sin(i))
return l
Even though l is a local variable, you still pay the cost of loading
l.append
ten times in the loop. The compiler (or an optimizer) could
recognize that both math.sin
and l.append
are being called in the loop
and decide to generate the tracked local code, avoiding it for the builtin
range()
function because it’s only called once during loop setup.
Performance issues related to accessing local variables make tracking
l.append
less attractive than tracking globals such as math.sin
.
According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL
opcodes account for over 7% of all instructions executed by the Python virtual
machine. This can be a very expensive instruction, at least relative to a
LOAD_FAST
instruction, which is a simple array index and requires no extra
function calls by the virtual machine. I believe many LOAD_GLOBAL
instructions and LOAD_GLOBAL/LOAD_ATTR
pairs could be converted to
LOAD_FAST
instructions.
Code that uses global variables heavily often resorts to various tricks to
avoid global variable and attribute lookup. The aforementioned
sre_compile._compile
function caches the append
method of the growing
output list. Many people commonly abuse functions’ default argument feature
to cache global variable lookups. Both of these schemes are hackish and
rarely address all the available opportunities for optimization. (For
example, sre_compile._compile
does not cache the two globals that it uses
most frequently: the builtin len
function and the global OPCODES
array
that it imports from sre_constants.py
.
Questions
What about threads? What if math.sin
changes while in cache?
I believe the global interpreter lock will protect values from being
corrupted. In any case, the situation would be no worse than it is today.
If one thread modified math.sin
after another thread had already executed
LOAD_GLOBAL math
, but before it executed LOAD_ATTR sin
, the client
thread would see the old value of math.sin
.
The idea is this. I use a multi-attribute load below as an example, not
because it would happen very often, but because by demonstrating the recursive
nature with an extra call hopefully it will become clearer what I have in
mind. Suppose a function defined in module foo
wants to access
spam.eggs.ham
and that spam
is a module imported at the module level
in foo
:
import spam
...
def somefunc():
...
x = spam.eggs.ham
Upon entry to somefunc
, a TRACK_GLOBAL
instruction will be executed:
TRACK_GLOBAL spam.eggs.ham n
spam.eggs.ham is a string literal stored in the function’s constants
array. n is a fastlocals index. &fastlocals[n]
is a reference to
slot n in the executing frame’s fastlocals
array, the location in
which the spam.eggs.ham reference will be stored. Here’s what I envision
happening:
- The
TRACK_GLOBAL
instruction locates the object referred to by the name spam and finds it in its module scope. It then executes a C function like:_PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])
where
m
is the module object with an attributespam
. - The module object strips the leading spam. and stores the necessary
information (eggs.ham and
&fastlocals[n]
) in case its binding for the name eggs changes. It then locates the object referred to by the key eggs in its dict and recursively calls:_PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
- The
eggs
object strips the leading eggs., stores the (ham, &fastlocals[n]) info, locates the object in its namespace calledham
and calls_PyObject_TrackName
once again:_PyObject_TrackName(ham, "ham", &fastlocals[n])
- The
ham
object strips the leading string (no “.” this time, but that’s a minor point), sees that the result is empty, then uses its own value (self
, probably) to update the location it was handed:Py_XDECREF(&fastlocals[n]); &fastlocals[n] = self; Py_INCREF(&fastlocals[n]);
At this point, each object involved in resolving
spam.eggs.ham
knows which entry in its namespace needs to be tracked and what location to update if that name changes. Furthermore, if the one name it is tracking in its local storage changes, it can call_PyObject_TrackName
using the new object once the change has been made. At the bottom end of the food chain, the last object will always strip a name, see the empty string and know that its value should be stuffed into the location it’s been passed.When the object referred to by the dotted expression
spam.eggs.ham
is going to go out of scope, anUNTRACK_GLOBAL spam.eggs.ham n
instruction is executed. It has the effect of deleting all the tracking information thatTRACK_GLOBAL
established.The tracking operation may seem expensive, but recall that the objects being tracked are assumed to be “almost constant”, so the setup cost will be traded off against hopefully multiple local instead of global loads. For globals with attributes the tracking setup cost grows but is offset by avoiding the extra
LOAD_ATTR
cost. TheTRACK_GLOBAL
instruction needs to perform aPyDict_GetItemString
for the first name in the chain to determine where the top-level object resides. Each object in the chain has to store a string and an address somewhere, probably in a dict that uses storage locations as keys (e.g. the&fastlocals[n]
) and strings as values. (This dict could possibly be a central dict of dicts whose keys are object addresses instead of a per-object dict.) It shouldn’t be the other way around because multiple active frames may want to trackspam.eggs.ham
, but only one frame will want to associate that name with one of its fast locals slots.
Unresolved Issues
Threading
What about this (dumb) code?:
l = []
lock = threading.Lock()
...
def fill_l()::
for i in range(1000)::
lock.acquire()
l.append(math.sin(i))
lock.release()
...
def consume_l()::
while 1::
lock.acquire()
if l::
elt = l.pop()
lock.release()
fiddle(elt)
It’s not clear from a static analysis of the code what the lock is protecting.
(You can’t tell at compile-time that threads are even involved can you?)
Would or should it affect attempts to track l.append
or math.sin
in
the fill_l
function?
If we annotate the code with mythical track_object
and untrack_object
builtins (I’m not proposing this, just illustrating where stuff would go!), we
get:
l = []
lock = threading.Lock()
...
def fill_l()::
track_object("l.append", append)
track_object("math.sin", sin)
for i in range(1000)::
lock.acquire()
append(sin(i))
lock.release()
untrack_object("math.sin", sin)
untrack_object("l.append", append)
...
def consume_l()::
while 1::
lock.acquire()
if l::
elt = l.pop()
lock.release()
fiddle(elt)
Is that correct both with and without threads (or at least equally incorrect with and without threads)?
Nested Scopes
The presence of nested scopes will affect where TRACK_GLOBAL
finds a
global variable, but shouldn’t affect anything after that. (I think.)
Missing Attributes
Suppose I am tracking the object referred to by spam.eggs.ham
and
spam.eggs
is rebound to an object that does not have a ham
attribute.
It’s clear this will be an AttributeError
if the programmer attempts to
resolve spam.eggs.ham
in the current Python virtual machine, but suppose
the programmer has anticipated this case:
if hasattr(spam.eggs, "ham"):
print spam.eggs.ham
elif hasattr(spam.eggs, "bacon"):
print spam.eggs.bacon
else:
print "what? no meat?"
You can’t raise an AttributeError
when the tracking information is
recalculated. If it does not raise AttributeError
and instead lets the
tracking stand, it may be setting the programmer up for a very subtle error.
One solution to this problem would be to track the shortest possible root of
each dotted expression the function refers to directly. In the above example,
spam.eggs
would be tracked, but spam.eggs.ham
and spam.eggs.bacon
would not.
Who does the dirty work?
In the Questions section I postulated the existence of a
_PyObject_TrackName
function. While the API is fairly easy to specify,
the implementation behind-the-scenes is not so obvious. A central dictionary
could be used to track the name/location mappings, but it appears that all
setattr
functions might need to be modified to accommodate this new
functionality.
If all types used the PyObject_GenericSetAttr
function to set attributes
that would localize the update code somewhat. They don’t however (which is
not too surprising), so it seems that all getattrfunc
and getattrofunc
functions will have to be updated. In addition, this would place an absolute
requirement on C extension module authors to call some function when an
attribute changes value (PyObject_TrackUpdate
?).
Finally, it’s quite possible that some attributes will be set by side effect
and not by any direct call to a setattr
method of some sort. Consider a
device interface module that has an interrupt routine that copies the contents
of a device register into a slot in the object’s struct
whenever it
changes. In these situations, more extensive modifications would have to be
made by the module author. To identify such situations at compile time would
be impossible. I think an extra slot could be added to PyTypeObjects
to
indicate if an object’s code is safe for global tracking. It would have a
default value of 0 (Py_TRACKING_NOT_SAFE
). If an extension module author
has implemented the necessary tracking support, that field could be
initialized to 1 (Py_TRACKING_SAFE
). _PyObject_TrackName
could check
that field and issue a warning if it is asked to track an object that the
author has not explicitly said was safe for tracking.
Discussion
Jeremy Hylton has an alternate proposal on the table [2]. His proposal seeks to create a hybrid dictionary/list object for use in global name lookups that would make global variable access look more like local variable access. While there is no C code available to examine, the Python implementation given in his proposal still appears to require dictionary key lookup. It doesn’t appear that his proposal could speed local variable attribute lookup, which might be worthwhile in some situations if potential performance burdens could be addressed.
Backwards Compatibility
I don’t believe there will be any serious issues of backward compatibility.
Obviously, Python bytecode that contains TRACK_OBJECT
opcodes could not be
executed by earlier versions of the interpreter, but breakage at the bytecode
level is often assumed between versions.
Implementation
TBD. This is where I need help. I believe there should be either a central
name/location registry or the code that modifies object attributes should be
modified, but I’m not sure the best way to go about this. If you look at the
code that implements the STORE_GLOBAL
and STORE_ATTR
opcodes, it seems
likely that some changes will be required to PyDict_SetItem
and
PyObject_SetAttr
or their String variants. Ideally, there’d be a fairly
central place to localize these changes. If you begin considering tracking
attributes of local variables you get into issues of modifying STORE_FAST
as well, which could be a problem, since the name bindings for local variables
are changed much more frequently. (I think an optimizer could avoid inserting
the tracking code for the attributes for any local variables where the
variable’s name binding changes.)
Performance
I believe (though I have no code to prove it at this point), that implementing
TRACK_OBJECT
will generally not be much more expensive than a single
LOAD_GLOBAL
instruction or a LOAD_GLOBAL
/LOAD_ATTR
pair. An
optimizer should be able to avoid converting LOAD_GLOBAL
and
LOAD_GLOBAL
/LOAD_ATTR
to the new scheme unless the object access
occurred within a loop. Further down the line, a register-oriented
replacement for the current Python virtual machine [3] could conceivably
eliminate most of the LOAD_FAST
instructions as well.
The number of tracked objects should be relatively small. All active frames of all active threads could conceivably be tracking objects, but this seems small compared to the number of functions defined in a given application.
References
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0266.rst
Last modified: 2023-09-09 17:39:29 GMT