PEP: 267 Title: Optimized Access to Module Namespaces Author: Jeremy
Hylton <jeremy@alum.mit.edu> Status: Deferred Type: Standards Track
Content-Type: text/x-rst Created: 23-May-2001 Python-Version: 2.2
Post-History:

Deferral

While this PEP is a nice idea, no-one has yet emerged to do the work of
hashing out the differences between this PEP, PEP 266 and PEP 280.
Hence, it is being deferred.

Abstract

This PEP proposes a new implementation of global module namespaces and
the builtin namespace that speeds name resolution. The implementation
would use an array of object pointers for most operations in these
namespaces. The compiler would assign indices for global variables and
module attributes at compile time.

The current implementation represents these namespaces as dictionaries.
A global name incurs a dictionary lookup each time it is used; a builtin
name incurs two dictionary lookups, a failed lookup in the global
namespace and a second lookup in the builtin namespace.

This implementation should speed Python code that uses module-level
functions and variables. It should also eliminate awkward coding styles
that have evolved to speed access to these names.

The implementation is complicated because the global and builtin
namespaces can be modified dynamically in ways that are impossible for
the compiler to detect. (Example: A module's namespace is modified by a
script after the module is imported.) As a result, the implementation
must maintain several auxiliary data structures to preserve these
dynamic features.

Introduction

This PEP proposes a new implementation of attribute access for module
objects that optimizes access to module variables known at compile time.
The module will store these variables in an array and provide an
interface to lookup attributes using array offsets. For globals,
builtins, and attributes of imported modules, the compiler will generate
code that uses the array offsets for fast access.

[describe the key parts of the design: dlict, compiler support, stupid
name trick workarounds, optimization of other module's globals]

The implementation will preserve existing semantics for module
namespaces, including the ability to modify module namespaces at runtime
in ways that affect the visibility of builtin names.

DLict design

The namespaces are implemented using a data structure that has sometimes
gone under the name dlict. It is a dictionary that has numbered slots
for some dictionary entries. The type must be implemented in C to
achieve acceptable performance. The new type-class unification work
should make this fairly easy. The DLict will presumably be a subclass of
dictionary with an alternate storage module for some keys.

A Python implementation is included here to illustrate the basic design:

    """A dictionary-list hybrid"""

    import types

    class DLict:
        def __init__(self, names):
            assert isinstance(names, types.DictType)
            self.names = {}
            self.list = [None] * size
            self.empty = [1] * size
            self.dict = {}
            self.size = 0

        def __getitem__(self, name):
            i = self.names.get(name)
            if i is None:
                return self.dict[name]
            if self.empty[i] is not None:
                raise KeyError, name
            return self.list[i]

        def __setitem__(self, name, val):
            i = self.names.get(name)
            if i is None:
                self.dict[name] = val
            else:
                self.empty[i] = None
                self.list[i] = val
                self.size += 1

        def __delitem__(self, name):
            i = self.names.get(name)
            if i is None:
                del self.dict[name]
            else:
                if self.empty[i] is not None:
                    raise KeyError, name
                self.empty[i] = 1
                self.list[i] = None
                self.size -= 1

        def keys(self):
            if self.dict:
                return self.names.keys() + self.dict.keys()
            else:
                return self.names.keys()

        def values(self):
            if self.dict:
                return self.names.values() + self.dict.values()
            else:
                return self.names.values()

        def items(self):
            if self.dict:
                return self.names.items()
            else:
                return self.names.items() + self.dict.items()

        def __len__(self):
            return self.size + len(self.dict)

        def __cmp__(self, dlict):
            c = cmp(self.names, dlict.names)
            if c != 0:
                return c
            c = cmp(self.size, dlict.size)
            if c != 0:
                return c
            for i in range(len(self.names)):
                c = cmp(self.empty[i], dlict.empty[i])
            if c != 0:
                return c
            if self.empty[i] is None:
                c = cmp(self.list[i], dlict.empty[i])
                if c != 0:
                    return c
            return cmp(self.dict, dlict.dict)

        def clear(self):
            self.dict.clear()
            for i in range(len(self.names)):
                if self.empty[i] is None:
                    self.empty[i] = 1
                    self.list[i] = None

        def update(self):
            pass

        def load(self, index):
            """dlict-special method to support indexed access"""
            if self.empty[index] is None:
                return self.list[index]
            else:
                raise KeyError, index # XXX might want reverse mapping

        def store(self, index, val):
            """dlict-special method to support indexed access"""
            self.empty[index] = None
            self.list[index] = val

        def delete(self, index):
            """dlict-special method to support indexed access"""
            self.empty[index] = 1
            self.list[index] = None

Compiler issues

The compiler currently collects the names of all global variables in a
module. These are names bound at the module level or bound in a class or
function body that declares them to be global.

The compiler would assign indices for each global name and add the names
and indices of the globals to the module's code object. Each code object
would then be bound irrevocably to the module it was defined in. (Not
sure if there are some subtle problems with this.)

For attributes of imported modules, the module will store an indirection
record. Internally, the module will store a pointer to the defining
module and the offset of the attribute in the defining module's global
variable array. The offset would be initialized the first time the name
is looked up.

Runtime model

The PythonVM will be extended with new opcodes to access globals and
module attributes via a module-level array.

A function object would need to point to the module that defined it in
order to provide access to the module-level global array.

For module attributes stored in the dlict (call them static attributes),
the get/delattr implementation would need to track access to these
attributes using the old by-name interface. If a static attribute is
updated dynamically, e.g.:

    mod.__dict__["foo"] = 2

The implementation would need to update the array slot instead of the
backup dict.

Backwards compatibility

The dlict will need to maintain meta-information about whether a slot is
currently used or not. It will also need to maintain a pointer to the
builtin namespace. When a name is not currently used in the global
namespace, the lookup will have to fail over to the builtin namespace.

In the reverse case, each module may need a special accessor function
for the builtin namespace that checks to see if a global shadowing the
builtin has been added dynamically. This check would only occur if there
was a dynamic change to the module's dlict, i.e. when a name is bound
that wasn't discovered at compile-time.

These mechanisms would have little if any cost for the common case
whether a module's global namespace is not modified in strange ways at
runtime. They would add overhead for modules that did unusual things
with global names, but this is an uncommon practice and probably one
worth discouraging.

It may be desirable to disable dynamic additions to the global namespace
in some future version of Python. If so, the new implementation could
provide warnings.

Related PEPs

PEP 266, Optimizing Global Variable/Attribute Access, proposes a
different mechanism for optimizing access to global variables as well as
attributes of objects. The mechanism uses two new opcodes TRACK_OBJECT
and UNTRACK_OBJECT to create a slot in the local variables array that
aliases the global or object attribute. If the object being aliases is
rebound, the rebind operation is responsible for updating the aliases.

The objecting tracking approach applies to a wider range of objects than
just module. It may also have a higher runtime cost, because each
function that uses a global or object attribute must execute extra
opcodes to register its interest in an object and unregister on exit;
the cost of registration is unclear, but presumably involves a
dynamically resizable data structure to hold a list of callbacks.

The implementation proposed here avoids the need for registration,
because it does not create aliases. Instead it allows functions that
reference a global variable or module attribute to retain a pointer to
the location where the original binding is stored. A second advantage is
that the initial lookup is performed once per module rather than once
per function call.

Copyright

This document has been placed in the public domain.