PEP: 509 Title: Add a private version to dict Author: Victor Stinner
<vstinner@python.org> Status: Superseded Type: Standards Track
Content-Type: text/x-rst Created: 04-Jan-2016 Python-Version: 3.6
Post-History: 08-Jan-2016, 11-Jan-2016, 14-Apr-2016, 19-Apr-2016
Superseded-By: 699 Resolution:
https://mail.python.org/archives/list/python-dev@python.org/message/QFVJV6YQOUSWIYY4FBORY647YCBSCIMQ/

Abstract

Add a new private version to the builtin dict type, incremented at each
dictionary creation and at each dictionary change, to implement fast
guards on namespaces.

Rationale

In Python, the builtin dict type is used by many instructions. For
example, the LOAD_GLOBAL instruction looks up a variable in the global
namespace, or in the builtins namespace (two dict lookups). Python uses
dict for the builtins namespace, globals namespace, type namespaces,
instance namespaces, etc. The local namespace (function namespace) is
usually optimized to an array, but it can be a dict too.

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires to detect when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a private version to dictionaries to implement fast
guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. The version is globally unique,
so checking the version is also enough to verify that the namespace
dictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).

Example of optimization: copy the value of a global variable to function
constants. This optimization requires a guard on the global variable to
check if it was modified after it was copied. If the global variable is
not modified, the function uses the cached copy. If the global variable
is modified, the function uses a regular lookup, and maybe also
deoptimizes the function (to remove the overhead of the guard check for
next function calls).

See the 510 -- Specialized functions with guards <510> for concrete
usage of guards to specialize functions and for a more general rationale
on Python static optimizers.

Guard example

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
dict_get_version(dict) function:

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False

Usage of the dict version

Speedup method calls

Yury Selivanov wrote a patch to optimize method calls. The patch depends
on the "implement per-opcode cache in ceval" patch which requires
dictionary versions to invalidate the cache if the globals dictionary or
the builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to define a function in a namespace and call it in a
different namespace, using exec() with the globals parameter for
example. In this case, the globals dictionary was replaced and the cache
must also be invalidated.

Specialized functions using guards

PEP 510 proposes an API to support specialized functions with guards. It
allows to implement static optimizers for Python without breaking the
Python semantics.

The fatoptimizer of the FAT Python project is an example of a static
Python optimizer. It implements many optimizations which require guards
on namespaces:

-   Call pure builtins: to replace len("abc") with 3, guards on
    builtins.__dict__['len'] and globals()['len'] are required
-   Loop unrolling: to unroll the loop for i in range(...): ..., guards
    on builtins.__dict__['range'] and globals()['range'] are required
-   etc.

Pyjion

According of Brett Cannon, one of the two main developers of Pyjion,
Pyjion can benefit from dictionary version to implement optimizations.

Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).

Cython

Cython can benefit from dictionary version to implement optimizations.

Cython is an optimising static compiler for both the Python programming
language and the extended Cython programming language.

Unladen Swallow

Even if dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." (source: Unladen Swallow ProjectPlan).

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: Unladen Swallow
Retrospective.

Changes

Add a ma_version_tag field to the PyDictObject structure with the C type
PY_UINT64_T, 64-bit unsigned integer. Add also a global dictionary
version.

Each time a dictionary is created, the global version is incremented and
the dictionary version is initialized to the global version.

Each time the dictionary content is modified, the global version must be
incremented and copied to the dictionary version. Dictionary methods
which can modify its content:

-   clear()
-   pop(key)
-   popitem()
-   setdefault(key, value)
-   __delitem__(key)
-   __setitem__(key, value)
-   update(...)

The choice of increasing or not the version when a dictionary method
does not change its content is left to the Python implementation. A
Python implementation can decide to not increase the version to avoid
dictionary lookups in guards. Examples of cases when dictionary methods
don't modify its content:

-   clear() if the dict is already empty
-   pop(key) if the key does not exist
-   popitem() if the dict is empty
-   setdefault(key, value) if the key already exists
-   __delitem__(key) if the key does not exist
-   __setitem__(key, value) if the new value is identical to the current
    value
-   update() if called without argument or if new values are identical
    to current values

Setting a key to a new value equals to the old value is also considered
as an operation modifying the dictionary content.

Two different empty dictionaries must have a different version to be
able to identify a dictionary just by its version. It allows to verify
in a guard that a namespace was not replaced without storing a strong
reference to the dictionary. Using a borrowed reference does not work:
if the old dictionary is destroyed, it is possible that a new dictionary
is allocated at the same memory address. By the way, dictionaries don't
support weak references.

The version increase must be atomic. In CPython, the Global Interpreter
Lock (GIL) already protects dict methods to make changes atomic.

Example using a hypothetical dict_get_version(dict) function:

    >>> d = {}
    >>> dict_get_version(d)
    100
    >>> d['key'] = 'value'
    >>> dict_get_version(d)
    101
    >>> d['key'] = 'new value'
    >>> dict_get_version(d)
    102
    >>> del d['key']
    >>> dict_get_version(d)
    103

The field is called ma_version_tag, rather than ma_version, to suggest
to compare it using version_tag == old_version_tag, rather than
version <= old_version which becomes wrong after an integer overflow.

Backwards Compatibility

Since the PyDictObject structure is not part of the stable ABI and the
new dictionary version not exposed at the Python scope, changes are
backward compatible.

Implementation and Performance

The issue #26058: PEP 509: Add ma_version_tag to PyDictObject contains a
patch implementing this PEP.

On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations. For example, the following timeit
micro-benchmarks takes 318 nanoseconds before and after the change:

    python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'

When the version does not change, PyDict_GetItem() takes 14.8 ns for a
dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, a
guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups costs 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).

The fat module implements such guards: fat.GuardDict is based on the
dictionary version.

Integer overflow

The implementation uses the C type PY_UINT64_T to store the version: a
64 bits unsigned integer. The C code uses version++. On integer
overflow, the version is wrapped to 0 (and then continues to be
incremented) according to the C standard.

After an integer overflow, a guard can succeed whereas the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly 2 ** 64 dictionary creations or modifications since
the previous guard check.

If a dictionary is modified every nanosecond, 2 ** 64 modifications
takes longer than 584 years. Using a 32-bit version, it only takes 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.

A risk of a bug every 584 years is acceptable.

Alternatives

Expose the version at Python level as a read-only __version__ property

The first version of the PEP proposed to expose the dictionary version
as a read-only __version__ property at Python level, and also to add the
property to collections.UserDict (since this type must mimic the dict
API).

There are multiple issues:

-   To be consistent and avoid bad surprises, the version must be added
    to all mapping types. Implementing a new mapping type would require
    extra work for no benefit, since the version is only required on the
    dict type in practice.

-   All Python implementations would have to implement this new
    property, it gives more work to other implementations, whereas they
    may not use the dictionary version at all.

-   Exposing the dictionary version at the Python level can lead the
    false assumption on performances. Checking dict.__version__ at the
    Python level is not faster than a dictionary lookup. A dictionary
    lookup in Python has a cost of 48.7 ns and checking the version has
    a cost of 47.5 ns, the difference is only 1.2 ns (3%):

        $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
        10000000 loops, best of 3: 0.0487 usec per loop
        $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
        10000000 loops, best of 3: 0.0475 usec per loop

-   The __version__ can be wrapped on integer overflow. It is error
    prone: using dict.__version__ <= guard_version is wrong,
    dict.__version__ == guard_version must be used instead to reduce the
    risk of bug on integer overflow (even if the integer overflow is
    unlikely in practice).

Mandatory bikeshedding on the property name:

-   __cache_token__: name proposed by Alyssa Coghlan, name coming from
    abc.get_cache_token().
-   __version__
-   __version_tag__
-   __timestamp__

Add a version to each dict entry

A single version per dictionary requires to keep a strong reference to
the value which can keep the value alive longer than expected. If we add
also a version per dictionary entry, the guard can only store the entry
version (a simple integer) to avoid the strong reference to the value:
only strong references to the dictionary and to the key are needed.

Changes: add a me_version_tag field to the PyDictKeyEntry structure, the
field has the C type PY_UINT64_T. When a key is created or modified, the
entry version is set to the dictionary version which is incremented at
any change (create, modify, delete).

Pseudo-code of a fast guard to check if a dictionary key was modified
using hypothetical dict_get_version(dict) and
dict_get_entry_version(dict) functions:

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            dict_version = dict_get_version(self.dict)
            if dict_version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary to read the entry version
            entry_version = get_dict_key_version(dict, key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                self.entry_version = entry_version
                return True

            # the key was modified
            return False

The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or not used). For
example, it increases the size of each dictionary entry by 8 bytes on
64-bit system.

In Python, the memory footprint matters and the trend is to reduce it.
Examples:

-   PEP 393 -- Flexible String Representation
-   PEP 412 -- Key-Sharing Dictionary

Add a new dict subtype

Add a new verdict type, subtype of dict. When guards are needed, use the
verdict for namespaces (module namespace, type namespace, instance
namespace, etc.) instead of dict.

Leave the dict type unchanged to not add any overhead (CPU, memory
footprint) when guards are not used.

Technical issue: a lot of C code in the wild, including CPython core,
expecting the exact dict type. Issues:

-   exec() requires a dict for globals and locals. A lot of code use
    globals={}. It is not possible to cast the dict to a dict subtype
    because the caller expects the globals parameter to be modified
    (dict is mutable).
-   C functions call directly PyDict_xxx() functions, instead of calling
    PyObject_xxx() if the object is a dict subtype
-   PyDict_CheckExact() check fails on dict subtype, whereas some
    functions require the exact dict type.
-   Python/ceval.c does not completely supports dict subtypes for
    namespaces

The exec() issue is a blocker issue.

Other issues:

-   The garbage collector has a special code to "untrack" dict
    instances. If a dict subtype is used for namespaces, the garbage
    collector can be unable to break some reference cycles.
-   Some functions have a fast-path for dict which would not be taken
    for dict subtypes, and so it would make Python a little bit slower.

Prior Art

Method cache and type version tag

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (tp_version_tag) and a "valid version tag" flag to types
(the PyTypeObject structure).

The type version tag is not exposed at the Python level.

The version tag has the C type unsigned int. The cache is a global hash
table of 4096 entries, shared by all types. The cache is global to "make
it fast, have a deterministic and low memory footprint, and be easy to
invalidate". Each cache entry has a version tag. A global version tag is
used to create the next version tag, it also has the C type
unsigned int.

By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.

On integer overflow, the whole cache is cleared and the global version
tag is reset to 0.

See Method cache (issue #1685986) and Armin's method cache optimization
updated for Python 2.6 (issue #1700288).

Globals / builtins cache

In 2010, Antoine Pitrou proposed a Globals / builtins cache (issue
#10401) which adds a private ma_version field to the PyDictObject
structure (dict type), the field has the C type Py_ssize_t.

The patch adds a "global and builtin cache" to functions and frames, and
changes LOAD_GLOBAL and STORE_GLOBAL instructions to use the cache.

The change on the PyDictObject structure is very similar to this PEP.

Cached globals+builtins lookup

In 2006, Andrea Griffini proposed a patch implementing a Cached
globals+builtins lookup optimization. The patch adds a private timestamp
field to the PyDictObject structure (dict type), the field has the C
type size_t.

Thread on python-dev: About dictionary lookup caching (December 2006).

Guard against changing dict during iteration

In 2013, Serhiy Storchaka proposed Guard against changing dict during
iteration (issue #19332) which adds a ma_count field to the PyDictObject
structure (dict type), the field has the C type size_t. This field is
incremented when the dictionary is modified.

PySizer

PySizer: a memory profiler for Python, Google Summer of Code 2005
project by Nick Smallbone.

This project has a patch for CPython 2.4 which adds key_time and
value_time fields to dictionary entries. It uses a global process-wide
counter for dictionaries, incremented each time that a dictionary is
modified. The times are used to decide when child objects first appeared
in their parent objects.

Discussion

Thread on the mailing lists:

-   python-dev: Updated PEP 509
-   python-dev: RFC: PEP 509: Add a private version to dict
-   python-dev: PEP 509: Add a private version to dict (January 2016)
-   python-ideas: RFC: PEP: Add dict.__version__ (January 2016)

Acceptance

The PEP was accepted on 2016-09-07 by Guido van Rossum. The PEP
implementation has since been committed to the repository.

Copyright

This document has been placed in the public domain.