PEP: 539 Title: A New C-API for Thread-Local Storage in CPython Version:
$Revision$ Last-Modified: $Date$ Author: Erik M. Bray, Masayuki Yamamoto
BDFL-Delegate: Alyssa Coghlan Status: Final Type: Standards Track
Content-Type: text/x-rst Created: 20-Dec-2016 Python-Version: 3.7
Post-History: 16-Dec-2016, 31-Aug-2017, 08-Sep-2017 Resolution:
https://mail.python.org/pipermail/python-dev/2017-September/149358.html

Abstract

The proposal is to add a new Thread Local Storage (TLS) API to CPython
which would supersede use of the existing TLS API within the CPython
interpreter, while deprecating the existing API. The new API is named
the "Thread Specific Storage (TSS) API" (see Rationale for Proposed
Solution for the origin of the name).

Because the existing TLS API is only used internally (it is not
mentioned in the documentation, and the header that defines it,
pythread.h, is not included in Python.h either directly or indirectly),
this proposal probably only affects CPython, but might also affect other
interpreter implementations (PyPy?) that implement parts of the CPython
API.

This is motivated primarily by the fact that the old API uses int to
represent TLS keys across all platforms, which is neither
POSIX-compliant, nor portable in any practical sense[1].

Note

Throughout this document the acronym "TLS" refers to Thread Local
Storage and should not be confused with "Transportation Layer Security"
protocols.

Specification

The current API for TLS used inside the CPython interpreter consists of
6 functions:

    PyAPI_FUNC(int) PyThread_create_key(void)
    PyAPI_FUNC(void) PyThread_delete_key(int key)
    PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
    PyAPI_FUNC(void *) PyThread_get_key_value(int key)
    PyAPI_FUNC(void) PyThread_delete_key_value(int key)
    PyAPI_FUNC(void) PyThread_ReInitTLS(void)

These would be superseded by a new set of analogous functions:

    PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
    PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key)
    PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value)
    PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key)

The specification also adds a few new features:

-   A new type Py_tss_t--an opaque type the definition of which may
    depend on the underlying TLS implementation. It is defined:

        typedef struct {
            int _is_initialized;
            NATIVE_TSS_KEY_T _key;
        } Py_tss_t;

    where NATIVE_TSS_KEY_T is a macro whose value depends on the
    underlying native TLS implementation (e.g. pthread_key_t).

-   An initializer for Py_tss_t variables, Py_tss_NEEDS_INIT.

-   Three new functions:

        PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void)
        PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key)
        PyAPI_FUNC(int) PyThread_tss_is_created(Py_tss_t *key)

    The first two are needed for dynamic (de-)allocation of a Py_tss_t,
    particularly in extension modules built with Py_LIMITED_API, where
    static allocation of this type is not possible due to its
    implementation being opaque at build time. A value returned by
    PyThread_tss_alloc is in the same state as a value initialized with
    Py_tss_NEEDS_INIT, or NULL in the case of dynamic allocation
    failure. The behavior of PyThread_tss_free involves calling
    PyThread_tss_delete preventively, or is a no-op if the value pointed
    to by the key argument is NULL. PyThread_tss_is_created returns
    non-zero if the given Py_tss_t has been initialized (i.e. by
    PyThread_tss_create).

The new TSS API does not provide functions which correspond to
PyThread_delete_key_value and PyThread_ReInitTLS, because these
functions were needed only for CPython's now defunct built-in TLS
implementation; that is the existing behavior of these functions is
treated as follows: PyThread_delete_key_value(key) is equivalent to
PyThread_set_key_value(key, NULL), and PyThread_ReInitTLS() is a
no-op[2].

The new PyThread_tss_ functions are almost exactly analogous to their
original counterparts with a few minor differences: Whereas
PyThread_create_key takes no arguments and returns a TLS key as an int,
PyThread_tss_create takes a Py_tss_t* as an argument and returns an int
status code. The behavior of PyThread_tss_create is undefined if the
value pointed to by the key argument is not initialized by
Py_tss_NEEDS_INIT. The returned status code is zero on success and
non-zero on failure. The meanings of non-zero status codes are not
otherwise defined by this specification.

Similarly the other PyThread_tss_ functions are passed a Py_tss_t*
whereas previously the key was passed by value. This change is
necessary, as being an opaque type, the Py_tss_t type could
hypothetically be almost any size. This is especially necessary for
extension modules built with Py_LIMITED_API, where the size of the type
is not known. Except for PyThread_tss_free, the behaviors of
PyThread_tss_ are undefined if the value pointed to by the key argument
is NULL.

Moreover, because of the use of Py_tss_t instead of int, there are
behaviors in the new API which differ from the existing API with regard
to key creation and deletion. PyThread_tss_create can be called
repeatedly on the same key--calling it on an already initialized key is
a no-op and immediately returns success. Similarly for calling
PyThread_tss_delete with an uninitialized key.

The behavior of PyThread_tss_delete is defined to change the key's
initialization state to "uninitialized"--this allows, for example,
statically allocated keys to be reset to a sensible state when
restarting the CPython interpreter without terminating the process (e.g.
embedding Python in an application)[3].

The old PyThread_*_key* functions will be marked as deprecated in the
documentation, but will not generate runtime deprecation warnings.

Additionally, on platforms where sizeof(pthread_key_t) != sizeof(int),
PyThread_create_key will return immediately with a failure status, and
the other TLS functions will all be no-ops on such platforms.

Comparison of API Specification

+-------------------+-----------------------+-----------------------+
| API               | Thread Local Storage  | Thread Specific       |
|                   | (TLS)                 | Storage (TSS)         |
+===================+=======================+=======================+
| Version           | Existing              | New                   |
+-------------------+-----------------------+-----------------------+
| Key Type          | int                   | Py_tss_t (opaque      |
|                   |                       | type)                 |
+-------------------+-----------------------+-----------------------+
| Handle Native Key | cast to int           | conceal into internal |
|                   |                       | field                 |
+-------------------+-----------------------+-----------------------+
| Function Argument | int                   | Py_tss_t *            |
+-------------------+-----------------------+-----------------------+
| Features          | -   create key        | -   create key        |
|                   | -   delete key        | -   delete key        |
|                   | -   set value         | -   set value         |
|                   | -   get value         | -   get value         |
|                   | -   delete value      | -   (set NULL         |
|                   | -   reinitialize keys |     instead)[7]       |
|                   |     (after fork)      | -   (unnecessary)[8]  |
|                   |                       | -   dynamically       |
|                   |                       |     (de-)allocate key |
|                   |                       | -   check key's       |
|                   |                       |     initialization    |
|                   |                       |     state             |
+-------------------+-----------------------+-----------------------+
| Key Initializer   | (-1 as key creation   | Py_tss_NEEDS_INIT     |
|                   | failure)              |                       |
+-------------------+-----------------------+-----------------------+
| Requirement       | native threads (since | native threads        |
|                   | CPython 3.7[9])       |                       |
+-------------------+-----------------------+-----------------------+
| Restriction       | No support for        | Unable to statically  |
|                   | platforms where       | allocate keys when    |
|                   | native TLS key is     | Py_LIMITED_API is     |
|                   | defined in a way that | defined.              |
|                   | cannot be safely cast |                       |
|                   | to int.               |                       |
+-------------------+-----------------------+-----------------------+

Example

With the proposed changes, a TSS key is initialized like:

    static Py_tss_t tss_key = Py_tss_NEEDS_INIT;
    if (PyThread_tss_create(&tss_key)) {
        /* ... handle key creation failure ... */
    }

The initialization state of the key can then be checked like:

    assert(PyThread_tss_is_created(&tss_key));

The rest of the API is used analogously to the old API:

    int the_value = 1;
    if (PyThread_tss_get(&tss_key) == NULL) {
        PyThread_tss_set(&tss_key, (void *)&the_value);
        assert(PyThread_tss_get(&tss_key) != NULL);
    }
    /* ... once done with the key ... */
    PyThread_tss_delete(&tss_key);
    assert(!PyThread_tss_is_created(&tss_key));

When Py_LIMITED_API is defined, a TSS key must be dynamically allocated:

    static Py_tss_t *ptr_key = PyThread_tss_alloc();
    if (ptr_key == NULL) {
        /* ... handle key allocation failure ... */
    }
    assert(!PyThread_tss_is_created(ptr_key));
    /* ... once done with the key ... */
    PyThread_tss_free(ptr_key);
    ptr_key = NULL;

Platform Support Changes

A new "Native Thread Implementation" section will be added to PEP 11
that states:

-   As of CPython 3.7, all platforms are required to provide a native
    thread implementation (such as pthreads or Windows) to implement the
    TSS API. Any TSS API problems that occur in an implementation
    without native threads will be closed as "won't fix".

Motivation

The primary problem at issue here is the type of the keys (int) used for
TLS values, as defined by the original PyThread TLS API.

The original TLS API was added to Python by GvR back in 1997, and at the
time the key used to represent a TLS value was an int, and so it has
been to the time of writing. This used CPython's own TLS implementation
which long remained unused, largely unchanged, in Python/thread.c.
Support for implementation of the API on top of native thread
implementations (pthreads and Windows) was added much later, and the
built-in implementation has been deemed no longer necessary and has
since been removed[10].

The problem with the choice of int to represent a TLS key, is that while
it was fine for CPython's own TLS implementation, and happens to be
compatible with Windows (which uses DWORD for the analogous data), it is
not compatible with the POSIX standard for the pthreads API, which
defines pthread_key_t as an opaque type not further defined by the
standard (as with Py_tss_t described above)[11]. This leaves it up to
the underlying implementation how a pthread_key_t value is used to look
up thread-specific data.

This has not generally been a problem for Python's API, as it just
happens that on Linux pthread_key_t is defined as an unsigned int, and
so is fully compatible with Python's TLS API--pthread_key_t's created by
pthread_create_key can be freely cast to int and back (well, not
exactly, even this has some limitations as pointed out by issue #22206).

However, as issue #25658 points out, there are at least some platforms
(namely Cygwin, CloudABI, but likely others as well) which have
otherwise modern and POSIX-compliant pthreads implementations, but are
not compatible with Python's API because their pthread_key_t is defined
in a way that cannot be safely cast to int. In fact, the possibility of
running into this problem was raised by MvL at the time pthreads TLS was
added[12].

It could be argued that PEP 11 makes specific requirements for
supporting a new, not otherwise officially-support platform (such as
CloudABI), and that the status of Cygwin support is currently dubious.
However, this creates a very high barrier to supporting platforms that
are otherwise Linux- and/or POSIX-compatible and where CPython might
otherwise "just work" except for this one hurdle. CPython itself imposes
this implementation barrier by way of an API that is not compatible with
POSIX (and in fact makes invalid assumptions about pthreads).

Rationale for Proposed Solution

The use of an opaque type (Py_tss_t) to key TLS values allows the API to
be compatible with all present (POSIX and Windows) and future (C11?)
native TLS implementations supported by CPython, as it allows the
definition of Py_tss_t to depend on the underlying implementation.

Since the existing TLS API has been available in the limited API[13] for
some platforms (e.g. Linux), CPython makes an effort to provide the new
TSS API at that level likewise. Note, however, that the Py_tss_t
definition becomes to be an opaque struct when Py_LIMITED_API is
defined, because exposing NATIVE_TSS_KEY_T as part of the limited API
would prevent us from switching native thread implementation without
rebuilding extension modules.

A new API must be introduced, rather than changing the function
signatures of the current API, in order to maintain backwards
compatibility. The new API also more clearly groups together these
related functions under a single name prefix, PyThread_tss_. The "tss"
in the name stands for "thread-specific storage", and was influenced by
the naming and design of the "tss" API that is part of the C11 threads
API[14]. However, this is in no way meant to imply compatibility with or
support for the C11 threads API, or signal any future intention of
supporting C11--it's just the influence for the naming and design.

The inclusion of the special initializer Py_tss_NEEDS_INIT is required
by the fact that not all native TLS implementations define a sentinel
value for uninitialized TLS keys. For example, on Windows a TLS key is
represented by a DWORD (unsigned int) and its value must be treated as
opaque[15]. So there is no unsigned integer value that can be safely
used to represent an uninitialized TLS key on Windows. Likewise, POSIX
does not specify a sentinel for an uninitialized pthread_key_t, instead
relying on the pthread_once interface to ensure that a given TLS key is
initialized only once per-process. Therefore, the Py_tss_t type contains
an explicit ._is_initialized that can indicate the key's initialization
state independent of the underlying implementation.

Changing PyThread_create_key to immediately return a failure status on
systems using pthreads where sizeof(int) != sizeof(pthread_key_t) is
intended as a sanity check: Currently, PyThread_create_key may report
initial success on such systems, but attempts to use the returned key
are likely to fail. Although in practice this failure occurs earlier in
the interpreter initialization, it's better to fail immediately at the
source of problem (PyThread_create_key) rather than sometime later when
use of an invalid key is attempted. In other words, this indicates
clearly that the old API is not supported on platforms where it cannot
be used reliably, and that no effort will be made to add such support.

Rejected Ideas

-   Do nothing: The status quo is fine because it works on Linux, and
    platforms wishing to be supported by CPython should follow the
    requirements of PEP 11. As explained above, while this would be a
    fair argument if CPython were being to asked to make changes to
    support particular quirks or features of a specific platform, in
    this case it is a quirk of CPython that prevents it from being used
    to its full potential on otherwise POSIX-compliant platforms. The
    fact that the current implementation happens to work on Linux is a
    happy accident, and there's no guarantee that this will never
    change.
-   Affected platforms should just configure Python --without-threads:
    this is no longer an option as the --without-threads option has been
    removed for Python 3.7[16].
-   Affected platforms should use CPython's built-in TLS implementation
    instead of a native TLS implementation: This is a more acceptable
    alternative to the previous idea, and in fact there had been a patch
    to do just that[17]. However, the built-in implementation being
    "slower and clunkier" in general than native implementations still
    needlessly hobbles performance on affected platforms. At least one
    other module (tracemalloc) is also broken if Python is built without
    a native TLS implementation. This idea also cannot be adopted
    because the built-in implementation has since been removed.
-   Keep the existing API, but work around the issue by providing a
    mapping from pthread_key_t values to int values. A couple attempts
    were made at this ([18],[19]), but this injects needless complexity
    and overhead into performance-critical code on platforms that are
    not currently affected by this issue (such as Linux). Even if use of
    this workaround were made conditional on platform compatibility, it
    introduces platform-specific code to maintain, and still has the
    problem of the previous rejected ideas of needlessly hobbling
    performance on affected platforms.

Implementation

An initial version of a patch[20] is available on the bug tracker for
this issue. Since the migration to GitHub, its development has continued
in the pep539-tss-api feature branch[21] in Masayuki Yamamoto's fork of
the CPython repository on GitHub. A work-in-progress PR is available
at[22].

This reference implementation covers not only the new API implementation
features, but also the client code updates needed to replace the
existing TLS API with the new TSS API.

Copyright

This document has been placed in the public domain.

References and Footnotes

[1] http://bugs.python.org/issue25658

[2] https://bugs.python.org/msg298342

[3] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

[4] https://bugs.python.org/msg298342

[5] https://bugs.python.org/msg298342

[6] http://bugs.python.org/issue30832

[7] https://bugs.python.org/msg298342

[8] https://bugs.python.org/msg298342

[9] http://bugs.python.org/issue30832

[10] http://bugs.python.org/issue30832

[11] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html

[12] https://bugs.python.org/msg116292

[13] It is also called as "stable ABI" (PEP 384)

[14] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=404

[15] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).aspx

[16] https://bugs.python.org/issue31370

[17] http://bugs.python.org/file45548/configure-pthread_key_t.patch

[18] http://bugs.python.org/file44269/issue25658-1.patch

[19] http://bugs.python.org/file44303/key-constant-time.diff

[20] http://bugs.python.org/file46379/pythread-tss-3.patch

[21] https://github.com/python/cpython/compare/master...ma8ma:pep539-tss-api

[22] https://github.com/python/cpython/pull/1362