

PEP 788 – Reimagining Native Threads

Author:
Peter Bierma <zintensitydev at gmail.com>
Sponsor:
Victor Stinner <vstinner at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
23-Apr-2025
Python-Version:
3.15
Post-History:
10-Mar-2025, 27-Apr-2025, 28-May-2025


Abstract

In the C API, threads are able to interact with an interpreter by holding an attached thread state for the current thread. This works well, but can get complicated when it comes to creating and attaching thread states in a thread-safe manner.

Specifically, the C API doesn’t have any way to ensure that an interpreter is in a state where it can be called when creating and/or attaching a thread state. As such, attachment might hang the thread, or it might flat-out crash due to the interpreter’s structure being deallocated in subinterpreters. This can be a frustrating issue to deal with in large applications that want to execute Python code alongside some other native code.

In addition, assumptions about which interpreter to use tend to be wrong inside of subinterpreters, primarily because PyGILState_Ensure() always creates a thread state for the main interpreter in threads where Python hasn’t ever run.

This PEP intends to solve these kinds of issues by reimagining how we approach thread states in the C API. This is done through the introduction of interpreter references that prevent an interpreter from finalizing (or more technically, entering a stage in which attachment of a thread state hangs). This allows for more structure and reliability when it comes to thread state management, because it forces a layer of synchronization between the interpreter and the caller.

With this new system, there are a lot of changes needed in CPython and third-party libraries to adopt it. For example, in APIs that don’t require the caller to hold an attached thread state, a strong interpreter reference should be passed to ensure that it targets the correct interpreter, and that the interpreter doesn’t concurrently deallocate itself. The best example of this in CPython is PyGILState_Ensure(). As part of this proposal, PyThreadState_Ensure() is provided as a modern replacement that takes a strong interpreter reference.

Terminology

Interpreters

In this proposal, “interpreter” refers to a singular, isolated interpreter (see PEP 684), with its own PyInterpreterState pointer (referred to as an “interpreter-state”). “Interpreter” does not refer to the entirety of a Python process.

The “current interpreter” refers to the interpreter-state pointer on an attached thread state, as returned by PyThreadState_GetInterpreter() or PyInterpreterState_Get().

Native and Python Threads

This PEP refers to a thread created using the C API as a “native thread”, also sometimes referred to as a “non-Python created thread”, where a “Python created” thread is one created by the threading module.

A native thread is typically registered with the interpreter by PyGILState_Ensure(), but any thread with an attached thread state qualifies as a native thread.

Motivation

Native Threads Always Hang During Finalization

Many large libraries might need to call Python code in highly-asynchronous situations where the desired interpreter (typically the main interpreter) could be finalizing or deleted, but want to continue running code after invoking the interpreter. This desire has been brought up by users. For example, a callback that wants to call Python code might be invoked when:

  • A kernel has finished running on a GPU.
  • A network packet was received.
  • A thread has quit, and a native library is executing static finalizers of thread local storage.

Generally, this pattern would look something like this:

static void
some_callback(void *closure)
{
    /* Do some work */
    /* ... */

    PyGILState_STATE gstate = PyGILState_Ensure();
    /* Invoke the C API to do some computation */
    PyGILState_Release(gstate);

    /* ... */
}

In the current C API, any “native” thread (one not created via the threading module) is considered to be “daemon”, meaning that the interpreter won’t wait on that thread before shutting down. Instead, the interpreter will hang the thread when it goes to attach a thread state, making the thread unusable past that point. Attaching a thread state can happen at any point when invoking Python, such as in-between bytecode instructions (to yield the GIL to a different thread), or when a C function exits a Py_BEGIN_ALLOW_THREADS block, so simply guarding against whether the interpreter is finalizing isn’t enough to safely call Python code. (Note that hanging the thread is relatively new behavior; in prior versions, the thread would exit, but the issue is the same.)

This means that any non-Python/native thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python code in their stream of calls.

Py_IsFinalizing is Insufficient

The docs currently recommend Py_IsFinalizing() to guard against termination of the thread:

Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use Py_IsFinalizing() or sys.is_finalizing() to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination.

Unfortunately, this isn’t correct, because of time-of-call to time-of-use issues; the interpreter might not be finalizing during the call to Py_IsFinalizing(), but it might start finalizing immediately afterwards, which would cause the attachment of a thread state to hang the thread.
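The gap between the check and the attachment can be made concrete with a small model. The sketch below is plain C and does not call CPython: ToyRuntime and every toy_* function are hypothetical stand-ins for the runtime, Py_IsFinalizing(), and thread state attachment. The in_the_gap hook simulates whatever another thread might do between the two steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime; not a CPython structure. */
typedef struct {
    atomic_bool finalizing;
} ToyRuntime;

static void
toy_runtime_init(ToyRuntime *rt)
{
    assert(rt != NULL);
    atomic_init(&rt->finalizing, false);
}

/* Models Py_IsFinalizing(): a snapshot that can go stale immediately. */
static bool
toy_is_finalizing(ToyRuntime *rt)
{
    return atomic_load(&rt->finalizing);
}

/* Models another thread starting finalization. */
static void
toy_start_finalizing(ToyRuntime *rt)
{
    atomic_store(&rt->finalizing, true);
}

/* Models attaching a thread state. Returns false at the point where the
   real PyGILState_Ensure() would hang the thread instead. */
static bool
toy_attach(ToyRuntime *rt)
{
    return !atomic_load(&rt->finalizing);
}

/* The "check, then attach" guard. in_the_gap simulates what another
   thread does between the two steps; pass NULL for "nothing". */
static bool
toy_guarded_attach(ToyRuntime *rt, void (*in_the_gap)(ToyRuntime *))
{
    if (toy_is_finalizing(rt)) {
        return false;           /* Correctly refused. */
    }
    if (in_the_gap != NULL) {
        in_the_gap(rt);         /* Finalization may begin right here. */
    }
    return toy_attach(rt);      /* The earlier check no longer helps. */
}
```

Calling toy_guarded_attach(&rt, toy_start_finalizing) passes the check and still fails to attach. In this model the failure is a return value; with the real API there is no return value to inspect, because the thread simply hangs.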

Daemon Threads Can Break Finalization

When acquiring locks, it’s extremely important to detach the thread state to prevent deadlocks. This is true on both the with-GIL and free-threaded builds.

When the GIL is enabled, a deadlock can occur pretty easily when acquiring a lock if the GIL wasn’t released; thread A grabs a lock, and starts waiting on its thread state to attach, while thread B holds the GIL and is waiting on the lock. A similar deadlock can occur on the free-threaded build during stop-the-world pauses when running the garbage collector.

This affects CPython itself, and there’s not much that can be done to fix it with the current API. For example, python/cpython#129536 remarks that the ssl module can emit a fatal error when used at finalization, because a daemon thread got hung while holding the lock for sys.stderr, and then a finalizer tried to write to it. Ideally, a thread should be able to temporarily prevent the interpreter from hanging it while it holds the lock.

However, it’s generally unsafe to acquire Python locks (for example, threading.Lock) in finalizers, because the garbage collector might run while the lock is held, which would deadlock if another finalizer tried to acquire the lock. This does not apply to many C locks, such as with sys.stderr, because Python code cannot be run while the lock is held. This PEP intends to fix this problem for C locks, not Python locks.

Daemon Threads are not the Problem

Prior to this PEP, deprecating daemon threads was discussed extensively. Daemon threads technically cause many of the issues outlined in this proposal, so removing daemon threads could be seen as a potential solution. The main argument for removing daemon threads is that they’re a large cause of problems in the interpreter [1].

Except that daemon threads don’t actually work reliably. They’re attempting to run and use Python interpreter resources after the runtime has been shut down upon runtime finalization. As in they have pointers to global state for the interpreter.

However, in practice, daemon threads are useful for simplifying many threading applications in Python, and since the program is about to close in most cases, it’s not worth the added complexity to try and gracefully shut down a thread [2].

When I’ve needed daemon threads, it’s usually been the case of “Long-running, uninterruptible, third-party task” in terms of the examples in the linked issue. Basically I’ve had something that I need running in the background, but I have no easy way to terminate it short of process termination. Unfortunately, I’m on Windows, so signal.pthread_kill isn’t an option. I guess I could use the Windows Terminate Thread API, but it’s a lot of work to wrap it myself compared to just letting process termination handle things.

Finally, removing Python-level daemon threads does not fix the whole problem. As noted by this PEP, extension modules are free to create their own threads and attach thread states for them. Similar to daemon threads, Python doesn’t try and join them during finalization, so trying to remove daemon threads as a whole would involve trying to remove them from the C API, which would require a much more massive API change than what is currently being proposed [3].

Realize however that even if we get rid of daemon threads, extension module code can and does spawn its own threads that are not tracked by Python. … Those are realistically an alternate form of daemon thread … and those are never going to be forbidden.

Joining the Thread isn’t Always a Good Idea

Even in daemon threads, it’s generally possible to prevent hanging of native threads through atexit functions. A thread could be started by some C function, and then as long as that thread is joined by atexit, then the thread won’t hang.

However, atexit isn’t always an option, because registering an exit function requires the calling thread to already hold an attached thread state. If there’s no guarantee of that, then atexit.register() cannot be safely called without the risk of hanging the thread. This shifts the contract of joining the thread to the caller rather than the callee, which again, isn’t reliable enough in practice to be a viable solution.

For example, large C++ applications might want to expose an interface that can call Python code. To do this, a C++ API would take a Python object, and then call PyGILState_Ensure() to safely interact with it (for example, by calling it). If the interpreter is finalizing or has shut down, then the thread is hung, disrupting the C++ stream of calls.

Finalization Behavior for PyGILState_Ensure Cannot Change

There will always have to be a point in a Python program where PyGILState_Ensure() can no longer attach a thread state. If the interpreter is long dead, then Python obviously can’t give a thread a way to invoke it. PyGILState_Ensure() doesn’t have any meaningful way to return a failure, so it has no choice but to terminate the thread or emit a fatal error, as noted in python/cpython#124622:

I think a new GIL acquisition and release C API would be needed. The way the existing ones get used in existing C code is not amenible to suddenly bolting an error state onto; none of the existing C code is written that way. After the call they always just assume they have the GIL and can proceed. The API was designed as “it’ll block and only return once it has the GIL” without any other option.

For this reason, we can’t make any real changes to how PyGILState_Ensure() works during finalization, because it would break existing code.

The GIL-state APIs are Buggy and Confusing

There are currently two public ways for a user to create and attach a thread state for their thread; manual use of PyThreadState_New() and PyThreadState_Swap(), or the convenient PyGILState_Ensure().

The latter, PyGILState_Ensure(), is significantly more common, having nearly 3,000 hits in a code search, whereas PyThreadState_New() has fewer than 400 hits.

PyGILState_Ensure Generally Crashes During Finalization

At the time of writing, the current behavior of PyGILState_Ensure() does not always match the documentation. Instead of hanging the thread during finalization as previously noted, it’s possible for it to crash with a segmentation fault. This is a known issue that could be fixed in CPython, but it’s definitely worth noting here, because acceptance and implementation of this PEP will likely fix the existing crashes caused by PyGILState_Ensure().

The Term “GIL” is Tricky for Free-threading

A large issue with the term “GIL” in the C API is that it is semantically misleading. This was noted in python/cpython#127989, created by the authors of this PEP:

The biggest issue is that for free-threading, there is no GIL, so users erroneously call the C API inside Py_BEGIN_ALLOW_THREADS blocks or omit PyGILState_Ensure in fresh threads.

Again, PyGILState_Ensure() gets an attached thread state for the thread on both with-GIL and free-threaded builds. An attached thread state is always needed to call the C API, so PyGILState_Ensure() still needs to be called on free-threaded builds, but with a name like “ensure GIL”, it’s not immediately clear that that’s true.

PyGILState_Ensure Doesn’t Guess the Correct Interpreter

As noted in the documentation, the PyGILState functions aren’t officially supported in subinterpreters:

Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.

This is because PyGILState_Ensure() doesn’t have any way to know which interpreter created the thread, and as such, it has to assume that it was the main interpreter. There isn’t any way to detect this at runtime, so spurious races are bound to come up in threads created by subinterpreters, because synchronization for the wrong interpreter will be used on objects shared between the threads.

For example, if the thread had access to object A, which belongs to a subinterpreter, but then called PyGILState_Ensure(), the thread would have an attached thread state pointing to the main interpreter, not the subinterpreter. This means that any GIL assumptions about the object are wrong! There isn’t any synchronization between the two GILs, so both the thread and the main thread could try to increment the object’s reference count at the same time, causing a data race.

An Interpreter Can Concurrently Deallocate

The other way of creating a native thread that can invoke Python, PyThreadState_New() and PyThreadState_Swap(), is a lot better for supporting subinterpreters (because PyThreadState_New() takes an explicit interpreter, rather than assuming that the main interpreter was requested), but is still limited by the current hanging problems in the C API. Manual creation of thread states (“manual” in contrast to the implicit creation of one in PyGILState_Ensure()) does not solve any of the aforementioned thread-safety issues with thread states.

In addition, subinterpreters typically have a much shorter lifetime than the main interpreter, so if there was no synchronization between the calling thread and the created thread, there’s a much higher chance that an interpreter-state passed to a thread will have already finished and have been deallocated, causing use-after-free crashes. As of writing, this is a relatively theoretical problem, but it’s likely this will become more of an issue in newer versions with the recent acceptance of PEP 734.

Rationale

So, how do we address all of this? The best way seems to be starting from scratch and “reimagining” how to create, acquire and attach thread states in the C API.

Preventing Interpreter Shutdown with Reference Counting

This PEP takes an approach where an interpreter is given a reference count that prevents it from shutting down. So, holding a “strong reference” to the interpreter will make it safe to call the C API without worrying about the thread being hung.

This means that interfacing Python (for example, in a C++ library) will need a reference to the interpreter in order to safely call the object, which is definitely more inconvenient than assuming the main interpreter is the right choice, but there’s not really another option. A future proposal could perhaps make this cleaner by adding a tracking mechanism for an object’s interpreter (such as a field on PyObject).

Generally speaking, a strong interpreter reference should be short-lived. An interpreter reference should act similar to a lock, or a “critical section”, where the interpreter must not hang the thread or deallocate. For example, when acquiring an IO lock, a strong interpreter reference should be acquired before locking, and then released once the lock is released.

Weak References

This proposal also comes with weak references to an interpreter that don’t prevent it from shutting down, but can be promoted to a strong reference when the user decides that they want to call the C API. If an interpreter is destroyed or past the point where it can create strong references, promotion of a weak reference will fail.

A weak reference will typically live much longer than a strong reference. This is useful for many of the asynchronous situations stated previously, where the thread itself shouldn’t prevent the desired interpreter from shutting down, but also allow the thread to execute Python when needed.

For example, a (non-reentrant) event handler may store a weak interpreter reference in its void *arg parameter, and then that weak reference will be promoted to a strong reference when it’s time to call Python code.

Removing the outdated GIL-state APIs

Due to the unfixable issues with the PyGILState APIs, this PEP intends to do away with them entirely. In today’s C API, all PyGILState functions are replaceable with PyThreadState counterparts that are compatible with subinterpreters:

This PEP specifies a deprecation for these functions (while remaining in the stable ABI), because PyThreadState_Ensure() and PyThreadState_Release() will act as more-correct replacements for PyGILState_Ensure() and PyGILState_Release(), due to the requirement of a specific interpreter.

The exact details of this deprecation aren’t too clear. It’s likely that the usual five-year deprecation (as specified by PEP 387) will be too short, so for now, these functions will have no specific removal date.

Specification

Interpreter References to Prevent Shutdown

An interpreter will keep a reference count that’s managed by users of the C API. When the interpreter starts finalizing, it will wait until its reference count reaches zero before proceeding to a point where threads will be hung and it may deallocate its state. The interpreter will wait on its reference count around the same time when threading.Thread objects are joined, but note that this is not the same as joining the thread; the interpreter will only wait until the reference count is zero, and then proceed. After the reference count has reached zero, threads can no longer prevent the interpreter from shutting down (thus PyInterpreterRef_Get() and PyInterpreterWeakRef_AsStrong() will fail).
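As a rough model of the rule above, the following sketch folds the reference count and the “no more strong references” state into a single atomic counter. It is an illustration only, not the reference implementation: ToyInterp and the toy_* names are hypothetical, and finalization is shown as a single non-blocking polling step, whereas the interpreter described here would block until the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; not CPython's PyInterpreterState. */
typedef struct {
    /* >= 0: number of strong references currently held.
       -1:   waiting has finished; no new strong references allowed. */
    atomic_long refs;
} ToyInterp;

static void
toy_interp_init(ToyInterp *interp)
{
    atomic_init(&interp->refs, 0);
}

/* Models PyInterpreterRef_Get(): fails once waiting has finished. */
static bool
toy_ref_acquire(ToyInterp *interp)
{
    long cur = atomic_load(&interp->refs);
    while (cur >= 0) {
        /* On failure, cur is reloaded with the latest value. */
        if (atomic_compare_exchange_weak(&interp->refs, &cur, cur + 1)) {
            return true;
        }
    }
    return false;
}

/* Models PyInterpreterRef_Close(). */
static void
toy_ref_release(ToyInterp *interp)
{
    long prev = atomic_fetch_sub(&interp->refs, 1);
    assert(prev > 0);
    (void)prev;
}

/* One polling step of finalization: succeeds only when no strong
   references remain, and forbids new ones in the same atomic step. */
static bool
toy_try_finish_waiting(ToyInterp *interp)
{
    long expected = 0;
    return atomic_compare_exchange_strong(&interp->refs, &expected, -1);
}
```

Because the zero check and the transition to -1 are one compare-and-swap, a thread cannot acquire a reference between the interpreter observing zero and the interpreter closing the door.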

A weak reference to an interpreter won’t prevent it from finalizing, and can be safely accessed after the interpreter no longer supports creating strong references, and even after the interpreter-state has been deleted. Deletion and duplication of the weak reference will always be allowed, but promotion (PyInterpreterWeakRef_AsStrong()) will always fail after the interpreter reaches a point where strong references have been waited on.
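One way to picture how a weak reference can outlive the interpreter-state is a small control block that is freed only when the last weak reference goes away. The sketch below assumes that design purely for illustration; the PEP does not specify it. ToyControl and the toy_* functions are hypothetical, and the model is single-threaded for brevity, so it uses plain counters where a real implementation would need atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical control block that outlives the interpreter it describes. */
typedef struct {
    bool accepting_strong;  /* False once the interpreter stopped waiting. */
    long strong;            /* Outstanding strong references. */
    long weak;              /* Outstanding weak references, plus one that
                               the interpreter itself holds while alive. */
} ToyControl;

static ToyControl *
toy_control_new(void)
{
    ToyControl *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->accepting_strong = true;
    c->strong = 0;
    c->weak = 1;
    return c;
}

/* Models PyInterpreterWeakRef_Get() and PyInterpreterWeakRef_Dup(). */
static ToyControl *
toy_weak_get(ToyControl *c)
{
    c->weak++;
    return c;
}

/* Models PyInterpreterWeakRef_Close(). Returns true if this call freed
   the control block. */
static bool
toy_weak_close(ToyControl *c)
{
    assert(c->weak > 0);
    if (--c->weak == 0) {
        free(c);
        return true;
    }
    return false;
}

/* Models PyInterpreterWeakRef_AsStrong(): 0 on success, -1 on failure. */
static int
toy_weak_as_strong(ToyControl *c)
{
    if (!c->accepting_strong) {
        return -1;
    }
    c->strong++;
    return 0;
}

/* Models PyInterpreterRef_Close(). */
static void
toy_strong_close(ToyControl *c)
{
    assert(c->strong > 0);
    c->strong--;
}

/* Models the interpreter finishing its wait and being deleted. Only valid
   once no strong references remain. */
static bool
toy_interp_delete(ToyControl *c)
{
    assert(c->strong == 0);
    c->accepting_strong = false;
    return toy_weak_close(c);   /* Drop the interpreter's own weak ref. */
}
```

After toy_interp_delete(), a weak reference that is still open can be closed or queried safely, but toy_weak_as_strong() reports failure, mirroring the promotion rule above.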

Strong Interpreter References

type PyInterpreterRef
An opaque, strong reference to an interpreter. The interpreter will wait until a strong reference has been released before shutting down.

This type is guaranteed to be pointer-sized.
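The pointer-size guarantee is what lets a reference travel through a void * callback argument without a heap allocation. The sketch below shows that round trip with a hypothetical ToyRef type defined as uintptr_t; this PEP does not say how PyInterpreterRef is actually defined, so treat the typedef as an assumption made for illustration. The integer-to-pointer-to-integer conversion is implementation-defined in C, but it round-trips on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an opaque, pointer-sized reference type. */
typedef uintptr_t ToyRef;

/* Compile-time check of the guarantee that the round trip relies on. */
static_assert(sizeof(ToyRef) == sizeof(void *),
              "ToyRef must be pointer-sized");

/* Pack a reference into a thread or callback argument. */
static void *
toy_ref_to_arg(ToyRef ref)
{
    return (void *)ref;
}

/* Unpack it again on the receiving side. */
static ToyRef
toy_ref_from_arg(void *arg)
{
    return (ToyRef)arg;
}
```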

int PyInterpreterRef_Get(PyInterpreterRef *ref)
Acquire a strong reference to the current interpreter.

On success, this function returns 0 and sets ref to a strong reference to the interpreter, and returns -1 with an exception set on failure.

Failure typically indicates that the interpreter has already finished waiting on strong references.

The caller must hold an attached thread state.

int PyInterpreterRef_Main(PyInterpreterRef *ref)
Acquire a strong reference to the main interpreter.

This function only exists for special cases where a specific interpreter can’t be saved. Prefer safely acquiring a reference through PyInterpreterRef_Get() whenever possible.

On success, this function will return 0 and set ref to a strong reference, and on failure, this function will return -1.

Failure typically indicates that the main interpreter has already finished waiting on its reference count.

The caller does not need to hold an attached thread state.

PyInterpreterState *PyInterpreterRef_AsInterpreter(PyInterpreterRef ref)
Return the interpreter denoted by ref.

This function cannot fail, and the caller doesn’t need to hold an attached thread state.

PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref)
Duplicate a strong reference to an interpreter.

This function cannot fail, and the caller doesn’t need to hold an attached thread state.

void PyInterpreterRef_Close(PyInterpreterRef ref)
Release a strong reference to an interpreter, allowing it to shut down if there are no references left.

This function cannot fail, and the caller doesn’t need to hold an attached thread state.

Weak Interpreter References

type PyInterpreterWeakRef
An opaque, weak reference to an interpreter. The interpreter will not wait for the reference to be released before shutting down.

This type is guaranteed to be pointer-sized.

int PyInterpreterWeakRef_Get(PyInterpreterWeakRef *wref)
Acquire a weak reference to the current interpreter.

This function is generally meant to be used in tandem with PyInterpreterWeakRef_AsStrong().

On success, this function returns 0 and sets wref to a weak reference to the interpreter, and returns -1 with an exception set on failure.

The caller must hold an attached thread state.

PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref)
Duplicate a weak reference to an interpreter.

This function cannot fail, and the caller doesn’t need to hold an attached thread state.

int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref)
Acquire a strong reference to an interpreter through a weak reference.

On success, this function returns 0 and sets ref to a strong reference to the interpreter denoted by wref.

If the interpreter no longer exists or has already finished waiting for its reference count to reach zero, then this function returns -1 without an exception set.

This function is not safe to call in a re-entrant signal handler.

The caller does not need to hold an attached thread state.

void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref)
Release a weak reference to an interpreter.

This function cannot fail, and the caller doesn’t need to hold an attached thread state.

Ensuring and Releasing Thread States

This proposal includes two new high-level threading APIs that intend to replace PyGILState_Ensure() and PyGILState_Release().

int PyThreadState_Ensure(PyInterpreterRef ref)
Ensure that the thread has an attached thread state for the interpreter denoted by ref, and thus can safely invoke that interpreter. It is OK to call this function if the thread already has an attached thread state, as long as there is a subsequent call to PyThreadState_Release() that matches this one.

Nested calls to this function will only sometimes create a new thread state. If there is no attached thread state, then this function will check for the most recent attached thread state used by this thread. If none exists or it doesn’t match ref, a new thread state is created. If it does match ref, it is reattached. If there is an attached thread state, then a similar check occurs; if the interpreter matches ref, it is attached, and otherwise a new thread state is created.

Return 0 on success, and -1 on failure.
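The rules for when PyThreadState_Ensure() creates a new thread state can be read as a small decision function. The sketch below encodes them with interpreters represented by integer IDs; ToyEnsureAction, TOY_NO_INTERP, and toy_ensure_action() are hypothetical names used only to restate the paragraph above, not part of the proposed API.

```c
#include <assert.h>

typedef enum {
    TOY_REUSE_ATTACHED,   /* Attached thread state already matches ref. */
    TOY_REATTACH_CACHED,  /* Most recently used thread state matches ref. */
    TOY_CREATE_NEW        /* Nothing matches; create a new thread state. */
} ToyEnsureAction;

/* Marks "no thread state" in the model below. */
#define TOY_NO_INTERP (-1)

/* attached_interp: interpreter of the currently attached thread state,
                    or TOY_NO_INTERP if nothing is attached.
   cached_interp:   interpreter of the thread state most recently
                    attached on this thread, or TOY_NO_INTERP.
   ref_interp:      interpreter denoted by the strong reference. */
static ToyEnsureAction
toy_ensure_action(int attached_interp, int cached_interp, int ref_interp)
{
    assert(ref_interp != TOY_NO_INTERP);
    if (attached_interp != TOY_NO_INTERP) {
        return attached_interp == ref_interp ? TOY_REUSE_ATTACHED
                                             : TOY_CREATE_NEW;
    }
    if (cached_interp == ref_interp) {
        return TOY_REATTACH_CACHED;
    }
    return TOY_CREATE_NEW;
}
```

For example, a thread that has never run Python has neither an attached nor a cached thread state, so the function returns TOY_CREATE_NEW for any interpreter.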

void PyThreadState_Release()
Release a PyThreadState_Ensure() call.

The attached thread state prior to the corresponding PyThreadState_Ensure() call is guaranteed to be restored upon returning. The cached thread state as used by PyThreadState_Ensure() and PyGILState_Ensure() will also be restored.

This function cannot fail.
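The restore guarantee behaves like a per-thread stack: each ensure remembers what was attached before it, and the matching release puts it back. The following sketch models only that pairing. ToyThreadState, TOY_MAX_DEPTH, and the toy_* functions are hypothetical, and the fixed-size array is a simplification that says nothing about how the reference implementation stores the previous thread state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread state; only its identity matters here. */
typedef struct {
    int interp_id;
} ToyThreadState;

#define TOY_MAX_DEPTH 8

/* Per-thread model state: what is attached now, and what was attached
   before each outstanding ensure call. */
static _Thread_local ToyThreadState *toy_attached = NULL;
static _Thread_local ToyThreadState *toy_saved[TOY_MAX_DEPTH];
static _Thread_local int toy_depth = 0;

static ToyThreadState *
toy_current(void)
{
    return toy_attached;
}

/* Models PyThreadState_Ensure(): remember the previous state, then
   attach ts. Returns 0 on success and -1 if the model's stack is full. */
static int
toy_ensure(ToyThreadState *ts)
{
    if (toy_depth == TOY_MAX_DEPTH) {
        return -1;
    }
    toy_saved[toy_depth++] = toy_attached;
    toy_attached = ts;
    return 0;
}

/* Models PyThreadState_Release(): restore whatever was attached before
   the matching toy_ensure() call. */
static void
toy_release(void)
{
    assert(toy_depth > 0);
    toy_attached = toy_saved[--toy_depth];
}
```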

Deprecation of GIL-state APIs

This PEP deprecates all of the existing PyGILState APIs in favor of the existing and new PyThreadState APIs. Namely:

All of the PyGILState APIs are to be removed from the non-limited C API in a future Python version. They will remain available in the stable ABI for compatibility.

It’s worth noting that PyThreadState_Get() and PyThreadState_GetUnchecked() aren’t perfect replacements for PyGILState_GetThisThreadState(), because PyGILState_GetThisThreadState() is able to return a thread state even when it is detached. This PEP intentionally doesn’t leave a perfect replacement for this, because the GIL-state pointer (which holds the last used thread state by the thread) is only useful for those implementing PyThreadState_Ensure() or similar. It’s not a common API to want as a user.

Backwards Compatibility

This PEP specifies a breaking change with the removal of all the PyGILState APIs from the public headers of the non-limited C API in a future version.

Security Implications

This PEP has no known security implications.

How to Teach This

As with all C API functions, all the new APIs in this PEP will be documented in the C API documentation, ideally under the Non-Python created threads section. The existing PyGILState documentation should be updated accordingly to point to the new APIs.

Examples

These examples are here to help understand the APIs described in this PEP. Ideally, they could be reused in the documentation.

Example: A Library Interface

Imagine that you’re developing a C library for logging. You might want to provide an API that allows users to log to a Python file object.

With this PEP, you’d implement it like this:

int
LogToPyFile(PyInterpreterWeakRef wref,
            PyObject *file,
            const char *text)
{
    PyInterpreterRef ref;
    if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
        /* Python interpreter has shut down */
        return -1;
    }

    if (PyThreadState_Ensure(ref) < 0) {
        PyInterpreterRef_Close(ref);
        fputs("Out of memory.\n", stderr);
        return -1;
    }

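    char *to_write = do_some_text_mutation(text);
    int res = PyFile_WriteString(to_write, file);
    free(to_write);
    if (res < 0) {
        /* PyErr_Print() must only be called with an exception set. */
        PyErr_Print();
    }

    PyThreadState_Release();
    PyInterpreterRef_Close(ref);
    /* 0 on success, -1 on failure, matching the early returns above. */
    return res;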
}

If you were to use PyGILState_Ensure() for this case, then your thread would hang if the interpreter were to be finalizing at that time!

Additionally, the API supports subinterpreters. If you were to assume that the main interpreter created the file object (via PyGILState_Ensure()), then using file objects owned by a subinterpreter could possibly crash.

Example: A Single-threaded Ensure

This example shows acquiring a lock in a Python method.

If this were to be called from a daemon thread, then the interpreter could hang the thread while reattaching the thread state, leaving us with the lock held. Any future finalizer that wanted to acquire the lock would be deadlocked!

static PyObject *
my_critical_operation(PyObject *self, PyObject *unused)
{
    assert(PyThreadState_GetUnchecked() != NULL);
    PyInterpreterRef ref;
    if (PyInterpreterRef_Get(&ref) < 0) {
        /* Python interpreter has shut down */
        return NULL;
    }

    Py_BEGIN_ALLOW_THREADS;
    acquire_some_lock();

    /* Do something while holding the lock.
       The interpreter won't finalize during this period. */
    // ...

    release_some_lock();
    Py_END_ALLOW_THREADS;
    PyInterpreterRef_Close(ref);
    Py_RETURN_NONE;
}

Example: Transitioning From the Legacy Functions

The following code uses the PyGILState APIs:

static int
thread_func(void *arg)
{
    PyGILState_STATE gstate = PyGILState_Ensure();
    /* It's not an issue in this example, but we just attached
       a thread state for the main interpreter. If my_method() was
       originally called in a subinterpreter, then we would be unable
       to safely interact with any objects from it. */
    if (PyRun_SimpleString("print(42)") < 0) {
        PyErr_Print();
    }
    PyGILState_Release(gstate);
    return 0;
}

static PyObject *
my_method(PyObject *self, PyObject *unused)
{
    PyThread_handle_t handle;
    PyThread_ident_t ident;

    if (PyThread_start_joinable_thread(thread_func, NULL, &ident, &handle) < 0) {
        return NULL;
    }
    Py_BEGIN_ALLOW_THREADS;
    PyThread_join_thread(handle);
    Py_END_ALLOW_THREADS;
    Py_RETURN_NONE;
}

This is the same code, rewritten to use the new functions:

static int
thread_func(void *arg)
{
    PyInterpreterRef interp = (PyInterpreterRef)arg;
    if (PyThreadState_Ensure(interp) < 0) {
        PyInterpreterRef_Close(interp);
        return -1;
    }
    if (PyRun_SimpleString("print(42)") < 0) {
        PyErr_Print();
    }
    PyThreadState_Release();
    PyInterpreterRef_Close(interp);
    return 0;
}

static PyObject *
my_method(PyObject *self, PyObject *unused)
{
    PyThread_handle_t handle;
    PyThread_ident_t ident;

    PyInterpreterRef ref;
    if (PyInterpreterRef_Get(&ref) < 0) {
        return NULL;
    }

    if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
        PyInterpreterRef_Close(ref);
        return NULL;
    }
    Py_BEGIN_ALLOW_THREADS
    PyThread_join_thread(handle);
    Py_END_ALLOW_THREADS
    Py_RETURN_NONE;
}

Example: A Daemon Thread

With this PEP, daemon threads are very similar to how native threads are used in the C API today. After calling PyThreadState_Ensure(), simply release the interpreter reference, allowing the interpreter to shut down.

static int
thread_func(void *arg)
{
    PyInterpreterRef ref = (PyInterpreterRef)arg;
    if (PyThreadState_Ensure(ref) < 0) {
        PyInterpreterRef_Close(ref);
        return -1;
    }
    /* Release the interpreter reference, allowing it to
       finalize. This means that print(42) can hang this thread. */
    PyInterpreterRef_Close(ref);
    if (PyRun_SimpleString("print(42)") < 0) {
        PyErr_Print();
    }
    PyThreadState_Release();
    return 0;
}

static PyObject *
my_method(PyObject *self, PyObject *unused)
{
    PyThread_handle_t handle;
    PyThread_ident_t ident;

    PyInterpreterRef ref;
    if (PyInterpreterRef_Get(&ref) < 0) {
        return NULL;
    }

    if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) {
        PyInterpreterRef_Close(ref);
        return NULL;
    }
    Py_RETURN_NONE;
}

Example: An Asynchronous Callback

In some cases, the thread might not ever start, such as in a callback. We can’t use a strong reference here, because a strong reference would deadlock the interpreter if it’s not released.

typedef struct {
    PyInterpreterWeakRef wref;
} ThreadData;

static int
async_callback(void *arg)
{
    ThreadData *data = (ThreadData *)arg;
    PyInterpreterWeakRef wref = data->wref;
    PyInterpreterRef ref;
    if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) {
        fputs("Python has shut down!\n", stderr);
        return -1;
    }

    if (PyThreadState_Ensure(ref) < 0) {
        PyInterpreterRef_Close(ref);
        return -1;
    }
    if (PyRun_SimpleString("print(42)") < 0) {
        PyErr_Print();
    }
    PyThreadState_Release();
    PyInterpreterRef_Close(ref);
    return 0;
}

static PyObject *
setup_callback(PyObject *self, PyObject *unused)
{
    // Weak reference to the interpreter. It won't wait on the callback
    // to finalize.
    ThreadData *tdata = PyMem_RawMalloc(sizeof(ThreadData));
    if (tdata == NULL) {
        PyErr_NoMemory();
        return NULL;
    }
    PyInterpreterWeakRef wref;
    if (PyInterpreterWeakRef_Get(&wref) < 0) {
        PyMem_RawFree(tdata);
        return NULL;
    }
    tdata->wref = wref;
    register_callback(async_callback, tdata);

    Py_RETURN_NONE;
}
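
To make the difference between the two kinds of reference concrete, here is a minimal sketch in plain C. It does not use the API proposed in this PEP: ToyInterp, ToyWeakRef, and the toy_* functions are hypothetical names invented for illustration. The model is single-threaded and never frees the ToyInterp, so it only shows the counting rules, not the synchronization a real interpreter would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interpreter; not a CPython structure. */
typedef struct {
    int strong_refs;   /* number of outstanding strong references */
    bool finalizing;   /* set once finalization has begun */
} ToyInterp;

/* Take a strong reference. Fails with -1 once finalization has begun. */
static int
toy_strong_acquire(ToyInterp *interp)
{
    if (interp->finalizing) {
        return -1;
    }
    interp->strong_refs++;
    return 0;
}

/* Give back a strong reference taken with toy_strong_acquire(). */
static void
toy_strong_release(ToyInterp *interp)
{
    assert(interp->strong_refs > 0);
    interp->strong_refs--;
}

/* Finalization may only begin when no strong references remain.
   A real interpreter would wait here; the toy returns -1 instead so
   that the example can stay single-threaded. */
static int
toy_try_finalize(ToyInterp *interp)
{
    if (interp->strong_refs > 0) {
        return -1;
    }
    interp->finalizing = true;
    return 0;
}

/* A weak reference never touches strong_refs, so holding one does not
   delay finalization. */
typedef struct {
    ToyInterp *interp;
} ToyWeakRef;

/* Try to promote a weak reference to a strong one. Returns -1 if the
   interpreter has already started finalizing. */
static int
toy_weak_promote(ToyWeakRef wref)
{
    return toy_strong_acquire(wref.interp);
}
```

In this model, a callback that is never invoked and holds only a ToyWeakRef leaves strong_refs at zero, so toy_try_finalize() succeeds. The same callback holding a strong reference would keep toy_try_finalize() failing forever, which is the deadlock described above.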

Example: Calling Python Without a Callback Parameter

There are a few cases where callback functions don’t take a callback parameter (void *arg), so it’s impossible to acquire a reference to any specific interpreter. The solution to this problem is to acquire a reference to the main interpreter through PyInterpreterRef_Main().

But wait, won’t that break with subinterpreters, per PyGILState_Ensure Doesn’t Guess the Correct Interpreter? Fortunately, since the callback has no callback parameter, it’s not possible for the caller to pass any objects or interpreter-specific data, so it’s completely safe to choose the main interpreter here.

static void
call_python(void)
{
    PyInterpreterRef ref;
    if (PyInterpreterRef_Main(&ref) < 0) {
        fputs("Python has shut down!\n", stderr);
        return;
    }

    if (PyThreadState_Ensure(ref) < 0) {
        PyInterpreterRef_Close(ref);
        return;
    }
    if (PyRun_SimpleString("print(42)") < 0) {
        /* PyRun_SimpleString() has already printed the exception. */
        fputs("Failed to run Python code\n", stderr);
    }
    PyThreadState_Release();
    PyInterpreterRef_Close(ref);
}

Reference Implementation

A reference implementation of this PEP can be found at python/cpython#133110.

Rejected Ideas

Non-daemon Thread States

In prior iterations of this PEP, interpreter references were a property of a thread state rather than a property of an interpreter. This meant that PyThreadState_Ensure() stole a strong interpreter reference, and it was released upon calling PyThreadState_Release(). A thread state that held a reference to an interpreter was known as a “non-daemon thread state.” At first, this seemed like an improvement, because it shifted management of a reference’s lifetime to the thread instead of the user, which eliminated some boilerplate.

However, this ended up making the proposal significantly more complex and hurt the proposal’s goals:

  • Most importantly, non-daemon thread states put too much emphasis on daemon threads as the problem, which hurt the clarity of the PEP. Additionally, the phrase “non-daemon” added extra confusion, because non-daemon Python threads are explicitly joined, whereas a non-daemon C thread is only waited on until it releases its reference.
  • In many cases, an interpreter reference should outlive a single thread state. Stealing the interpreter reference in PyThreadState_Ensure() was particularly troublesome for these cases. If PyThreadState_Ensure() didn’t steal a reference with non-daemon thread states, it would muddy the ownership story of the interpreter reference, leading to a more confusing API.
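
The ownership problem in the second point can be sketched with a small model in plain C. This is not the API from this PEP or from its earlier drafts: ToyInterp, ToyThreadState, and the toy_* functions are hypothetical names, and the counter only stands in for the real reference bookkeeping. The sketch compares how many strong references the caller still holds once a thread state has been released under each design.

```c
#include <assert.h>

/* Hypothetical stand-in for an interpreter with a strong reference count. */
typedef struct {
    int strong_refs;
} ToyInterp;

/* Hypothetical thread state that may or may not own a reference. */
typedef struct {
    ToyInterp *interp;
    int owns_ref;
} ToyThreadState;

/* Acquire a strong reference to the interpreter. */
static ToyInterp *
toy_ref_get(ToyInterp *interp)
{
    interp->strong_refs++;
    return interp;
}

/* Release a strong reference acquired with toy_ref_get(). */
static void
toy_ref_close(ToyInterp *ref)
{
    assert(ref->strong_refs > 0);
    ref->strong_refs--;
}

/* Rejected design: the thread state takes over the caller's reference. */
static ToyThreadState
toy_ensure_stealing(ToyInterp *ref)
{
    ToyThreadState ts = {ref, 1};
    return ts;
}

/* Current design: the caller keeps ownership of the reference. */
static ToyThreadState
toy_ensure_borrowing(ToyInterp *ref)
{
    ToyThreadState ts = {ref, 0};
    return ts;
}

/* Release a thread state, dropping the reference only if it was stolen. */
static void
toy_release(ToyThreadState ts)
{
    if (ts.owns_ref) {
        toy_ref_close(ts.interp);
    }
}

/* Returns the strong reference count left after the thread state is gone. */
static int
refs_after_release_stealing(ToyInterp *interp)
{
    ToyInterp *ref = toy_ref_get(interp);
    ToyThreadState ts = toy_ensure_stealing(ref);
    toy_release(ts);   /* the stolen reference is dropped here */
    return interp->strong_refs;
}

/* Same measurement, but the caller closes the reference itself afterwards. */
static int
refs_after_release_borrowing(ToyInterp *interp)
{
    ToyInterp *ref = toy_ref_get(interp);
    ToyThreadState ts = toy_ensure_borrowing(ref);
    toy_release(ts);   /* the caller's reference is untouched */
    int remaining = interp->strong_refs;
    toy_ref_close(ref);
    return remaining;
}
```

With the stealing variant, nothing protects the interpreter once the thread state is released, so a caller that wants to create a second thread state later has to take an extra reference up front. With the borrowing variant, the reference simply stays valid until the caller closes it.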

Retrofitting the Existing Structures with Reference Counts

Interpreter-State Pointers for Reference Counting

Originally, this PEP specified PyInterpreterState_Hold() and PyInterpreterState_Release() for managing strong references to an interpreter, alongside PyInterpreterState_Lookup() which converted interpreter IDs (weak references) to strong references.

In the end, this was rejected, primarily because it was needlessly confusing. Interpreter states had never had a reference count before, so there was a lack of intuition about when and where something was a strong reference. The PyInterpreterRef and PyInterpreterWeakRef types seem a lot clearer.

Interpreter IDs for Reference Counting

Some iterations of this API took an int64_t interp_id parameter instead of PyInterpreterState *interp, because interpreter IDs cannot be concurrently deleted and cause use-after-free violations. The reference counting APIs in this PEP sidestep this issue anyway, and they have the advantage of requiring less magic than interpreter IDs:

  • Nearly all existing interpreter APIs already return a PyInterpreterState pointer, not an interpreter ID. Functions like PyThreadState_GetInterpreter() would have to be accompanied by frustrating calls to PyInterpreterState_GetID().
  • Threads typically take a void *arg parameter, not an int64_t arg. As such, passing a reference requires much less boilerplate for the user, because an additional structure definition or heap allocation would be needed to store the interpreter ID. This is especially an issue on 32-bit systems, where void * is too small for an int64_t.
  • To retain usability, interpreter ID APIs would still need to keep a reference count, otherwise the interpreter could be finalizing before the native thread gets a chance to attach. The problem with using an interpreter ID is that the reference count has to be “invisible”; it must be tracked elsewhere in the interpreter, likely being more complex than PyInterpreterRef_Get(). There’s also a lack of intuition that a standalone integer could have such a thing as a reference count.
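
The boilerplate difference described in the second point can be illustrated with a short, self-contained sketch. It assumes a pointer-sized reference type, modeled here by the hypothetical ToyInterpRef (the thread example earlier passes its reference as (void *)ref in the same way). The pack_* and unpack_* helpers are invented for illustration and are not part of any API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer-sized reference; an assumption for this sketch. */
typedef uintptr_t ToyInterpRef;

/* Passing a reference to a thread: a cast is all that is needed. */
static void *
pack_ref(ToyInterpRef ref)
{
    return (void *)ref;
}

/* Recover the reference inside the thread function. */
static ToyInterpRef
unpack_ref(void *arg)
{
    return (ToyInterpRef)arg;
}

/* Passing an interpreter ID: void * may be narrower than int64_t
   (for example on 32-bit systems), so the ID has to be boxed on the
   heap. Returns NULL if the allocation fails. */
static void *
pack_id(int64_t interp_id)
{
    int64_t *box = malloc(sizeof(*box));
    if (box == NULL) {
        return NULL;
    }
    *box = interp_id;
    return box;
}

/* The receiving thread has to unbox the ID and free the allocation. */
static int64_t
unpack_id(void *arg)
{
    assert(arg != NULL);
    int64_t *box = arg;
    int64_t interp_id = *box;
    free(box);
    return interp_id;
}
```

The ID variant adds an allocation, a failure path, and a matching free() on the receiving side, none of which the reference variant needs.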

Exposing an Activate/Deactivate API instead of Ensure/Clear

In prior discussions of this API, it was suggested to provide actual PyThreadState pointers in the API in an attempt to make the ownership and lifetime of the thread state clearer:

More importantly though, I think this makes it clearer who owns the thread state - a manually created one is controlled by the code that created it, and once it’s deleted it can’t be activated again.

This was ultimately rejected for two reasons:

Using PyStatus for the Return Value of PyThreadState_Ensure

In prior iterations of this API, PyThreadState_Ensure() returned a PyStatus instead of an integer to denote failures, which had the benefit of providing an error message.

This was rejected because it’s not clear that an error message would be all that useful; all the conceived use-cases for this API wouldn’t really care about a message indicating why Python can’t be invoked. As such, the API would only be needlessly harder to use, which in turn would hurt the transition from PyGILState_Ensure().

In addition, PyStatus isn’t commonly used in the C API. A few functions related to interpreter initialization use it (simply because they can’t raise exceptions), and PyThreadState_Ensure() does not fall under that category.

Acknowledgements

This PEP is based on prior work, feedback, and discussions from many people, including Victor Stinner, Antoine Pitrou, Da Woods, Sam Gross, Matt Page, Ronald Oussoren, Matt Wozniski, Eric Snow, Steve Dower, Petr Viktorin, and Gregory P. Smith.


Source: https://github.com/python/peps/blob/main/peps/pep-0788.rst

Last modified: 2025-07-02 07:29:08 GMT