PEP: 734 Title: Multiple Interpreters in the Stdlib Author: Eric Snow
<ericsnowcurrently@gmail.com> Discussions-To:
https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147
Status: Deferred Type: Standards Track Created: 06-Nov-2023
Python-Version: 3.13 Post-History: 14-Dec-2023, Replaces: 554
Resolution:
https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/24

Note

This PEP is essentially a continuation of PEP 554. That document had
grown a lot of ancillary information across 7 years of discussion. This
PEP is a reduction back to the essential information. Much of that extra
information is still valid and useful, just not in the immediate context
of the specific proposal here.

Abstract

This PEP proposes to add a new module, interpreters, to support
inspecting, creating, and running code in multiple interpreters in the
current process. This includes Interpreter objects that represent the
underlying interpreters. The module will also provide a basic Queue
class for communication between interpreters. Finally, we will add a new
concurrent.futures.InterpreterPoolExecutor based on the interpreters
module.

Introduction

Fundamentally, an "interpreter" is the collection of (essentially) all
runtime state which Python threads must share. So, let's first look at
threads. Then we'll circle back to interpreters.

Threads and Thread States

A Python process will have one or more OS threads running Python code
(or otherwise interacting with the C API). Each of these threads
interacts with the CPython runtime using its own thread state
(PyThreadState), which holds all the runtime state unique to that
thread. There is also some runtime state that is shared between multiple
OS threads.

Any OS thread may switch which thread state it is currently using, as
long as it isn't one that another OS thread is already using (or has
been using). This "current" thread state is stored by the runtime in a
thread-local variable, and may be looked up explicitly with
PyThreadState_Get(). It gets set automatically for the initial ("main")
OS thread and for threading.Thread objects. From the C API it is set
(and cleared) by PyThreadState_Swap() and may be set by
PyGILState_Ensure(). Most of the C API requires that there be a current
thread state, either looked up implicitly or passed in as an argument.

The relationship between OS threads and thread states is one-to-many.
Each thread state is associated with at most a single OS thread and
records its thread ID. A thread state is never used for more than one OS
thread. In the other direction, however, an OS thread may have more than
one thread state associated with it, though, again, only one may be
current.

When there's more than one thread state for an OS thread,
PyThreadState_Swap() is used in that OS thread to switch between them,
with the requested thread state becoming the current one. Whatever was
running in the thread using the old thread state is effectively paused
until that thread state is swapped back in.

Interpreter States

As noted earlier, there is some runtime state that multiple OS threads
share. Some of it is exposed by the sys module, though much is used
internally and not exposed explicitly or only through the C API.

This shared state is called the interpreter state (PyInterpreterState).
We'll sometimes refer to it here as just "interpreter", though that is
also sometimes used to refer to the python executable, to the Python
implementation, and to the bytecode interpreter (i.e. exec()/eval()).

CPython has supported multiple interpreters in the same process (AKA
"subinterpreters") since version 1.5 (1997). The feature has been
available via the C API <python:sub-interpreter-support>.

Interpreters and Threads

Thread states are related to interpreter states in much the same way
that OS threads and processes are related (at a high level). To begin
with, the relationship is one-to-many. A thread state belongs to a
single interpreter (and stores a pointer to it). That thread state is
never used for a different interpreter. In the other direction, however,
an interpreter may have zero or more thread states associated with it.
The interpreter is only considered active in OS threads where one of its
thread states is current.

Interpreters are created via the C API using
Py_NewInterpreterFromConfig() (or Py_NewInterpreter(), which is a light
wrapper around Py_NewInterpreterFromConfig()). That function does the
following:

1.  create a new interpreter state
2.  create a new thread state
3.  set the thread state as current (a current tstate is needed for
    interpreter init)
4.  initialize the interpreter state using that thread state
5.  return the thread state (still current)

Note that the returned thread state may be immediately discarded. There
is no requirement that an interpreter have any thread states, except as
soon as the interpreter is meant to actually be used. At that point it
must be made active in the current OS thread.

To make an existing interpreter active in the current OS thread, the C
API user first makes sure that interpreter has a corresponding thread
state. Then PyThreadState_Swap() is called like normal using that thread
state. If the thread state for another interpreter was already current
then it gets swapped out like normal and execution of that interpreter
in the OS thread is thus effectively paused until it is swapped back in.

Once an interpreter is active in the current OS thread like that, the
thread can call any of the C API, such as PyEval_EvalCode() (i.e.
exec()). This works by using the current thread state as the runtime
context.

The "Main" Interpreter

When a Python process starts, it creates a single interpreter state (the
"main" interpreter) with a single thread state for the current OS
thread. The Python runtime is then initialized using them.

After initialization, the script or module or REPL is executed using
them. That execution happens in the interpreter's __main__ module.

When the process finishes running the requested Python code or REPL, in
the main OS thread, the Python runtime is finalized in that thread using
the main interpreter.

Runtime finalization has only a slight, indirect effect on still-running
Python threads, whether in the main interpreter or in subinterpreters.
That's because right away it waits indefinitely for all non-daemon
Python threads to finish.

While the C API may be queried, there is no mechanism by which any
Python thread is directly alerted that finalization has begun, other
than perhaps with "atexit" functions that may be been registered using
threading._register_atexit().

Any remaining subinterpreters are themselves finalized later, but at
that point they aren't current in any OS threads.

Interpreter Isolation

CPython's interpreters are intended to be strictly isolated from each
other. That means interpreters never share objects (except in very
specific cases with immortal, immutable builtin objects). Each
interpreter has its own modules (sys.modules), classes, functions, and
variables. Even where two interpreters define the same class, each will
have its own copy. The same applies to state in C, including in
extension modules. The CPython C API docs explain more.

Notably, there is some process-global state that interpreters will
always share, some mutable and some immutable. Sharing immutable state
presents few problems, while providing some benefits (mainly
performance). However, all shared mutable state requires special
management, particularly for thread-safety, some of which the OS takes
care of for us.

Mutable:

-   file descriptors
-   low-level env vars
-   process memory (though allocators are isolated)
-   the list of interpreters

Immutable:

-   builtin types (e.g. dict, bytes)
-   singletons (e.g. None)
-   underlying static module data (e.g. functions) for
    builtin/extension/frozen modules

Existing Execution Components

There are a number of existing parts of Python that may help with
understanding how code may be run in a subinterpreter.

In CPython, each component is built around one of the following C API
functions (or variants):

-   PyEval_EvalCode(): run the bytecode interpreter with the given code
    object
-   PyRun_String(): compile + PyEval_EvalCode()
-   PyRun_File(): read + compile + PyEval_EvalCode()
-   PyRun_InteractiveOneObject(): compile + PyEval_EvalCode()
-   PyObject_Call(): calls PyEval_EvalCode()

builtins.exec()

The builtin exec() may be used to execute Python code. It is essentially
a wrapper around the C API functions PyRun_String() and
PyEval_EvalCode().

Here are some relevant characteristics of the builtin exec():

-   It runs in the current OS thread and pauses whatever was running
    there, which resumes when exec() finishes. No other OS threads are
    affected. (To avoid pausing the current Python thread, run exec() in
    a threading.Thread.)
-   It may start additional threads, which don't interrupt it.
-   It executes against a "globals" namespace (and a "locals"
    namespace). At module-level, exec() defaults to using __dict__ of
    the current module (i.e. globals()). exec() uses that namespace
    as-is and does not clear it before or after.
-   It propagates any uncaught exception from the code it ran. The
    exception is raised from the exec() call in the Python thread that
    originally called exec().

Command-line

The python CLI provides several ways to run Python code. In each case it
maps to a corresponding C API call:

-   <no args>, -i - run the REPL (PyRun_InteractiveOneObject())
-   <filename> - run a script (PyRun_File())
-   -c <code> - run the given Python code (PyRun_String())
-   -m module - run the module as a script (PyEval_EvalCode() via
    runpy._run_module_as_main())

In each case it is essentially a variant of running exec() at the
top-level of the __main__ module of the main interpreter.

threading.Thread

When a Python thread is started, it runs the "target" function with
PyObject_Call() using a new thread state. The globals namespace come
from func.__globals__ and any uncaught exception is discarded.

Motivation

The interpreters module will provide a high-level interface to the
multiple interpreter functionality. The goal is to make the existing
multiple-interpreters feature of CPython more easily accessible to
Python code. This is particularly relevant now that CPython has a
per-interpreter GIL (PEP 684) and people are more interested in using
multiple interpreters.

Without a stdlib module, users are limited to the
C API <python:sub-interpreter-support>, which restricts how much they
can try out and take advantage of multiple interpreters.

The module will include a basic mechanism for communicating between
interpreters. Without one, multiple interpreters are a much less useful
feature.

Specification

The module will:

-   expose the existing multiple interpreter support
-   introduce a basic mechanism for communicating between interpreters

The module will wrap a new low-level _interpreters module (in the same
way as the threading module). However, that low-level API is not
intended for public use and thus not part of this proposal.

Using Interpreters

The module defines the following functions:

-   

    get_current() -> Interpreter

        Returns the Interpreter object for the currently executing
        interpreter.

-   

    list_all() -> list[Interpreter]

        Returns the Interpreter object for each existing interpreter,
        whether it is currently running in any OS threads or not.

-   

    create() -> Interpreter

        Create a new interpreter and return the Interpreter object for
        it. The interpreter doesn't do anything on its own and is not
        inherently tied to any OS thread. That only happens when
        something is actually run in the interpreter (e.g.
        Interpreter.exec()), and only while running. The interpreter may
        or may not have thread states ready to use, but that is strictly
        an internal implementation detail.

Interpreter Objects

An interpreters.Interpreter object that represents the interpreter
(PyInterpreterState) with the corresponding unique ID. There will only
be one object for any given interpreter.

If the interpreter was created with interpreters.create() then it will
be destroyed as soon as all Interpreter objects with its ID (across all
interpreters) have been deleted.

Interpreter objects may represent other interpreters than those created
by interpreters.create(). Examples include the main interpreter (created
by Python's runtime initialization) and those created via the C-API,
using Py_NewInterpreter(). Such Interpreter objects will not be able to
interact with their corresponding interpreters, e.g. via
Interpreter.exec() (though we may relax this in the future).

Attributes and methods:

-   

    id

        (read-only) A non-negative int that identifies the interpreter
        that this Interpreter instance represents. Conceptually, this is
        similar to a process ID.

-   

    __hash__()

        Returns the hash of the interpreter's id. This is the same as
        the hash of the ID's integer value.

-   

    is_running() -> bool

        Returns True if the interpreter is currently executing code in
        its __main__ module. This excludes sub-threads.

        It refers only to if there is an OS thread running a script
        (code) in the interpreter's __main__ module. That basically
        means whether or not Interpreter.exec() is running in some OS
        thread. Code running in sub-threads is ignored.

-   

    prepare_main(**kwargs)

        Bind one or more objects in the interpreter's __main__ module.

        The keyword argument names will be used as the attribute names.
        The values will be bound as new objects, though exactly
        equivalent to the original. Only objects specifically supported
        for passing between interpreters are allowed. See Shareable
        Objects.

        prepare_main() is helpful for initializing the globals for an
        interpreter before running code in it.

-   

    exec(code, /)

        Execute the given source code in the interpreter (in the current
        OS thread), using its __main__ module. It doesn't return
        anything.

        This is essentially equivalent to switching to this interpreter
        in the current OS thread and then calling the builtin exec()
        using this interpreter's __main__ module's __dict__ as the
        globals and locals.

        The code running in the current OS thread (a different
        interpreter) is effectively paused until Interpreter.exec()
        finishes. To avoid pausing it, create a new threading.Thread and
        call Interpreter.exec() in it (like Interpreter.call_in_thread()
        does).

        Interpreter.exec() does not reset the interpreter's state nor
        the __main__ module, neither before nor after, so each
        successive call picks up where the last one left off. This can
        be useful for running some code to initialize an interpreter
        (e.g. with imports) before later performing some repeated task.

        If there is an uncaught exception, it will be propagated into
        the calling interpreter as an ExecutionFailed. The full error
        display of the original exception, generated relative to the
        called interpreter, is preserved on the propagated
        ExecutionFailed. That includes the full traceback, with all the
        extra info like syntax error details and chained exceptions. If
        the ExecutionFailed is not caught then that full error display
        will be shown, much like it would be if the propagated exception
        had been raised in the main interpreter and uncaught. Having the
        full traceback is particularly useful when debugging.

        If exception propagation is not desired then an explicit
        try-except should be used around the code passed to
        Interpreter.exec(). Likewise any error handling that depends on
        specific information from the exception must use an explicit
        try-except around the given code, since ExecutionFailed will not
        preserve that information.

-   

    call(callable, /)

        Call the callable object in the interpreter. The return value is
        discarded. If the callable raises an exception then it gets
        propagated as an ExecutionFailed exception, in the same way as
        Interpreter.exec().

        For now only plain functions are supported and only ones that
        take no arguments and have no cell vars. Free globals are
        resolved against the target interpreter's __main__ module.

        In the future, we can add support for arguments, closures, and a
        broader variety of callables, at least partly via pickle. We can
        also consider not discarding the return value. The initial
        restrictions are in place to allow us to get the basic
        functionality of the module out to users sooner.

-   

    call_in_thread(callable, /) -> threading.Thread

        Essentially, apply Interpreter.call() in a new thread. Return
        values are discarded and exceptions are not propagated.

        call_in_thread() is roughly equivalent to:

            def task():
                interp.call(func)
            t = threading.Thread(target=task)
            t.start()

-   

    close()

        Destroy the underlying interpreter.

Communicating Between Interpreters

The module introduces a basic communication mechanism through special
queues.

There are interpreters.Queue objects, but they only proxy the actual
data structure: an unbounded FIFO queue that exists outside any one
interpreter. These queues have special accommodations for safely passing
object data between interpreters, without violating interpreter
isolation. This includes thread-safety.

As with other queues in Python, for each "put" the object is added to
the back and each "get" pops the next one off the front. Every added
object will be popped off in the order it was pushed on.

Only objects that are specifically supported for passing between
interpreters may be sent through an interpreters.Queue. Note that the
actual objects aren't sent, but rather their underlying data. However,
the popped object will still be strictly equivalent to the original. See
Shareable Objects.

The module defines the following functions:

-   

    create_queue(maxsize=0, *, syncobj=False) -> Queue

        Create a new queue. If the maxsize is zero or negative then the
        queue is unbounded.

        "syncobj" is used as the default for put() and put_nowait().

Queue Objects

interpreters.Queue objects act as proxies for the underlying
cross-interpreter-safe queues exposed by the interpreters module. Each
Queue object represents the queue with the corresponding unique ID.
There will only be one object for any given queue.

Queue implements all the methods of queue.Queue except for task_done()
and join(), hence it is similar to asyncio.Queue and
multiprocessing.Queue.

Attributes and methods:

-   

    id

        (read-only) A non-negative int that identifies the corresponding
        cross-interpreter queue. Conceptually, this is similar to the
        file descriptor used for a pipe.

-   

    maxsize

        (read-only) Number of items allowed in the queue. Zero means
        "unbounded".

-   

    __hash__()

        Return the hash of the queue's id. This is the same as the hash
        of the ID's integer value.

-   

    empty()

        Return True if the queue is empty, False otherwise.

        This is only a snapshot of the state at the time of the call.
        Other threads or interpreters may cause this to change.

-   

    full()

        Return True if there are maxsize items in the queue.

        If the queue was initialized with maxsize=0 (the default), then
        full() never returns True.

        This is only a snapshot of the state at the time of the call.
        Other threads or interpreters may cause this to change.

-   

    qsize()

        Return the number of items in the queue.

        This is only a snapshot of the state at the time of the call.
        Other threads or interpreters may cause this to change.

-   

    put(obj, timeout=None, *, syncobj=None)

        Add the object to the queue.

        If maxsize > 0 and the queue is full then this blocks until a
        free slot is available. If timeout is a positive number then it
        only blocks at least that many seconds and then raises
        interpreters.QueueFull. Otherwise is blocks forever.

        If "syncobj" is true then the object must be shareable, which
        means the object's data is passed through rather than the object
        itself. If "syncobj" is false then all objects are supported.
        However, there are some performance penalties and all objects
        are copies (e.g. via pickle). Thus mutable objects will never be
        automatically synchronized between interpreters. If "syncobj" is
        None (the default) then the queue's default value is used.

        If an object is still in the queue, and the interpreter which
        put it in the queue (i.e. to which it belongs) is destroyed,
        then the object is immediately removed from the queue. (We may
        later add an option to replace the removed object in the queue
        with a sentinel or to raise an exception for the corresponding
        get() call.)

-   

    put_nowait(obj, *, syncobj=None)

        Like put() but effectively with an immediate timeout. Thus if
        the queue is full, it immediately raises interpreters.QueueFull.

-   

    get(timeout=None) -> object

        Pop the next object from the queue and return it. Block while
        the queue is empty. If a positive timeout is provided and an
        object hasn't been added to the queue in that many seconds then
        raise interpreters.QueueEmpty.

-   

    get_nowait() -> object

        Like get(), but do not block. If the queue is not empty then
        return the next item. Otherwise, raise interpreters.QueueEmpty.

Shareable Objects

Interpreter.prepare_main() only works with "shareable" objects. The same
goes for interpreters.Queue (optionally).

A "shareable" object is one which may be passed from one interpreter to
another. The object is not necessarily actually directly shared by the
interpreters. However, even if it isn't, the shared object should be
treated as though it were shared directly. That's a strong equivalence
guarantee for all shareable objects. (See below.)

For some types (builtin singletons), the actual object is shared. For
some, the object's underlying data is actually shared but each
interpreter has a distinct object wrapping that data. For all other
shareable types, a strict copy or proxy is made such that the
corresponding objects continue to match exactly. In cases where the
underlying data is complex but must be copied (e.g. tuple), the data is
serialized as efficiently as possible.

Shareable objects must be specifically supported internally by the
Python runtime. However, there is no restriction against adding support
for more types later.

Here's the initial list of supported objects:

-   str
-   bytes
-   int
-   float
-   bool (True/False)
-   None
-   tuple (only with shareable items)
-   interpreters.Queue
-   memoryview (underlying buffer actually shared)

Note that the last two on the list, queues and memoryview, are
technically mutable data types, whereas the rest are not. When any
interpreters share mutable data there is always a risk of data races.
Cross-interpreter safety, including thread-safety, is a fundamental
feature of queues.

However, memoryview does not have any native accommodations. The user is
responsible for managing thread-safety, whether passing a token back and
forth through a queue to indicate safety (see Synchronization), or by
assigning sub-range exclusivity to individual interpreters.

Most objects will be shared through queues (interpreters.Queue), as
interpreters communicate information between each other. Less
frequently, objects will be shared through prepare_main() to set up an
interpreter prior to running code in it. However, prepare_main() is the
primary way that queues are shared, to provide another interpreter with
a means of further communication.

Finally, a reminder: for a few types the actual object is shared,
whereas for the rest only the underlying data is shared, whether as a
copy or through a proxy. Regardless, it always preserves the strong
equivalence guarantee of "shareable" objects.

The guarantee is that a shared object in one interpreter is strictly
equivalent to the corresponding object in the other interpreter. In
other words, the two objects will be indistinguishable from each other.
The shared object should be treated as though the original had been
shared directly, whether or not it actually was. That's a slightly
different and stronger promise than just equality.

The guarantee is especially important for mutable objects, like
Interpreters.Queue and memoryview. Mutating the object in one
interpreter will always be reflected immediately in every other
interpreter sharing the object.

Synchronization

There are situations where two interpreters should be synchronized. That
may involve sharing a resource, worker management, or preserving
sequential consistency.

In threaded programming the typical synchronization primitives are types
like mutexes. The threading module exposes several. However,
interpreters cannot share objects which means they cannot share
threading.Lock objects.

The interpreters module does not provide any such dedicated
synchronization primitives. Instead, interpreters.Queue objects provide
everything one might need.

For example, if there's a shared resource that needs managed access then
a queue may be used to manage it, where the interpreters pass an object
around to indicate who can use the resource:

    import interpreters
    from mymodule import load_big_data, check_data

    numworkers = 10
    control = interpreters.create_queue()
    data = memoryview(load_big_data())

    def worker():
        interp = interpreters.create()
        interp.prepare_main(control=control, data=data)
        interp.exec("""if True:
            from mymodule import edit_data
            while True:
                token = control.get()
                edit_data(data)
                control.put(token)
            """)
    threads = [threading.Thread(target=worker) for _ in range(numworkers)]
    for t in threads:
        t.start()

    token = 'football'
    control.put(token)
    while True:
        control.get()
        if not check_data(data):
            break
        control.put(token)

Exceptions

-   

    InterpreterError

        Indicates that some interpreter-related failure occurred.

        This exception is a subclass of Exception.

-   

    InterpreterNotFoundError

        Raised from Interpreter methods after the underlying interpreter
        has been destroyed, e.g. via the C-API.

        This exception is a subclass of InterpreterError.

-   

    ExecutionFailed

        Raised from Interpreter.exec() and Interpreter.call() when
        there's an uncaught exception. The error display for this
        exception includes the traceback of the uncaught exception,
        which gets shown after the normal error display, much like
        happens for ExceptionGroup.

        Attributes:

        -   type - a representation of the original exception's class,
            with __name__, __module__, and __qualname__ attrs.
        -   msg - str(exc) of the original exception
        -   snapshot - a traceback.TracebackException object for the
            original exception

        This exception is a subclass of InterpreterError.

-   

    QueueError

        Indicates that some queue-related failure occurred.

        This exception is a subclass of Exception.

-   

    QueueNotFoundError

        Raised from interpreters.Queue methods after the underlying
        queue has been destroyed.

        This exception is a subclass of QueueError.

-   

    QueueEmpty

        Raised from Queue.get() (or get_nowait() with no default) when
        the queue is empty.

        This exception is a subclass of both QueueError and the stdlib
        queue.Empty.

-   

    QueueFull

        Raised from Queue.put() (with a timeout) or put_nowait() when
        the queue is already at its max size.

        This exception is a subclass of both QueueError and the stdlib
        queue.Empty.

InterpreterPoolExecutor

Along with the new interpreters module, there will be a new
concurrent.futures.InterpreterPoolExecutor. It will be a derivative of
ThreadPoolExecutor, where each worker executes in its own thread, but
each with its own subinterpreter.

Like the other executors, InterpreterPoolExecutor will support callables
for tasks, and for the initializer. Also like the other executors, the
arguments in both cases will be mostly unrestricted. The callables and
arguments will typically be serialized when sent to a worker's
interpreter, e.g. with pickle, like how the ProcessPoolExecutor works.
This contrasts with Interpreter.call(), which will (at least initially)
be much more restricted.

Communication between workers, or between the executor (or generally its
interpreter) and the workers, may still be done through
interpreters.Queue objects, set with the initializer.

sys.implementation.supports_isolated_interpreters

Python implementations are not required to support subinterpreters,
though most major ones do. If an implementation does support them then
sys.implementation.supports_isolated_interpreters will be set to True.
Otherwise it will be False. If the feature is not supported then
importing the interpreters module will raise an ImportError.

Examples

The following examples demonstrate practical cases where multiple
interpreters may be useful.

Example 1:

There's a stream of requests coming in that will be handled via workers
in sub-threads.

-   each worker thread has its own interpreter
-   there's one queue to send tasks to workers and another queue to
    return results
-   the results are handled in a dedicated thread
-   each worker keeps going until it receives a "stop" sentinel (None)
-   the results handler keeps going until all workers have stopped

    import interpreters
    from mymodule import iter_requests, handle_result

    tasks = interpreters.create_queue()
    results = interpreters.create_queue()

    numworkers = 20
    threads = []

    def results_handler():
        running = numworkers
        while running:
            try:
                res = results.get(timeout=0.1)
            except interpreters.QueueEmpty:
                # No workers have finished a request since last time.
                pass
            else:
                if res is None:
                    # A worker has stopped.
                    running -= 1
                else:
                    handle_result(res)
        empty = object()
        assert results.get_nowait(empty) is empty
    threads.append(threading.Thread(target=results_handler))

    def worker():
        interp = interpreters.create()
        interp.prepare_main(tasks=tasks, results=results)
        interp.exec("""if True:
            from mymodule import handle_request, capture_exception

            while True:
                req = tasks.get()
                if req is None:
                    # Stop!
                    break
                try:
                    res = handle_request(req)
                except Exception as exc:
                    res = capture_exception(exc)
                results.put(res)
            # Notify the results handler.
            results.put(None)
            """)
    threads.extend(threading.Thread(target=worker) for _ in range(numworkers))

    for t in threads:
        t.start()

    for req in iter_requests():
        tasks.put(req)
    # Send the "stop" signal.
    for _ in range(numworkers):
        tasks.put(None)

    for t in threads:
        t.join()

Example 2:

This case is similar to the last as there are a bunch of workers in
sub-threads. However, this time the code is chunking up a big array of
data, where each worker processes one chunk at a time. Copying that data
to each interpreter would be exceptionally inefficient, so the code
takes advantage of directly sharing memoryview buffers.

-   all the interpreters share the buffer of the source array
-   each one writes its results to a second shared buffer
-   there's use a queue to send tasks to workers
-   only one worker will ever read any given index in the source array
-   only one worker will ever write to any given index in the results
    (this is how it ensures thread-safety)

    import interpreters
    import queue
    from mymodule import read_large_data_set, use_results

    numworkers = 3
    data, chunksize = read_large_data_set()
    buf = memoryview(data)
    numchunks = (len(buf) + 1) / chunksize
    results = memoryview(b'\0' * numchunks)

    tasks = interpreters.create_queue()

    def worker(id):
        interp = interpreters.create()
        interp.prepare_main(data=buf, results=results, tasks=tasks)
        interp.exec("""if True:
            from mymodule import reduce_chunk

            while True:
                req = tasks.get()
                if res is None:
                    # Stop!
                    break
                resindex, start, end = req
                chunk = data[start: end]
                res = reduce_chunk(chunk)
                results[resindex] = res
            """)
    threads = [threading.Thread(target=worker) for _ in range(numworkers)]
    for t in threads:
        t.start()

    for i in range(numchunks):
        # Assume there's at least one worker running still.
        start = i * chunksize
        end = start + chunksize
        if end > len(buf):
            end = len(buf)
        tasks.put((start, end, i))
    # Send the "stop" signal.
    for _ in range(numworkers):
        tasks.put(None)

    for t in threads:
        t.join()

    use_results(results)

Rationale

A Minimal API

Since the core dev team has no real experience with how users will make
use of multiple interpreters in Python code, this proposal purposefully
keeps the initial API as lean and minimal as possible. The objective is
to provide a well-considered foundation on which further (more advanced)
functionality may be added later, as appropriate.

That said, the proposed design incorporates lessons learned from
existing use of subinterpreters by the community, from existing stdlib
modules, and from other programming languages. It also factors in
experience from using subinterpreters in the CPython test suite and
using them in concurrency benchmarks.

create(), create_queue()

Typically, users call a type to create instances of the type, at which
point the object's resources get provisioned. The interpreters module
takes a different approach, where users must call create() to get a new
interpreter or create_queue() for a new queue. Calling
interpreters.Interpreter() directly only returns a wrapper around an
existing interpreters (likewise for interpreters.Queue()).

This is because interpreters (and queues) are special resources. They
exist globally in the process and are not managed/owned by the current
interpreter. Thus the interpreters module makes creating an interpreter
(or queue) a visibly distinct operation from creating an instance of
interpreters.Interpreter (or interpreters.Queue).

Interpreter.prepare_main() Sets Multiple Variables

prepare_main() may be seen as a setter function of sorts. It supports
setting multiple names at once, e.g.
interp.prepare_main(spam=1, eggs=2), whereas most setters set one item
at a time. The main reason is for efficiency.

To set a value in the interpreter's __main__.__dict__, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead. After setting
the value it must switch back. Furthermore, there is some additional
overhead to the mechanism by which it passes objects between
interpreters, which can be reduced in aggregate if multiple values are
set at once.

Therefore, prepare_main() supports setting multiple values at once.

Propagating Exceptions

An uncaught exception from a subinterpreter, via Interpreter.exec(),
could either be (effectively) ignored, like threading.Thread() does, or
propagated, like the builtin exec() does. Since Interpreter.exec() is a
synchronous operation, like the builtin exec(), uncaught exceptions are
propagated.

However, such exceptions are not raised directly. That's because
interpreters are isolated from each other and must not share objects,
including exceptions. That could be addressed by raising a surrogate of
the exception, whether a summary, a copy, or a proxy that wraps it. Any
of those could preserve the traceback, which is useful for debugging.
The ExecutionFailed that gets raised is such a surrogate.

There's another concern to consider. If a propagated exception isn't
immediately caught, it will bubble up through the call stack until
caught (or not). In the case that code somewhere else may catch it, it
is helpful to identify that the exception came from a subinterpreter
(i.e. a "remote" source), rather than from the current interpreter.
That's why Interpreter.exec() raises ExecutionFailed and why it is a
plain Exception, rather than a copy or proxy with a class that matches
the original exception. For example, an uncaught ValueError from a
subinterpreter would never get caught in a later
try: ... except ValueError: .... Instead, ExecutionFailed must be
handled directly.

In contrast, exceptions propagated from Interpreter.call() do not
involve ExecutionFailed but are raised directly, as though originating
in the calling interpreter. This is because Interpreter.call() is a
higher level method that uses pickle to support objects that can't
normally be passed between interpreters.

Limited Object Sharing

As noted in Interpreter Isolation, only a small number of builtin
objects may be truly shared between interpreters. In all other cases
objects can only be shared indirectly, through copies or proxies.

The set of objects that are shareable as copies through queues (and
Interpreter.prepare_main()) is limited for the sake of efficiency.

Supporting sharing of all objects is possible (via pickle) but not part
of this proposal. For one thing, it's helpful to know in those cases
that only an efficient implementation is being used. Furthermore, in
those cases supporting mutable objects via pickling would violate the
guarantee that "shared" objects be equivalent (and stay that way).

Objects vs. ID Proxies

For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs. In both cases the state is likewise process-global
and will be used by multiple interpreters. Thus they aren't suitable to
be implemented as PyObject, which is only really an option for
interpreter-specific data. That's why the interpreters module instead
provides objects that are weakly associated through the ID.

Rejected Ideas

See PEP 554 <554#rejected-ideas>.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.