PEP 734 – Multiple Interpreters in the Stdlib
- Author:
- Eric Snow <ericsnowcurrently at gmail.com>
- Discussions-To:
- Discourse thread
- Status:
- Deferred
- Type:
- Standards Track
- Created:
- 06-Nov-2023
- Python-Version:
- 3.13
- Post-History:
- 14-Dec-2023
- Replaces:
- 554
- Resolution:
- Discourse message
Table of Contents
Note
This PEP is essentially a continuation of PEP 554. That document had grown a lot of ancillary information across 7 years of discussion. This PEP is a reduction back to the essential information. Much of that extra information is still valid and useful, just not in the immediate context of the specific proposal here.
Abstract
This PEP proposes to add a new module, interpreters
, to support
inspecting, creating, and running code in multiple interpreters in the
current process. This includes Interpreter
objects that represent
the underlying interpreters. The module will also provide a basic
Queue
class for communication between interpreters.
Finally, we will add a new concurrent.futures.InterpreterPoolExecutor
based on the interpreters
module.
Introduction
Fundamentally, an “interpreter” is the collection of (essentially) all runtime state which Python threads must share. So, let’s first look at threads. Then we’ll circle back to interpreters.
Threads and Thread States
A Python process will have one or more OS threads running Python code
(or otherwise interacting with the C API). Each of these threads
interacts with the CPython runtime using its own thread state
(PyThreadState
), which holds all the runtime state unique to that
thread. There is also some runtime state that is shared between
multiple OS threads.
Any OS thread may switch which thread state it is currently using, as
long as it isn’t one that another OS thread is already using (or has
been using). This “current” thread state is stored by the runtime
in a thread-local variable, and may be looked up explicitly with
PyThreadState_Get()
. It gets set automatically for the initial
(“main”) OS thread and for threading.Thread
objects. From the
C API it is set (and cleared) by PyThreadState_Swap()
and may
be set by PyGILState_Ensure()
. Most of the C API requires that
there be a current thread state, either looked up implicitly
or passed in as an argument.
The relationship between OS threads and thread states is one-to-many. Each thread state is associated with at most a single OS thread and records its thread ID. A thread state is never used for more than one OS thread. In the other direction, however, an OS thread may have more than one thread state associated with it, though, again, only one may be current.
When there’s more than one thread state for an OS thread,
PyThreadState_Swap()
is used in that OS thread to switch
between them, with the requested thread state becoming the current one.
Whatever was running in the thread using the old thread state is
effectively paused until that thread state is swapped back in.
Interpreter States
As noted earlier, there is some runtime state that multiple OS threads
share. Some of it is exposed by the sys
module, though much is
used internally and not exposed explicitly or only through the C API.
This shared state is called the interpreter state
(PyInterpreterState
). We’ll sometimes refer to it here as just
“interpreter”, though that is also sometimes used to refer to the
python
executable, to the Python implementation, and to the
bytecode interpreter (i.e. exec()
/eval()
).
CPython has supported multiple interpreters in the same process (AKA “subinterpreters”) since version 1.5 (1997). The feature has been available via the C API.
Interpreters and Threads
Thread states are related to interpreter states in much the same way that OS threads and processes are related (at a high level). To begin with, the relationship is one-to-many. A thread state belongs to a single interpreter (and stores a pointer to it). That thread state is never used for a different interpreter. In the other direction, however, an interpreter may have zero or more thread states associated with it. The interpreter is only considered active in OS threads where one of its thread states is current.
Interpreters are created via the C API using
Py_NewInterpreterFromConfig()
(or Py_NewInterpreter()
, which
is a light wrapper around Py_NewInterpreterFromConfig()
).
That function does the following:
- create a new interpreter state
- create a new thread state
- set the thread state as current (a current tstate is needed for interpreter init)
- initialize the interpreter state using that thread state
- return the thread state (still current)
Note that the returned thread state may be immediately discarded. There is no requirement that an interpreter have any thread states, except as soon as the interpreter is meant to actually be used. At that point it must be made active in the current OS thread.
To make an existing interpreter active in the current OS thread,
the C API user first makes sure that interpreter has a corresponding
thread state. Then PyThreadState_Swap()
is called like normal
using that thread state. If the thread state for another interpreter
was already current then it gets swapped out like normal and execution
of that interpreter in the OS thread is thus effectively paused until
it is swapped back in.
Once an interpreter is active in the current OS thread like that, the
thread can call any of the C API, such as PyEval_EvalCode()
(i.e. exec()
). This works by using the current thread state as
the runtime context.
The “Main” Interpreter
When a Python process starts, it creates a single interpreter state (the “main” interpreter) with a single thread state for the current OS thread. The Python runtime is then initialized using them.
After initialization, the script or module or REPL is executed using
them. That execution happens in the interpreter’s __main__
module.
When the process finishes running the requested Python code or REPL, in the main OS thread, the Python runtime is finalized in that thread using the main interpreter.
Runtime finalization has only a slight, indirect effect on still-running Python threads, whether in the main interpreter or in subinterpreters. That’s because right away it waits indefinitely for all non-daemon Python threads to finish.
While the C API may be queried, there is no mechanism by which any
Python thread is directly alerted that finalization has begun,
other than perhaps with “atexit” functions that may be been
registered using threading._register_atexit()
.
Any remaining subinterpreters are themselves finalized later, but at that point they aren’t current in any OS threads.
Interpreter Isolation
CPython’s interpreters are intended to be strictly isolated from each
other. That means interpreters never share objects (except in very
specific cases with immortal, immutable builtin objects). Each
interpreter has its own modules (sys.modules
), classes, functions,
and variables. Even where two interpreters define the same class,
each will have its own copy. The same applies to state in C, including
in extension modules. The CPython C API docs explain more.
Notably, there is some process-global state that interpreters will always share, some mutable and some immutable. Sharing immutable state presents few problems, while providing some benefits (mainly performance). However, all shared mutable state requires special management, particularly for thread-safety, some of which the OS takes care of for us.
Mutable:
- file descriptors
- low-level env vars
- process memory (though allocators are isolated)
- the list of interpreters
Immutable:
- builtin types (e.g.
dict
,bytes
) - singletons (e.g.
None
) - underlying static module data (e.g. functions) for builtin/extension/frozen modules
Existing Execution Components
There are a number of existing parts of Python that may help with understanding how code may be run in a subinterpreter.
In CPython, each component is built around one of the following C API functions (or variants):
PyEval_EvalCode()
: run the bytecode interpreter with the given code objectPyRun_String()
: compile +PyEval_EvalCode()
PyRun_File()
: read + compile +PyEval_EvalCode()
PyRun_InteractiveOneObject()
: compile +PyEval_EvalCode()
PyObject_Call()
: callsPyEval_EvalCode()
builtins.exec()
The builtin exec()
may be used to execute Python code. It is
essentially a wrapper around the C API functions PyRun_String()
and PyEval_EvalCode()
.
Here are some relevant characteristics of the builtin exec()
:
- It runs in the current OS thread and pauses whatever
was running there, which resumes when
exec()
finishes. No other OS threads are affected. (To avoid pausing the current Python thread, runexec()
in athreading.Thread
.) - It may start additional threads, which don’t interrupt it.
- It executes against a “globals” namespace (and a “locals”
namespace). At module-level,
exec()
defaults to using__dict__
of the current module (i.e.globals()
).exec()
uses that namespace as-is and does not clear it before or after. - It propagates any uncaught exception from the code it ran.
The exception is raised from the
exec()
call in the Python thread that originally calledexec()
.
Command-line
The python
CLI provides several ways to run Python code. In each
case it maps to a corresponding C API call:
<no args>
,-i
- run the REPL (PyRun_InteractiveOneObject()
)<filename>
- run a script (PyRun_File()
)-c <code>
- run the given Python code (PyRun_String()
)-m module
- run the module as a script (PyEval_EvalCode()
viarunpy._run_module_as_main()
)
In each case it is essentially a variant of running exec()
at the top-level of the __main__
module of the main interpreter.
threading.Thread
When a Python thread is started, it runs the “target” function
with PyObject_Call()
using a new thread state. The globals
namespace come from func.__globals__
and any uncaught
exception is discarded.
Motivation
The interpreters
module will provide a high-level interface to the
multiple interpreter functionality. The goal is to make the existing
multiple-interpreters feature of CPython more easily accessible to
Python code. This is particularly relevant now that CPython has a
per-interpreter GIL (PEP 684) and people are more interested
in using multiple interpreters.
Without a stdlib module, users are limited to the C API, which restricts how much they can try out and take advantage of multiple interpreters.
The module will include a basic mechanism for communicating between interpreters. Without one, multiple interpreters are a much less useful feature.
Specification
The module will:
- expose the existing multiple interpreter support
- introduce a basic mechanism for communicating between interpreters
The module will wrap a new low-level _interpreters
module
(in the same way as the threading
module).
However, that low-level API is not intended for public use
and thus not part of this proposal.
Using Interpreters
The module defines the following functions:
get_current() -> Interpreter
- Returns the
Interpreter
object for the currently executing interpreter.
list_all() -> list[Interpreter]
- Returns the
Interpreter
object for each existing interpreter, whether it is currently running in any OS threads or not.
create() -> Interpreter
- Create a new interpreter and return the
Interpreter
object for it. The interpreter doesn’t do anything on its own and is not inherently tied to any OS thread. That only happens when something is actually run in the interpreter (e.g.Interpreter.exec()
), and only while running. The interpreter may or may not have thread states ready to use, but that is strictly an internal implementation detail.
Interpreter Objects
An interpreters.Interpreter
object that represents the interpreter
(PyInterpreterState
) with the corresponding unique ID.
There will only be one object for any given interpreter.
If the interpreter was created with interpreters.create()
then
it will be destroyed as soon as all Interpreter
objects with its ID
(across all interpreters) have been deleted.
Interpreter
objects may represent other interpreters than those
created by interpreters.create()
. Examples include the main
interpreter (created by Python’s runtime initialization) and those
created via the C-API, using Py_NewInterpreter()
. Such
Interpreter
objects will not be able to interact with their
corresponding interpreters, e.g. via Interpreter.exec()
(though we may relax this in the future).
Attributes and methods:
id
- (read-only) A non-negative
int
that identifies the interpreter that thisInterpreter
instance represents. Conceptually, this is similar to a process ID.
__hash__()
- Returns the hash of the interpreter’s
id
. This is the same as the hash of the ID’s integer value.
is_running() -> bool
- Returns
True
if the interpreter is currently executing code in its__main__
module. This excludes sub-threads.It refers only to if there is an OS thread running a script (code) in the interpreter’s
__main__
module. That basically means whether or notInterpreter.exec()
is running in some OS thread. Code running in sub-threads is ignored.
prepare_main(**kwargs)
- Bind one or more objects in the interpreter’s
__main__
module.The keyword argument names will be used as the attribute names. The values will be bound as new objects, though exactly equivalent to the original. Only objects specifically supported for passing between interpreters are allowed. See Shareable Objects.
prepare_main()
is helpful for initializing the globals for an interpreter before running code in it.
exec(code, /)
- Execute the given source code in the interpreter
(in the current OS thread), using its
__main__
module. It doesn’t return anything.This is essentially equivalent to switching to this interpreter in the current OS thread and then calling the builtin
exec()
using this interpreter’s__main__
module’s__dict__
as the globals and locals.The code running in the current OS thread (a different interpreter) is effectively paused until
Interpreter.exec()
finishes. To avoid pausing it, create a newthreading.Thread
and callInterpreter.exec()
in it (likeInterpreter.call_in_thread()
does).Interpreter.exec()
does not reset the interpreter’s state nor the__main__
module, neither before nor after, so each successive call picks up where the last one left off. This can be useful for running some code to initialize an interpreter (e.g. with imports) before later performing some repeated task.If there is an uncaught exception, it will be propagated into the calling interpreter as an
ExecutionFailed
. The full error display of the original exception, generated relative to the called interpreter, is preserved on the propagatedExecutionFailed
. That includes the full traceback, with all the extra info like syntax error details and chained exceptions. If theExecutionFailed
is not caught then that full error display will be shown, much like it would be if the propagated exception had been raised in the main interpreter and uncaught. Having the full traceback is particularly useful when debugging.If exception propagation is not desired then an explicit try-except should be used around the code passed to
Interpreter.exec()
. Likewise any error handling that depends on specific information from the exception must use an explicit try-except around the given code, sinceExecutionFailed
will not preserve that information.
call(callable, /)
- Call the callable object in the interpreter.
The return value is discarded. If the callable raises an exception
then it gets propagated as an
ExecutionFailed
exception, in the same way asInterpreter.exec()
.For now only plain functions are supported and only ones that take no arguments and have no cell vars. Free globals are resolved against the target interpreter’s
__main__
module.In the future, we can add support for arguments, closures, and a broader variety of callables, at least partly via pickle. We can also consider not discarding the return value. The initial restrictions are in place to allow us to get the basic functionality of the module out to users sooner.
call_in_thread(callable, /) -> threading.Thread
- Essentially, apply
Interpreter.call()
in a new thread. Return values are discarded and exceptions are not propagated.call_in_thread()
is roughly equivalent to:def task(): interp.call(func) t = threading.Thread(target=task) t.start()
close()
- Destroy the underlying interpreter.
Communicating Between Interpreters
The module introduces a basic communication mechanism through special queues.
There are interpreters.Queue
objects, but they only proxy
the actual data structure: an unbounded FIFO queue that exists
outside any one interpreter. These queues have special accommodations
for safely passing object data between interpreters, without violating
interpreter isolation. This includes thread-safety.
As with other queues in Python, for each “put” the object is added to the back and each “get” pops the next one off the front. Every added object will be popped off in the order it was pushed on.
Only objects that are specifically supported for passing
between interpreters may be sent through an interpreters.Queue
.
Note that the actual objects aren’t sent, but rather their
underlying data. However, the popped object will still be
strictly equivalent to the original.
See Shareable Objects.
The module defines the following functions:
create_queue(maxsize=0, *, syncobj=False) -> Queue
- Create a new queue. If the maxsize is zero or negative then the
queue is unbounded.
“syncobj” is used as the default for
put()
andput_nowait()
.
Queue Objects
interpreters.Queue
objects act as proxies for the underlying
cross-interpreter-safe queues exposed by the interpreters
module.
Each Queue
object represents the queue with the corresponding
unique ID.
There will only be one object for any given queue.
Queue
implements all the methods of queue.Queue
except for
task_done()
and join()
, hence it is similar to
asyncio.Queue
and multiprocessing.Queue
.
Attributes and methods:
id
- (read-only) A non-negative
int
that identifies the corresponding cross-interpreter queue. Conceptually, this is similar to the file descriptor used for a pipe.
maxsize
- (read-only) Number of items allowed in the queue. Zero means “unbounded”.
__hash__()
- Return the hash of the queue’s
id
. This is the same as the hash of the ID’s integer value.
empty()
- Return
True
if the queue is empty,False
otherwise.This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change.
full()
- Return
True
if there aremaxsize
items in the queue.If the queue was initialized with
maxsize=0
(the default), thenfull()
never returnsTrue
.This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change.
qsize()
- Return the number of items in the queue.
This is only a snapshot of the state at the time of the call. Other threads or interpreters may cause this to change.
put(obj, timeout=None, *, syncobj=None)
- Add the object to the queue.
If
maxsize > 0
and the queue is full then this blocks until a free slot is available. If timeout is a positive number then it only blocks at least that many seconds and then raisesinterpreters.QueueFull
. Otherwise is blocks forever.If “syncobj” is true then the object must be shareable, which means the object’s data is passed through rather than the object itself. If “syncobj” is false then all objects are supported. However, there are some performance penalties and all objects are copies (e.g. via pickle). Thus mutable objects will never be automatically synchronized between interpreters. If “syncobj” is None (the default) then the queue’s default value is used.
If an object is still in the queue, and the interpreter which put it in the queue (i.e. to which it belongs) is destroyed, then the object is immediately removed from the queue. (We may later add an option to replace the removed object in the queue with a sentinel or to raise an exception for the corresponding
get()
call.)
put_nowait(obj, *, syncobj=None)
- Like
put()
but effectively with an immediate timeout. Thus if the queue is full, it immediately raisesinterpreters.QueueFull
.
get(timeout=None) -> object
- Pop the next object from the queue and return it. Block while
the queue is empty. If a positive timeout is provided and an
object hasn’t been added to the queue in that many seconds
then raise
interpreters.QueueEmpty
.
get_nowait() -> object
- Like
get()
, but do not block. If the queue is not empty then return the next item. Otherwise, raiseinterpreters.QueueEmpty
.
Synchronization
There are situations where two interpreters should be synchronized. That may involve sharing a resource, worker management, or preserving sequential consistency.
In threaded programming the typical synchronization primitives are
types like mutexes. The threading
module exposes several.
However, interpreters cannot share objects which means they cannot
share threading.Lock
objects.
The interpreters
module does not provide any such dedicated
synchronization primitives. Instead, interpreters.Queue
objects provide everything one might need.
For example, if there’s a shared resource that needs managed access then a queue may be used to manage it, where the interpreters pass an object around to indicate who can use the resource:
import interpreters
from mymodule import load_big_data, check_data
numworkers = 10
control = interpreters.create_queue()
data = memoryview(load_big_data())
def worker():
interp = interpreters.create()
interp.prepare_main(control=control, data=data)
interp.exec("""if True:
from mymodule import edit_data
while True:
token = control.get()
edit_data(data)
control.put(token)
""")
threads = [threading.Thread(target=worker) for _ in range(numworkers)]
for t in threads:
t.start()
token = 'football'
control.put(token)
while True:
control.get()
if not check_data(data):
break
control.put(token)
Exceptions
InterpreterError
- Indicates that some interpreter-related failure occurred.
This exception is a subclass of
Exception
.
InterpreterNotFoundError
- Raised from
Interpreter
methods after the underlying interpreter has been destroyed, e.g. via the C-API.This exception is a subclass of
InterpreterError
.
ExecutionFailed
- Raised from
Interpreter.exec()
andInterpreter.call()
when there’s an uncaught exception. The error display for this exception includes the traceback of the uncaught exception, which gets shown after the normal error display, much like happens forExceptionGroup
.Attributes:
type
- a representation of the original exception’s class, with__name__
,__module__
, and__qualname__
attrs.msg
-str(exc)
of the original exceptionsnapshot
- atraceback.TracebackException
object for the original exception
This exception is a subclass of
InterpreterError
.
QueueError
- Indicates that some queue-related failure occurred.
This exception is a subclass of
Exception
.
QueueNotFoundError
- Raised from
interpreters.Queue
methods after the underlying queue has been destroyed.This exception is a subclass of
QueueError
.
QueueEmpty
- Raised from
Queue.get()
(orget_nowait()
with no default) when the queue is empty.This exception is a subclass of both
QueueError
and the stdlibqueue.Empty
.
QueueFull
- Raised from
Queue.put()
(with a timeout) orput_nowait()
when the queue is already at its max size.This exception is a subclass of both
QueueError
and the stdlibqueue.Empty
.
InterpreterPoolExecutor
Along with the new interpreters
module, there will be a new
concurrent.futures.InterpreterPoolExecutor
. It will be a
derivative of ThreadPoolExecutor
, where each worker executes
in its own thread, but each with its own subinterpreter.
Like the other executors, InterpreterPoolExecutor
will support
callables for tasks, and for the initializer. Also like the other
executors, the arguments in both cases will be mostly unrestricted.
The callables and arguments will typically be serialized when sent
to a worker’s interpreter, e.g. with pickle, like how the
ProcessPoolExecutor
works. This contrasts with
Interpreter.call()
, which will (at least initially)
be much more restricted.
Communication between workers, or between the executor
(or generally its interpreter) and the workers, may still be done
through interpreters.Queue
objects, set with the initializer.
sys.implementation.supports_isolated_interpreters
Python implementations are not required to support subinterpreters,
though most major ones do. If an implementation does support them
then sys.implementation.supports_isolated_interpreters
will be
set to True
. Otherwise it will be False
. If the feature
is not supported then importing the interpreters
module will
raise an ImportError
.
Examples
The following examples demonstrate practical cases where multiple interpreters may be useful.
Example 1:
There’s a stream of requests coming in that will be handled via workers in sub-threads.
- each worker thread has its own interpreter
- there’s one queue to send tasks to workers and another queue to return results
- the results are handled in a dedicated thread
- each worker keeps going until it receives a “stop” sentinel (
None
) - the results handler keeps going until all workers have stopped
import interpreters
from mymodule import iter_requests, handle_result
tasks = interpreters.create_queue()
results = interpreters.create_queue()
numworkers = 20
threads = []
def results_handler():
running = numworkers
while running:
try:
res = results.get(timeout=0.1)
except interpreters.QueueEmpty:
# No workers have finished a request since last time.
pass
else:
if res is None:
# A worker has stopped.
running -= 1
else:
handle_result(res)
empty = object()
assert results.get_nowait(empty) is empty
threads.append(threading.Thread(target=results_handler))
def worker():
interp = interpreters.create()
interp.prepare_main(tasks=tasks, results=results)
interp.exec("""if True:
from mymodule import handle_request, capture_exception
while True:
req = tasks.get()
if req is None:
# Stop!
break
try:
res = handle_request(req)
except Exception as exc:
res = capture_exception(exc)
results.put(res)
# Notify the results handler.
results.put(None)
""")
threads.extend(threading.Thread(target=worker) for _ in range(numworkers))
for t in threads:
t.start()
for req in iter_requests():
tasks.put(req)
# Send the "stop" signal.
for _ in range(numworkers):
tasks.put(None)
for t in threads:
t.join()
Example 2:
This case is similar to the last as there are a bunch of workers
in sub-threads. However, this time the code is chunking up a big array
of data, where each worker processes one chunk at a time. Copying
that data to each interpreter would be exceptionally inefficient,
so the code takes advantage of directly sharing memoryview
buffers.
- all the interpreters share the buffer of the source array
- each one writes its results to a second shared buffer
- there’s use a queue to send tasks to workers
- only one worker will ever read any given index in the source array
- only one worker will ever write to any given index in the results (this is how it ensures thread-safety)
import interpreters
import queue
from mymodule import read_large_data_set, use_results
numworkers = 3
data, chunksize = read_large_data_set()
buf = memoryview(data)
numchunks = (len(buf) + 1) / chunksize
results = memoryview(b'\0' * numchunks)
tasks = interpreters.create_queue()
def worker(id):
interp = interpreters.create()
interp.prepare_main(data=buf, results=results, tasks=tasks)
interp.exec("""if True:
from mymodule import reduce_chunk
while True:
req = tasks.get()
if res is None:
# Stop!
break
resindex, start, end = req
chunk = data[start: end]
res = reduce_chunk(chunk)
results[resindex] = res
""")
threads = [threading.Thread(target=worker) for _ in range(numworkers)]
for t in threads:
t.start()
for i in range(numchunks):
# Assume there's at least one worker running still.
start = i * chunksize
end = start + chunksize
if end > len(buf):
end = len(buf)
tasks.put((start, end, i))
# Send the "stop" signal.
for _ in range(numworkers):
tasks.put(None)
for t in threads:
t.join()
use_results(results)
Rationale
A Minimal API
Since the core dev team has no real experience with how users will make use of multiple interpreters in Python code, this proposal purposefully keeps the initial API as lean and minimal as possible. The objective is to provide a well-considered foundation on which further (more advanced) functionality may be added later, as appropriate.
That said, the proposed design incorporates lessons learned from existing use of subinterpreters by the community, from existing stdlib modules, and from other programming languages. It also factors in experience from using subinterpreters in the CPython test suite and using them in concurrency benchmarks.
create(), create_queue()
Typically, users call a type to create instances of the type, at which
point the object’s resources get provisioned. The interpreters
module takes a different approach, where users must call create()
to get a new interpreter or create_queue()
for a new queue.
Calling interpreters.Interpreter()
directly only returns a wrapper
around an existing interpreters (likewise for
interpreters.Queue()
).
This is because interpreters (and queues) are special resources.
They exist globally in the process and are not managed/owned by the
current interpreter. Thus the interpreters
module makes creating
an interpreter (or queue) a visibly distinct operation from creating
an instance of interpreters.Interpreter
(or interpreters.Queue
).
Interpreter.prepare_main() Sets Multiple Variables
prepare_main()
may be seen as a setter function of sorts.
It supports setting multiple names at once,
e.g. interp.prepare_main(spam=1, eggs=2)
, whereas most setters
set one item at a time. The main reason is for efficiency.
To set a value in the interpreter’s __main__.__dict__
, the
implementation must first switch the OS thread to the identified
interpreter, which involves some non-negligible overhead. After
setting the value it must switch back.
Furthermore, there is some additional overhead to the mechanism
by which it passes objects between interpreters, which can be
reduced in aggregate if multiple values are set at once.
Therefore, prepare_main()
supports setting multiple
values at once.
Propagating Exceptions
An uncaught exception from a subinterpreter,
via Interpreter.exec()
,
could either be (effectively) ignored,
like threading.Thread()
does,
or propagated, like the builtin exec()
does.
Since Interpreter.exec()
is a synchronous operation,
like the builtin exec()
, uncaught exceptions are propagated.
However, such exceptions are not raised directly. That’s because
interpreters are isolated from each other and must not share objects,
including exceptions. That could be addressed by raising a surrogate
of the exception, whether a summary, a copy, or a proxy that wraps it.
Any of those could preserve the traceback, which is useful for
debugging. The ExecutionFailed
that gets raised
is such a surrogate.
There’s another concern to consider. If a propagated exception isn’t
immediately caught, it will bubble up through the call stack until
caught (or not). In the case that code somewhere else may catch it,
it is helpful to identify that the exception came from a subinterpreter
(i.e. a “remote” source), rather than from the current interpreter.
That’s why Interpreter.exec()
raises ExecutionFailed
and why
it is a plain Exception
, rather than a copy or proxy with a class
that matches the original exception. For example, an uncaught
ValueError
from a subinterpreter would never get caught in a later
try: ... except ValueError: ...
. Instead, ExecutionFailed
must be handled directly.
In contrast, exceptions propagated from Interpreter.call()
do not
involve ExecutionFailed
but are raised directly, as though originating
in the calling interpreter. This is because Interpreter.call()
is
a higher level method that uses pickle to support objects that can’t
normally be passed between interpreters.
Limited Object Sharing
As noted in Interpreter Isolation, only a small number of builtin objects may be truly shared between interpreters. In all other cases objects can only be shared indirectly, through copies or proxies.
The set of objects that are shareable as copies through queues
(and Interpreter.prepare_main()
) is limited for the sake of
efficiency.
Supporting sharing of all objects is possible (via pickle) but not part of this proposal. For one thing, it’s helpful to know in those cases that only an efficient implementation is being used. Furthermore, in those cases supporting mutable objects via pickling would violate the guarantee that “shared” objects be equivalent (and stay that way).
Objects vs. ID Proxies
For both interpreters and queues, the low-level module makes use of
proxy objects that expose the underlying state by their corresponding
process-global IDs. In both cases the state is likewise process-global
and will be used by multiple interpreters. Thus they aren’t suitable
to be implemented as PyObject
, which is only really an option for
interpreter-specific data. That’s why the interpreters
module
instead provides objects that are weakly associated through the ID.
Rejected Ideas
See PEP 554.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0734.rst
Last modified: 2024-04-10 21:49:06 GMT