Python Enhancement Proposals

PEP 789 – Preventing task-cancellation bugs by limiting yield in async generators

Author: Zac Hatfield-Dodds <zac at zhd.dev>, Nathaniel J. Smith <njs at pobox.com>
PEP-Delegate:
Discussions-To: Discourse thread
Status: Draft
Type: Standards Track
Created: 14-May-2024
Python-Version: 3.14

Abstract

Structured concurrency is increasingly popular in Python. Interfaces such as the asyncio.TaskGroup and asyncio.timeout context managers support compositional reasoning, and allow developers to clearly scope the lifetimes of concurrent tasks. However, using yield to suspend a frame inside such a context leads to situations where the wrong task is canceled, timeouts are ignored, and exceptions are mishandled. More fundamentally, suspending a frame inside a TaskGroup violates the structured concurrency design principle that child tasks are encapsulated within their parent frame.

To address these issues, this PEP proposes a new sys.prevent_yields() context manager. When syntactically inside this context, attempting to yield will raise a RuntimeError, preventing the task from yielding. Additionally, a mechanism will be provided for decorators such as @contextmanager to allow yields inside the decorated function. sys.prevent_yields() will be used by asyncio and downstream libraries to implement task groups, timeouts, and cancellation; and a related mechanism by contextlib etc. to convert generators into context managers which allow safe yields.

Background

Structured concurrency is increasingly popular in Python, in the form of newer asyncio interfaces and third-party libraries such as Trio and anyio. These interfaces support compositional reasoning, so long as users never write a yield which suspends a frame while inside a cancel scope.

A cancel scope is a context manager which can… cancel… whatever work occurs within that context (…scope). In asyncio, this is implicit in the design of async with asyncio.timeout(): or async with asyncio.TaskGroup() as tg:, which respectively cancel the contained work after the specified duration, or cancel sibling tasks when one of them raises an exception. The core functionality of a cancel scope is synchronous, but the user-facing context managers may be either sync or async. [1] [2]

This structured approach works beautifully, unless you hit one specific sharp edge: breaking the nesting structure by yielding inside a cancel scope. This has much the same effect on structured control flow as adding just a few cross-function gotos, and the effects are truly dire:

  • The wrong task can be canceled, whether due to a timeout, an error in a sibling task, or an explicit request to cancel some other task
  • Exceptions, including CancelledError, can be delivered to the wrong task
  • Exceptions can go missing entirely, being dropped instead of added to an ExceptionGroup

Problem statement

Here’s the fundamental issue: yield suspends a call frame. It only makes sense to yield in a leaf frame – i.e., if your call stack goes like A -> B -> C, then you can suspend C, but you can’t suspend B while leaving C running.

But, TaskGroup is a kind of “concurrent call” primitive, where a single frame can have multiple child frames that run concurrently. This means that if we allow people to mix yield and TaskGroup, then we can end up in exactly this situation, where B gets suspended but C is actively running. This is nonsensical, and causes serious practical problems (e.g., if C raises an exception and A has returned, we have no way to propagate it).

This is a fundamental incompatibility between generator control flow and structured concurrency control flow, not something we can fix by tweaking our APIs. The only solution seems to be to forbid yield inside a TaskGroup.

Although timeouts don’t leave a child task running, the close analogy and related problems lead us to conclude that yield should be forbidden inside all cancel scopes, not only TaskGroups. See Can’t we just deliver exceptions to the right place? for discussion.
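
The underlying mechanism can be seen without any async code. The following is a minimal sketch, not part of the proposal: the `scope` context manager is a hypothetical stand-in for a cancel scope that only records whether it is currently open. It shows that after a generator yields inside a `with` block, control returns to the caller while the block is still entered.

```python
from contextlib import contextmanager

active_scopes = []  # names of scopes that are currently entered


@contextmanager
def scope(name):
    # Hypothetical stand-in for a cancel scope: it only records that it is open.
    active_scopes.append(name)
    try:
        yield
    finally:
        active_scopes.remove(name)


def suspended_inside_scope():
    with scope("inner"):
        yield "paused"  # the frame is suspended with the scope still open


gen = suspended_inside_scope()
first = next(gen)
# Control is back in the caller, but the generator's scope has not exited.
still_open_in_caller = list(active_scopes)
gen.close()  # raises GeneratorExit at the yield, which finally exits the scope
open_after_close = list(active_scopes)
```

Anything the scope does while open, such as delivering a cancellation, now affects whatever code the caller happens to be running.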

Motivating examples

Let’s consider three examples, to see what this might look like in practice.

Leaking a timeout to the outer scope

Suppose that we want to iterate over an async iterator, but wait for at most max_time seconds for each element. We might naturally encapsulate the logic for doing so in an async generator, so that the call site can continue to use a straightforward async for loop:

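# `timeout` stands for any timeout cancel scope, e.g. trio.fail_after();
# with asyncio, write `async with asyncio.timeout(max_time):` instead.
async def iter_with_timeout(ait, max_time):
    try:
        while True:
            with timeout(max_time):
                yield await anext(ait)  # BUG: yields inside the timeout
    except StopAsyncIteration:
        return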

async def fn():
    async for elem in iter_with_timeout(ait, max_time=1.0):
        await do_something_with(elem)

Unfortunately, there’s a bug in this version: the timeout might expire after the generator yields but before it is resumed! In this case, we’ll see a CancelledError raised in the outer task, where it cannot be caught by the with timeout(max_time): statement.

The fix is fairly simple: get the next element inside the timeout context, and then yield outside that context.

async def correct_iter_with_timeout(ait, max_time):
    try:
        while True:
            with timeout(max_time):
                tmp = await anext(ait)
            yield tmp
    except StopAsyncIteration:
        return

Leaking background tasks (breaks cancellation and exception handling)

Timeouts are not the only interface which wraps a cancel scope, and if you need some background worker tasks, you can’t simply close the TaskGroup before yielding.

As an example, let’s look at a fan-in generator, which we’ll use to merge the feeds from several “sensors”. We’ll also set up our mock sensors with a small buffer, so that we’ll raise an error in the background task while control flow is outside the combined_iterators generator.

import asyncio, itertools

async def mock_sensor(name):
    for n in itertools.count():
        await asyncio.sleep(0.1)
        if n == 1 and name == "b":  # 'presence detection'
            yield "PRESENT"
        elif n == 3 and name == "a":  # inject a simple bug
            print("oops, raising RuntimeError")
            raise RuntimeError
        else:
            yield f"{name}-{n}"  # non-presence sensor data

async def move_elements_to_queue(ait, queue):
    async for obj in ait:
        await queue.put(obj)

async def combined_iterators(*aits):
    """Combine async iterators by starting N tasks, each of
    which moves elements from one iterator to a shared queue."""
    q = asyncio.Queue(maxsize=2)
    async with asyncio.TaskGroup() as tg:
        for ait in aits:
            tg.create_task(move_elements_to_queue(ait, q))
        while True:
            yield await q.get()

async def turn_on_lights_when_someone_gets_home():
    combined = combined_iterators(mock_sensor("a"), mock_sensor("b"))
    async for event in combined:
        print(event)
        if event == "PRESENT":
            break
    print("main task sleeping for a bit")
    await asyncio.sleep(1)  # do some other operation

asyncio.run(turn_on_lights_when_someone_gets_home())

When we run this code, we see the expected sequence of observations, then a ‘detection’, and then while the main task is sleeping we trigger that RuntimeError in the background. But… we don’t actually observe the RuntimeError, not even as the __context__ of another exception!

>> python3.11 demo.py
a-0
b-0
a-1
PRESENT
main task sleeping for a bit
oops, raising RuntimeError

Traceback (most recent call last):
  File "demo.py", line 39, in <module>
    asyncio.run(turn_on_lights_when_someone_gets_home())
  ...
  File "demo.py", line 37, in turn_on_lights_when_someone_gets_home
    await asyncio.sleep(1)  # do some other operation
  File ".../python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
asyncio.exceptions.CancelledError

Here, again, the problem is that we’ve yielded inside a cancel scope; this time the scope which a TaskGroup uses to cancel sibling tasks when one of the child tasks raises an exception. However, the CancelledError which was intended for the sibling task was instead injected into the outer task, and so we never got a chance to create and raise an ExceptionGroup(..., [RuntimeError()]).

To fix this, we need to turn our async generator into an async context manager, which yields an async iterable - in this case a generator wrapping the queue; in future perhaps the queue itself:

import asyncio
from contextlib import asynccontextmanager

async def queue_as_aiterable(queue):
    # async generators that don't `yield` inside a cancel scope are fine!
    while True:
        try:
            yield await queue.get()
        except asyncio.QueueShutDown:  # requires Python 3.13+
            return

@asynccontextmanager  # yield-in-cancel-scope is OK in a context manager
async def combined_iterators(*aits):
    q = asyncio.Queue(maxsize=2)
    async with asyncio.TaskGroup() as tg:
        for ait in aits:
            tg.create_task(move_elements_to_queue(ait, q))
        yield queue_as_aiterable(q)

async def turn_on_lights_when_someone_gets_home():
    ...
    async with combined_iterators(...) as ait:
        async for event in ait:
            ...

In a user-defined context manager

Yielding inside a cancel scope can be safe, if and only if you’re using the generator to implement a context manager [3] - in this case any propagating exceptions will be redirected to the expected task.

We’ve also implemented the ASYNC101 linter rule in flake8-async, which warns against yielding in known cancel scopes. Could user education be sufficient to avoid these problems? Unfortunately not: user-defined context managers can also wrap a cancel scope, and it’s infeasible to recognize or lint for all such cases.

This regularly arises in practice, because ‘run some background tasks for the duration of this context’ is a very common pattern in structured concurrency. We saw that in combined_iterators() above; and have seen this bug in multiple implementations of the websocket protocol:

async def get_messages(websocket_url):
    # The websocket protocol requires background tasks to manage the socket heartbeat
    async with open_websocket(websocket_url) as ws:  # contains a TaskGroup!
        while True:
            yield await ws.get_message()

async with open_websocket(websocket_url) as ws:
    async for message in get_messages(ws):
        ...

Specification

To prevent these problems, we propose:

  1. a new context manager, with sys.prevent_yields(reason): ... which will raise a RuntimeError if you attempt to yield while inside it. [4] Cancel-scope-like context managers in asyncio and downstream code can then wrap this to prevent yielding inside their with-block.
  2. a mechanism by which generator-to-context-manager decorators can allow yields across one call. We’re not yet sure what this should look like; the leading candidates are:
    1. a code-object attribute, fn.__code__.co_allow_yields = True, or
    2. some sort of invocation flag, e.g. fn.__invoke_with_yields__, to avoid mutating a code object that might be shared between decorated and undecorated functions
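
As a rough illustration of item 1, a library could wrap the proposed primitive as sketched below. `sys.prevent_yields` does not exist in any released Python, so the `prevent_yields` helper here is a hypothetical shim that falls back to a no-op; `cancel_scope_like` is likewise an invented name.

```python
import contextlib
import sys


def prevent_yields(reason):
    # `sys.prevent_yields` is only proposed by this PEP; fall back to a no-op
    # on interpreters that do not provide it.
    impl = getattr(sys, "prevent_yields", None)
    if impl is None:
        return contextlib.nullcontext()
    return impl(reason)


@contextlib.contextmanager
def cancel_scope_like(name):
    # Hypothetical library context manager wrapping the proposed primitive.
    # The `yield` below is the context-manager kind that item 2 must allow.
    with prevent_yields(f"{name} cannot be suspended by a yield"):
        yield name
```

Note that the wrapper itself yields inside the protected block, which is exactly why the second mechanism (allowing yields for generator-based context managers) is needed.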

Implementation - tracking frames

The new sys.prevent_yields context manager will require interpreter support. For each frame, we track the entries and exits of this context manager.

We’re not particularly attached to the exact representation; we’ll discuss it as a stack (which would support clear error messages), but more compact representations such as pair-of-integers would also work.

  • When entering a newly-created or resumed frame, initialize empty stacks of entries and exits.
  • When returning from a frame, merge these stacks into that of the parent frame.
  • When yielding:
    • if entries != [] and not frame.allow_yield_flag, raise a RuntimeError instead of yielding (the new behavior this PEP proposes)
    • otherwise, merge stacks into the parent frame as for a return.

Because this is about yielding frames within a task, not switching between tasks, syntactic yield and yield from should be affected, but await expressions should not.

We can reduce the overhead by storing this metadata in a single stack per thread for all stack frames which are not generators.
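
The bookkeeping above can be modelled in a few lines of Python. This is a toy sketch under our own assumptions about what "merge" means (unmatched exits cancel entries in the parent, remaining entries are appended to it); the `FrameState` class and its method names are illustrative and say nothing about the eventual interpreter implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FrameState:
    """Toy model of one frame's bookkeeping; not the proposed C implementation."""

    allow_yield: bool = False
    entries: list = field(default_factory=list)  # reasons entered, not yet exited
    exits: int = 0  # exits seen with no matching entry in this activation

    def enter(self, reason):
        self.entries.append(reason)

    def exit(self):
        if self.entries:
            self.entries.pop()
        else:
            self.exits += 1  # matches an entry already merged into the parent

    def merge_into(self, parent):
        for _ in range(self.exits):
            parent.exit()
        parent.entries.extend(self.entries)
        self.entries, self.exits = [], 0  # a resumed frame starts empty

    def do_return(self, parent):
        self.merge_into(parent)

    def do_yield(self, parent):
        if self.entries and not self.allow_yield:
            raise RuntimeError(f"yield not allowed inside {self.entries[-1]}")
        self.merge_into(parent)


# A context-manager frame that is allowed to yield hands its entry to the caller.
caller = FrameState()
cm = FrameState(allow_yield=True)
cm.enter("timeout")
cm.do_yield(caller)
entries_while_suspended = list(caller.entries)

# The caller is an ordinary generator frame, so it may not yield now.
try:
    caller.do_yield(FrameState())
    caller_error = None
except RuntimeError as exc:
    caller_error = str(exc)

# Resuming the context manager and exiting rebalances the caller.
cm.exit()
cm.do_return(caller)
entries_after_exit = list(caller.entries)
```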

Worked examples

No-yield example

In this example, we see multiple rounds of the stack merging as we unwind from sys.prevent_yields, through the user-defined ContextManager, back to the original Frame. For brevity, the reason for preventing yields is not shown; it is part of the “1 enter” state.

[Figure: no-yield example (pep-789-example-no-yield.png)]

With no yield we don’t raise any errors, and because the number of enters and exits balance, the frame returns as usual with no further tracking.

Attempts-to-yield example

In this example, the Frame attempts to yield while inside the sys.prevent_yields context. This is detected by the interpreter, which raises a RuntimeError instead of suspending the frame.

[Figure: attempts-to-yield example (pep-789-example-yield-errors.png)]

Allowed-to-yield example

In this example, a decorator has marked the Frame as allowing yields. This could be @contextlib.contextmanager or a related decorator.

[Figure: allowed-to-yield example (pep-789-example-yield-allowed.png)]

When the Frame is allowed to yield, the entry/exit stack is merged into the parent frame’s stack before suspending. When the Frame resumes, its stack is empty. Finally, when the Frame exits, the exit is merged into the parent frame’s stack, rebalancing it.

This ensures that the parent frame correctly inherits any remaining sys.prevent_yields state, while allowing the Frame to safely suspend and resume.

Allowing yield for context managers

TODO: this section is a placeholder, pending a decision on the mechanism for @contextmanager to re-enable yields in the wrapped function.

  • Explain and show a code sample of how @asynccontextmanager sets the flag

Note that third-party decorators such as @pytest.fixture demonstrate that we can’t just have the interpreter special-case contextlib.
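
To make the point about third-party decorators concrete, here is a minimal generator-to-context-manager wrapper written without contextlib. It is purely illustrative: `fixture_like` and `_GeneratorScope` are invented names, and this is not how pytest implements fixtures. Any decorator of this shape would need a public way to allow yields in the wrapped function.

```python
import functools


class _GeneratorScope:
    # Hypothetical third-party helper, written without contextlib.
    def __init__(self, gen):
        self._gen = gen

    def __enter__(self):
        return next(self._gen)  # run the generator up to its single yield

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            try:
                next(self._gen)
            except StopIteration:
                return False
            raise RuntimeError("generator didn't stop")
        try:
            self._gen.throw(exc)
        except StopIteration:
            return True  # the generator swallowed the exception
        except BaseException as raised:
            if raised is exc:
                return False
            raise
        raise RuntimeError("generator didn't stop after throw()")


def fixture_like(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _GeneratorScope(func(*args, **kwargs))

    return wrapper


@fixture_like
def resource(log):
    log.append("setup")
    try:
        yield "value"
    finally:
        log.append("teardown")


log = []
with resource(log) as value:
    log.append(f"using {value}")
```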

Behavior if sys.prevent_yields is misused

While unwise, it’s possible to call sys.prevent_yields.__enter__ and .__exit__ in an order that does not correspond to any valid nesting, or get an invalid frame state in some other way.

There are two ways sys.prevent_yields.__exit__ could detect an invalid state. First, if yields are not prevented, we can simply raise an exception without changing the state. Second, if an unexpected entry is at the top of the stack, we suggest popping that entry and raising an exception – this ensures that out-of-order calls will still clear the stack, while still making it clear that something is wrong.

(and if we choose e.g. an integer- rather than stack-based representation, such states may not be distinguishable from correct nesting at all, in which case the question will not arise)
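
The two detection cases for a stack-based representation can be sketched as follows. This is an assumption-laden toy: `PreventYieldsState`, its token-based interface, and the error messages are all invented for illustration.

```python
class PreventYieldsState:
    """Toy stack of active prevent_yields entries for one frame."""

    def __init__(self):
        self.stack = []

    def enter(self, token):
        self.stack.append(token)

    def exit(self, token):
        if not self.stack:
            # Case 1: yields are not prevented; raise and change nothing.
            raise RuntimeError("exit without a matching enter")
        top = self.stack.pop()
        if top is not token:
            # Case 2: unexpected entry on top; it stays popped so the
            # stack still drains, but the caller is told about the misuse.
            raise RuntimeError("prevent_yields contexts exited out of order")


state = PreventYieldsState()
outer, inner = object(), object()
state.enter(outer)
state.enter(inner)

errors = []
for token in (outer, inner, inner):  # wrong order, then one exit too many
    try:
        state.exit(token)
    except RuntimeError as exc:
        errors.append(str(exc))
```

After the two out-of-order exits the stack is empty again, and the surplus third exit is reported without changing the state.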

Anticipated uses

In the standard library, sys.prevent_yields could be used by asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at. Downstream, we expect to use it in trio.CancelScope, async fixtures (in pytest-trio, anyio, etc.), and perhaps other places.

We consider use-cases unrelated to async correctness, such as preventing decimal.localcontext from leaking out of a generator, out of scope for this PEP.

The generator-to-context-manager support would be used by @contextlib.(async)contextmanager, and if necessary in (Async)ExitStack.

Backwards Compatibility

The addition of the sys.prevent_yields context manager, changes to @contextlib.(async)contextmanager, and corresponding interpreter support are all fully backwards-compatible.

Preventing yields inside asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at would be a breaking change to at least some code in the wild, which (however unsafe and prone to the motivating problems above) may work often enough to make it into production.

We will seek community feedback on appropriate deprecation pathways for standard-library code, including the suggested length of any deprecation period. As an initial suggestion, we could make suspending stdlib contexts emit a DeprecationWarning only under asyncio debug mode in 3.14; then transition to warn-by-default and error under debug mode in 3.15; and finally a hard error in 3.16.

Irrespective of stdlib usage, downstream frameworks would adopt this functionality immediately.

How widespread is this bug?

We don’t have solid numbers here, but believe that many projects are affected in the wild. After hitting both a moderate and a critical bug in the same week at work, each attributed to suspending a cancel scope, we’ve used static analysis to find such cases, with some success. Three people Zac spoke to at PyCon recognized the symptoms and concluded that they had likely been affected.

TODO: run the ASYNC101 lint rule across ecosystem projects, e.g. the aio-libs packages, and get some sense of frequency in widely-used PyPI packages? This would help inform the break/deprecation pathways for stdlib code.

How to Teach This

Async generators are very rarely taught to novice programmers.

Most intermediate and advanced Python programmers will only interact with this PEP as users of TaskGroup, timeout, and @contextmanager. For this group, we expect a clear exception message and documentation to be sufficient.

  • A new section will be added to the developing with asyncio page, which briefly states that async generators are not permitted to yield when inside a “cancel scope” context, i.e. a TaskGroup or timeout context manager. We anticipate that the problem-restatement and some parts of the motivation section will provide a basis for these docs.
    • When working in codebases which avoid async generators entirely [5], we’ve found that an async context manager yielding an async iterable is a safe and ergonomic replacement for async generators – and avoids the delayed-cleanup problems described in PEP 533, which this proposal does not address.
  • In the docs for each context manager which wraps a cancel scope, and thus now sys.prevent_yields, include a standard sentence such as “If used within an async generator, [it is an error to yield inside this context manager].” with a hyperlink to the explanation above.

For asyncio, Trio, curio, or other-framework maintainers who implement cancel scope semantics, we will ensure that the documentation of sys.prevent_yields gives a full explanation distilled from the solution and implementation sections of this PEP. We anticipate consulting most such maintainers for their feedback on the draft PEP.

Rejected alternatives

PEP 533, deterministic cleanup for iterators

PEP 533 proposes adding __[a]iterclose__ to the iterator protocol, essentially wrapping a with [a]closing(ait) around each (async) for loop. While this would be useful for ensuring timely and deterministic cleanup of resources held by iterators (the problem it aims to solve), it does not fully address the issues that motivate this PEP.

Even with PEP 533, misfired cancellations would still be delivered to the wrong task and could wreak havoc before the iterator is closed. Moreover, it does not address the fundamental structured concurrency problem with TaskGroup, where suspending a frame that owns a TaskGroup is incompatible with the model of child tasks being fully encapsulated within their parent frame.

Deprecate async generators entirely

At the 2024 language summit, several attendees suggested instead deprecating async generators in toto. Unfortunately, while the common-in-practice cases all use async generators, Trio code can trigger the same problem with standard generators:

# We use Trio for this example, because while `asyncio.timeout()` is async,
# Trio's CancelScope type and timeout context managers are synchronous.
import trio

def abandon_each_iteration_after(max_seconds):
    # This is of course broken, but I can imagine someone trying it...
    while True:
        with trio.move_on_after(max_seconds):
            yield

@trio.run
async def main():
    for _ in abandon_each_iteration_after(max_seconds=1):
        await trio.sleep(3)

If it weren’t for the bug in question, this code would look pretty idiomatic - but after about a second, instead of moving on to the next iteration it raises:

Traceback (most recent call last):
  File "demo.py", line 10, in <module>
    async def main():
  File "trio/_core/_run.py", line 2297, in run
    raise runner.main_task_outcome.error
  File "demo.py", line 12, in main
    await trio.sleep(3)
  File "trio/_timeouts.py", line 87, in sleep
    await sleep_until(trio.current_time() + seconds)
  ...
  File "trio/_core/_run.py", line 1450, in raise_cancel
    raise Cancelled._create()
trio.Cancelled: Cancelled

Furthermore, there are some non-cancel-scope synchronous context managers which exhibit related problems, such as the above-mentioned decimal.localcontext. While fixing the example below is not a goal of this PEP, it demonstrates that yield-within-with problems are not exclusive to async generators:

import decimal

def why_would_you_do_this():
    with decimal.localcontext(decimal.Context(prec=1)):
        yield

one = decimal.Decimal(1)
print(one / 3)  # 0.3333333333333333333333333333
next(gen := why_would_you_do_this())
print(one / 3)  # 0.3

While I’ve had good experiences in async Python without async generators [5], I’d prefer to fix the problem rather than remove them from the language.

Can’t we just deliver exceptions to the right place?

If we implemented PEP 568 (Generator-sensitivity for Context Variables; see also PEP 550), it would be possible to handle exceptions from timeouts: the event loop could avoid firing a CancelledError until the generator frame which contains the context manager is on the stack - either when the generator is resumed, or when it is finalized.

This can take arbitrarily long; even if we implemented PEP 533 to ensure timely cleanup on exiting (async) for-loops it’s still possible to drive a generator manually with next/send.

However, this doesn’t address the other problem with TaskGroup. The model for generators is that you put a stack frame in suspended animation and can then treat it as an inert value which can be stored, moved around, and maybe discarded or revived in some arbitrary place. The model for structured concurrency is that your stack becomes a tree, with child tasks encapsulated within some parent frame. They’re extending the basic structured programming model in different, and unfortunately incompatible, directions.

Suppose for example that suspending a frame containing an open TaskGroup also suspended all child tasks. This would preserve the ‘downward’ structured concurrency, in that children remain encapsulated - albeit at the cost of deadlocking both of our motivating examples, and much real-world code. However, it would still be possible to resume the generator in a different task, violating the ‘upwards’ invariant of structured concurrency.

We don’t think it’s worth adding this much machinery to handle cancel scopes, while still leaving task groups broken.

Alternative implementation - inspecting bytecode

Jelle Zijlstra has sketched an alternative, where sys.prevent_yields inspects the bytecode of callers until satisfied that there is no yield between the calling instruction pointer and the next context exit. We expect that support for syntactically nested context managers could be added fairly easily.

However, it’s not yet clear how this would work when user-defined context managers wrap sys.prevent_yields. Worse, this approach ignores explicit calls to __enter__() and __exit__(), meaning that the context management protocol would vary depending on whether the with statement was used.

The ‘only pay if you use it’ performance cost is very attractive. However, inspecting frame objects is prohibitively expensive for core control-flow constructs, and causes whole-program slowdowns via de-optimization. On the other hand, adding interpreter support for better performance leads back to the same pay-regardless semantics as our preferred solution above.
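
To give a feel for what bytecode inspection involves, here is a deliberately crude sketch using the standard `dis` module. It is not the sketch referred to above: it only lists yield instructions in a synchronous function, and the helper name `yield_offsets` is invented. A real implementation would also need to locate the caller's current instruction and the matching context exit.

```python
import dis


def yield_offsets(func):
    # Bytecode offsets of yield instructions in a *synchronous* function.
    # Opcode names vary between CPython versions; on 3.11+ `await` also
    # compiles to YIELD_VALUE, so this crude check is not async-aware.
    return [
        ins.offset
        for ins in dis.get_instructions(func)
        if ins.opname in {"YIELD_VALUE", "YIELD_FROM"}
    ]


def plain():
    return 1


def generator():
    yield 1


plain_offsets = yield_offsets(plain)
generator_offsets = yield_offsets(generator)
```

Even this simple scan walks every instruction of the function, which hints at why doing similar work on each context entry is considered too expensive.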

Footnotes


Source: https://github.com/python/peps/blob/main/peps/pep-0789.rst

Last modified: 2024-06-04 01:45:13 GMT