PEP: 669
Title: Low Impact Monitoring for CPython
Author: Mark Shannon <mark@hotpy.org>
Discussions-To: https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/
Status: Final
Type: Standards Track
Created: 18-Aug-2021
Python-Version: 3.12
Post-History: 07-Dec-2021, 10-Jan-2022
Resolution: https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/42

python:sys.monitoring

Abstract

Using a profiler or debugger in CPython can have a severe impact on
performance. Slowdowns by an order of magnitude are common.

This PEP proposes an API for monitoring Python programs running on
CPython that will enable monitoring at low cost.

Although this PEP does not specify an implementation, it is expected
that it will be implemented using the quickening step of PEP 659.

A sys.monitoring namespace will be added, which will contain the
relevant functions and constants.

Motivation

Developers should not have to pay an unreasonable cost to use debuggers,
profilers and other similar tools.

C++ and Java developers expect to be able to run a program at full speed
(or very close to it) under a debugger. Python developers should expect
that too.

Rationale

The quickening mechanism provided by PEP 659 provides a way to
dynamically modify executing Python bytecode. These modifications have
little cost beyond the parts of the code that are modified and a
relatively low cost to those parts that are modified. We can leverage
this to provide an efficient mechanism for monitoring that was not
possible in 3.10 or earlier.

By using quickening, we expect that code run under a debugger on 3.12
should outperform code run without a debugger on 3.11. Profiling will
still slow down execution, but by much less than in 3.11.

Specification

Monitoring of Python programs is done by registering callback functions
for events and by activating a set of events.

Activating events and registering callback functions are independent of
each other.

Both registering callbacks and activating events are done on a per-tool
basis. It is possible to have multiple tools that respond to different
sets of events.

Note that, unlike sys.settrace(), events and callbacks are per
interpreter, not per thread.

Events

As a code object executes, various events occur that might be of
interest to tools. By activating events and registering callback
functions, tools can respond to these events in any way that suits them.
Events can be set globally, or for individual code objects.

For 3.12, CPython will support the following events:

-   PY_START: Start of a Python function (occurs immediately after the
    call, the callee's frame will be on the stack)
-   PY_RESUME: Resumption of a Python function (for generator and
    coroutine functions), except for throw() calls.
-   PY_THROW: A Python function is resumed by a throw() call.
-   PY_RETURN: Return from a Python function (occurs immediately before
    the return, the callee's frame will be on the stack).
-   PY_YIELD: Yield from a Python function (occurs immediately before
    the yield, the callee's frame will be on the stack).
-   PY_UNWIND: Exit from a Python function during exception unwinding.
-   CALL: A call in Python code (event occurs before the call).
-   C_RETURN: Return from any callable, except Python functions (event
    occurs after the return).
-   C_RAISE: Exception raised from any callable, except Python functions
    (event occurs after the exit).
-   RAISE: An exception is raised, except those that cause a
    STOP_ITERATION event.
-   EXCEPTION_HANDLED: An exception is handled.
-   LINE: An instruction is about to be executed that has a different
    line number from the preceding instruction.
-   INSTRUCTION: A VM instruction is about to be executed.
-   JUMP: An unconditional jump in the control flow graph is made.
-   BRANCH: A conditional branch is taken (or not).
-   STOP_ITERATION: An artificial StopIteration is raised; see the
    STOP_ITERATION event.

More events may be added in the future.

All events will be attributes of the events namespace in sys.monitoring.
All events will be represented by a power-of-two integer, so that they
can be combined with the | operator.

Events are divided into three groups:

Local events

Local events are associated with normal execution of the program and
happen at clearly defined locations. All local events can be disabled.
The local events are:

-   PY_START
-   PY_RESUME
-   PY_RETURN
-   PY_YIELD
-   CALL
-   LINE
-   INSTRUCTION
-   JUMP
-   BRANCH
-   STOP_ITERATION

Ancillary events

Ancillary events can be monitored like other events, but are controlled
by another event:

-   C_RAISE
-   C_RETURN

The C_RETURN and C_RAISE events are controlled by the CALL event.
C_RETURN and C_RAISE events will only be seen if the corresponding CALL
event is being monitored.

Other events

Other events are not necessarily tied to a specific location in the
program and cannot be individually disabled.

The other events that can be monitored are:

-   PY_THROW
-   PY_UNWIND
-   RAISE
-   EXCEPTION_HANDLED

The STOP_ITERATION event

PEP 380 <380#use-of-stopiteration-to-return-values> specifies that a
StopIteration exception is raised when returning a value from a
generator or coroutine. However, this is a very inefficient way to
return a value, so some Python implementations, notably CPython 3.12+,
do not raise an exception unless it would be visible to other code.

To allow tools to monitor for real exceptions without slowing down
generators and coroutines, the STOP_ITERATION event is provided.
STOP_ITERATION can be locally disabled, unlike RAISE.

Tool identifiers

The VM can support up to 6 tools at once. Before registering or
activating events, a tool should choose an identifier. Identifiers are
integers in the range 0 to 5.

    sys.monitoring.use_tool_id(id, name:str) -> None
    sys.monitoring.free_tool_id(id) -> None
    sys.monitoring.get_tool(id) ->  str | None

sys.monitoring.use_tool_id raises a ValueError if id is in use.
sys.monitoring.get_tool returns the name of the tool if id is in use,
otherwise it returns None.

All IDs are treated the same by the VM with regard to events, but the
following IDs are pre-defined to make co-operation of tools easier:

    sys.monitoring.DEBUGGER_ID = 0
    sys.monitoring.COVERAGE_ID = 1
    sys.monitoring.PROFILER_ID = 2
    sys.monitoring.OPTIMIZER_ID = 5

There is no obligation to set an ID, nor is there anything preventing a
tool from using an ID even if it is already in use. However, tools are
encouraged to use a unique ID and respect other tools.

For example, if a debugger were attached and DEBUGGER_ID were in use, it
should report an error, rather than carrying on regardless.

The OPTIMIZER_ID is provided for tools like Cinder or PyTorch that want
to optimize Python code, but need to decide what to optimize in a way
that depends on some wider context.

Setting events globally

Events can be controlled globally by modifying the set of events being
monitored:

-   sys.monitoring.get_events(tool_id:int)->int Returns the int
    representing all the active events.
-   sys.monitoring.set_events(tool_id:int, event_set: int) Activates all
    events which are set in event_set. Raises a ValueError if tool_id is
    not in use.

No events are active by default.

Per code object events

Events can also be controlled on a per code object basis:

-   sys.monitoring.get_local_events(tool_id:int, code: CodeType)->int
    Returns all the local events for code.
-   sys.monitoring.set_local_events(tool_id:int, code: CodeType, event_set: int)
    Activates all the local events for code which are set in event_set.
    Raises a ValueError if tool_id is not in use.

Local events add to global events, but do not mask them. In other words,
all global events will trigger for a code object, regardless of the
local events.

Register callback functions

To register a callable for events call:

    sys.monitoring.register_callback(tool_id:int, event: int, func: Callable | None) -> Callable | None

If another callback was registered for the given tool_id and event, it
is unregistered and returned. Otherwise register_callback returns None.

Functions can be unregistered by calling
sys.monitoring.register_callback(tool_id, event, None).

Callback functions can be registered and unregistered at any time.

Registering or unregistering a callback function will generate a
sys.audit event.

Callback function arguments

When an active event occurs, the registered callback function is called.
Different events will provide the callback function with different
arguments, as follows:

-   PY_START and PY_RESUME:

        func(code: CodeType, instruction_offset: int) -> DISABLE | Any

-   PY_RETURN and PY_YIELD:

      func(code: CodeType, instruction_offset: int, retval: object) -> DISABLE | Any

-   CALL, C_RAISE and C_RETURN:

      func(code: CodeType, instruction_offset: int, callable: object, arg0: object | MISSING) -> DISABLE | Any

      If there are no arguments, arg0 is set to MISSING.

-   RAISE and EXCEPTION_HANDLED:

      func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any

-   LINE:

      func(code: CodeType, line_number: int) -> DISABLE | Any

-   BRANCH:

      func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any

    Note that the destination_offset is where the code will next
    execute. For an untaken branch this will be the offset of the
    instruction following the branch.

-   INSTRUCTION:

      func(code: CodeType, instruction_offset: int) -> DISABLE | Any

If a callback function returns DISABLE, then that function will no
longer be called for that (code, instruction_offset) until
sys.monitoring.restart_events() is called. This feature is provided for
coverage and other tools that are only interested in seeing an event
once.

Note that sys.monitoring.restart_events() is not specific to one tool,
so tools must be prepared to receive events that they have chosen to
DISABLE.

Events in callback functions

Events are suspended in callback functions and their callees for the
tool that registered that callback.

That means that other tools will see events in the callback functions
for other tools. This could be useful for debugging a profiling tool,
but would produce misleading profiles, as the debugger tool would show
up in the profile.

Order of events

If an instruction triggers several events, they occur in the following
order:

-   LINE
-   INSTRUCTION
-   All other events (only one of these events can occur per
    instruction)

Each event is delivered to tools in ascending order of ID.

The "call" event group

Most events are independent; setting or disabling one event has no
effect on the others. However, the CALL, C_RAISE and C_RETURN events
form a group. If any of those events are set or disabled, then all
events in the group are. Disabling a CALL event will not disable the
matching C_RAISE or C_RETURN, but will disable all subsequent events.

Attributes of the sys.monitoring namespace

-   def use_tool_id(id, name: str)->None
-   def free_tool_id(id)->None
-   def get_tool(id)->str | None
-   def get_events(tool_id: int)->int
-   def set_events(tool_id: int, event_set: int)->None
-   def get_local_events(tool_id: int, code: CodeType)->int
-   def set_local_events(tool_id: int, code: CodeType, event_set: int)->None
-   def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]
-   def restart_events()->None
-   DISABLE: object
-   MISSING: object

Access to "debug only" features

Some features of the standard library are not accessible to normal code,
but are accessible to debuggers. For example, setting local variables,
or the line number.

These features will be available to callback functions.

Backwards Compatibility

This PEP is mostly backwards compatible.

There are some compatibility issues with PEP 523, as the behavior of PEP
523 plugins is outside of the VM's control. It is up to PEP 523 plugins
to ensure that they respect the semantics of this PEP. Simple plugins
that do not change the state of the VM, and defer execution to
_PyEval_EvalFrameDefault() should continue to work.

sys.settrace and sys.setprofile will act as if they were tools 6 and 7
respectively, so can be used alongside this PEP.

This means that sys.settrace and sys.setprofile may not work correctly
with all PEP 523 plugins. However, simple PEP 523 plugins, as described
above, should be fine.

Performance

If no events are active, this PEP should have a small positive impact on
performance. Experiments show between 1 and 2% speedup from not
supporting sys.settrace directly.

The performance of sys.settrace will be about the same. The performance
of sys.setprofile should be better. However, tools relying on
sys.settrace and sys.setprofile can be made a lot faster by using the
API provided by this PEP.

If a small set of events is active, e.g. for a debugger, then the
overhead of callbacks will be orders of magnitude less than for
sys.settrace and much cheaper than using PEP 523.

Coverage tools can be implemented at very low cost, by returning DISABLE
in all callbacks.

For heavily instrumented code, e.g. using LINE, performance should be
better than sys.settrace, but not by that much as performance will be
dominated by the time spent in callbacks.

For optimizing virtual machines, such as future versions of CPython (and
PyPy should they choose to support this API), changes to the set of
active events in the midst of a long running program could be quite
expensive, possibly taking hundreds of milliseconds as it triggers
de-optimizations. Once such de-optimization has occurred, performance
should recover as the VM can re-optimize the instrumented code.

In general these operations can be considered to be fast:

-   def get_events(tool_id: int)->int
-   def get_local_events(tool_id: int, code: CodeType)->int
-   def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]
-   def get_tool(tool_id) -> str | None

These operations are slower, but not especially so:

-   def set_local_events(tool_id: int, code: CodeType, event_set: int)->None

And these operations should be regarded as slow:

-   def use_tool_id(id, name:str)->None
-   def free_tool_id(id)->None
-   def set_events(tool_id: int, event_set: int)->None
-   def restart_events()->None

How slow the slow operations are depends on when they happen. If done
early in the program, before modules are loaded, they should be fairly
inexpensive.

Memory Consumption

When not in use, this PEP will have a negligible effect on memory
consumption.

How memory is used is very much an implementation detail. However, we
expect that for 3.12 the additional memory consumption per code object
will be roughly as follows:

+-------------+--------------------------------+
|             |             Events             |
|    Tools    +--------+--------+--------------+
|             | Others | LINE   | INSTRUCTION  |
+=============+========+========+==============+
| One         | None   | ≈40%   | ≈80%         |
+-------------+--------+--------+--------------+
| Two or more | ≈40%   | ≈120%  | ≈200%        |
+-------------+--------+--------+--------------+

Security Implications

Allowing modification of running code has some security implications,
but no more than the ability to generate and call new code.

All the new functions listed above will trigger audit hooks.

Implementation

This outlines the proposed implementation for CPython 3.12. The actual
implementation for later versions of CPython and other Python
implementations may differ considerably.

The proposed implementation of this PEP will be built on top of the
quickening step of CPython 3.11, as described in
PEP 659 <659#quickening>. Instrumentation works in much the same way as
quickening: bytecodes are replaced with instrumented ones as needed.

For example, if the CALL event is turned on, then all call instructions
will be replaced with an INSTRUMENTED_CALL instruction.

Note that this will interfere with specialization, which will result in
some performance degradation in addition to the overhead of calling the
registered callable.

When the set of active events changes, the VM will immediately update
all code objects present on the call stack of any thread. It will also
set in place traps to ensure that all code objects are correctly
instrumented when called. Consequently changing the set of active events
should be done as infrequently as possible, as it could be quite an
expensive operation.

Other events, such as RAISE, can be turned on or off cheaply, as they do
not rely on code instrumentation, but on runtime checks when the
underlying event occurs.

The exact set of events that require instrumentation is an
implementation detail, but for the current design, the following events
will require instrumentation:

-   PY_START
-   PY_RESUME
-   PY_RETURN
-   PY_YIELD
-   CALL
-   LINE
-   INSTRUCTION
-   JUMP
-   BRANCH

Each instrumented bytecode will require an additional 8 bits of
information to note which tool the instrumentation applies to. LINE and
INSTRUCTION events require additional information, as they need to store
the original instruction, or even the instrumented instruction if they
overlap other instrumentation.

Implementing tools

It is the philosophy of this PEP that it should be possible for
third-party monitoring tools to achieve high performance, not that it
should be easy for them to do so.

Converting events into data that is meaningful to the users is the
responsibility of the tool.

All events have a cost, and tools should attempt to use the set of
events that triggers least often while still providing the necessary
information.

Debuggers

Inserting breakpoints

Breakpoints can be inserted by setting per code object events, either
LINE or INSTRUCTION, and returning DISABLE for any events not matching a
breakpoint.

Stepping

Debuggers usually offer the ability to step execution by a single
instruction or line.

Like breakpoints, stepping can be implemented by setting per code object
events. As soon as normal execution is to be resumed, the local events
can be unset.

Attaching

Debuggers can use the PY_START and PY_RESUME events to be informed when
a code object is first encountered, so that any necessary breakpoints
can be inserted.

Coverage Tools

Coverage tools need to track which parts of the control graph have been
executed. To do this, they need to register for the PY_ events, plus
JUMP and BRANCH.

This information can then be converted back into a line based report
after execution has completed.

Profilers

Simple profilers need to gather information about calls. To do this
profilers should register for the following events:

-   PY_START
-   PY_RESUME
-   PY_THROW
-   PY_RETURN
-   PY_YIELD
-   PY_UNWIND
-   CALL
-   C_RAISE
-   C_RETURN

Line based profilers

Line based profilers can use the LINE and JUMP events. Implementers of
profilers should be aware that instrumenting LINE events will have a
large impact on performance.

Note

Instrumenting profilers have significant overhead and will distort the
results of profiling. Unless you need exact call counts, consider using
a statistical profiler.

Rejected ideas

A draft version of this PEP proposed making the user responsible for
inserting the monitoring instructions, rather than having the VM do it.
However, that puts too much of a burden on the tools, and would make
attaching a debugger nearly impossible.

An earlier version of this PEP proposed storing events as enums:

    class Event(enum.IntFlag):
        PY_START = ...

However, that would prevent monitoring of code before the enum module
was loaded and could cause unnecessary overhead.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.