PEP 669 – Low Impact Monitoring for CPython
- Mark Shannon <mark at hotpy.org>
- Standards Track
Table of Contents
- Backwards Compatibility
- Security Implications
- Implementing tools
- Rejected ideas
Using a profiler or debugger in CPython can have a severe impact on performance. Slowdowns by an order of magnitude are common.
This PEP proposes an API for monitoring of Python programs running on CPython that will enable monitoring at low cost.
Although this PEP does not specify an implementation, it is expected that it will be implemented using the quickening step of PEP 659.
sys.monitoring namespace will be added, which will contain
the relevant functions and enum.
Developers should not have to pay an unreasonable cost to use debuggers, profilers and other similar tools.
C++ and Java developers expect to be able to run a program at full speed (or very close to it) under a debugger. Python developers should expect that too.
The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.
By using quickening, we expect that code run under a debugger on 3.11 should easily outperform code run without a debugger on 3.10. Profiling will still slow down execution, but by much less than in 3.10.
Monitoring of Python programs is done by registering callback functions for events and by activating a set of events.
Activating events and registering callback functions are independent of each other.
As a code object executes various events occur that might be of interest to tools. By activating events and by registering callback functions tools can respond to these events in any way that suits them. Events can be set globally, or for individual code objects.
For 3.11, CPython will support the following events:
- PY_CALL: Call of a Python function (occurs immediately after the call, the callee’s frame will be on the stack)
- PY_RESUME: Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
- PY_THROW: A Python function is resumed by a throw() call.
- PY_RETURN: Return from a Python function (occurs immediately before the return, the callee’s frame will be on the stack).
- PY_YIELD: Yield from a Python function (occurs immediately before the yield, the callee’s frame will be on the stack).
- PY_UNWIND: Exit from a Python function during exception unwinding.
- C_CALL: Call of a builtin function (before the call in this case).
- C_RETURN: Return from a builtin function (after the return in this case).
- RAISE: An exception is raised.
- EXCEPTION_HANDLED: An exception is handled.
- LINE: An instruction is about to be executed that has a different line number from the preceding instruction.
- INSTRUCTION – A VM instruction is about to be executed.
- JUMP – An unconditional jump in the control flow graph is reached.
- BRANCH – A conditional branch is about to be taken (or not).
- MARKER – A marker is hit
More events may be added in the future.
All events will be attributes of the
Event enum in
class Event(enum.IntFlag): PY_CALL = ...
Event is an
IntFlag which means that the events can be or-ed
together to form a set of events.
Setting events globally
Events can be controlled globally by modifying the set of events being monitored:
Eventset for all the active events.
sys.monitoring.set_events(event_set: Event)Activates all events which are set in
No events are active by default.
Per code object events
Events can also be controlled on a per code object basis:
sys.monitoring.get_local_events(code: CodeType)->EventReturns the
Eventset for all the local events for
sys.monitoring.set_local_events(code: CodeType, event_set: Event)Activates all the local events for
codewhich are set in
Local events add to global events, but do not mask them. In other words, all global events will trigger for a code object, regardless of the local events.
Register callback functions
To register a callable for events call:
register_callback returns the previously registered callback, or
Functions can be unregistered by calling
Callback functions can be registered and unregistered at any time.
Registering a callback function will generate a
Callback function arguments
When an active event occurs, the registered callback function is called. Different events will provide the callback function with different arguments, as follows:
- All events starting with
func(code: CodeType, instruction_offset: int)
func(code: CodeType, instruction_offset: int, callable: object)
func(code: CodeType, instruction_offset: int, exception: BaseException)
func(code: CodeType, line_number: int)
func(code: CodeType, instruction_offset: int, destination_offset: int)
Note that the
destination_offsetis where the code will next execute. For an untaken branch this will be the offset of the instruction following the branch.
func(code: CodeType, instruction_offset: int)
func(code: CodeType, instruction_offset: int, marker_id: int)
Inserting and removing markers
Two new functions are added to the
sys module to support markers.
sys.monitoring.insert_marker(code: CodeType, offset: int, marker_id=0: range(256))
sys.monitoring.remove_marker(code: CodeType, offset: int)
marker_id has no meaning to the VM,
and is used only as an argument to the callback function.
marker_id must in the range 0 to 255 (inclusive).
Attributes of the
def set_events(event_set: Event)->None
def get_local_events(code: CodeType)->Event
def set_local_events(code: CodeType, event_set: Event)->None
def register_callback(event: Event, func: Callable)->Optional[Callable]
def insert_marker(code: CodeType, offset: Event, marker_id=0: range(256))->None
def remove_marker(code: CodeType, offset: Event)->None
This PEP is fully backwards compatible, in the sense that old code will work if the features of this PEP are unused.
However, if it is used it will effectively disable
sys.setprofile and PEP 523 frame evaluation.
If PEP 523 is in use, or
sys.setprofile has been
set, then calling
sys.monitoring.set_local_events() will raise an exception.
sys.monitoring.set_local_events() has been called, then using PEP 523
sys.setprofile will raise an exception.
This PEP is incompatible with
because the implementation of
will use the same underlying mechanism as this PEP. It would be too slow
to support both the new and old monitoring mechanisms at the same time,
and they would interfere in awkward ways if both were active at the same time.
This PEP is incompatible with PEP 523, because PEP 523 prevents the VM being able to modify the code objects of executing code, which is a necessary feature.
We may seek to remove
sys.settrace and PEP 523 in the future once the APIs
provided by this PEP have been widely adopted, but that is for another PEP.
If no events are active, this PEP should have a negligible impact on performance.
If a small set of events are active, e.g. for a debugger, then the overhead
of callbacks will be orders of magnitudes less than for
much cheaper than using PEP 523.
For heavily instrumented code, e.g. using
LINE, performance should be
sys.settrace, but not by that much as performance will be
dominated by the time spent in callbacks.
For optimizing virtual machines, such as future versions of CPython
PyPy should they choose to support this API), changing the set of
globally active events in the midst of a long running program could be quite
expensive, possibly taking hundreds of milliseconds as it triggers
de-optimizations. Once such de-optimization has occurred, performance should
recover as the VM can re-optimize the instrumented code.
Allowing modification of running code has some security implications, but no more than the ability to generate and call new code.
All the new functions listed above will trigger audit hooks.
This outlines the proposed implementation for CPython 3.11. The actual implementation for later versions of CPython and other Python implementations may differ considerably.
The proposed implementation of this PEP will be built on top of the quickening step of PEP 659. Activating some events will cause all code objects to be quickened before they are executed.
For example, if the
LINE event is turned on, then all instructions that
are at the start of a line will be replaced with a
Note that this will interfere with specialization, which will result in some performance degradation in addition to the overhead of calling the registered callable.
When the set of active events changes, the VM will immediately update all code objects present on the call stack of any thread. It will also set in place traps to ensure that all code objects are correctly instrumented when called. Consequently changing the set of active events should be done as infrequently as possible, as it could be quite an expensive operation.
Other events, such as
RAISE can be turned on or off cheaply,
as they do not rely on code instrumentation, but runtime checks when the
underlying event occurs.
The exact set of events that require instrumentation is an implementation detail, but for the current design, the following events will require instrumentation:
It is the philosophy of this PEP that it should be possible for third-party monitoring tools to achieve high-performance, not that it should be easy for them to do so.
Converting events into data that is meaningful to the users is the responsibility of the tool.
All events have a cost, and tools should attempt to the use set of events that trigger the least often and still provide the necessary information.
Breakpoints can be inserted by using markers. For example:
Which will insert a marker at
which can be used as a breakpoint.
To insert a breakpoint at a given line, the matching instruction offsets
should be found from
Breakpoints can be removed by removing the marker:
Debuggers usually offer the ability to step execution by a single instruction or line.
This can be implemented by inserting a new marker at the required offset(s) of the code to be stepped to, and by removing the current marker.
It is the job of the debugger to compute the relevant offset(s).
Debuggers can use the
PY_CALL, etc. events to be informed when
a code object is first encountered, so that any necessary breakpoints
can be inserted.
Coverage tools need to track which parts of the control graph have been
executed. To do this, they need to register for the
This information can be then be converted back into a line based report after execution has completed.
Simple profilers need to gather information about calls. To do this profilers should register for the following events:
Line based profilers
Line based profilers can use the
Implementers of profilers should be aware that instrumenting
JUMP events will have a large impact on performance.
Instrumenting profilers have significant overhead and will distort the results of profiling. Unless you need exact call counts, consider using a statistical profiler.
A draft version of this PEP proposed making the user responsible for inserting the monitoring instructions, rather than have VM do it. However, that puts too much of a burden on the tools, and would make attaching a debugger nearly impossible.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Last modified: 2022-03-09 16:04:44 GMT