PEP: 626 Title: Precise line numbers for debugging and other tools.
Author: Mark Shannon <mark@hotpy.org> BDFL-Delegate: Pablo Galindo
<pablogsal@python.org> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 15-Jul-2020 Python-Version: 3.10 Post-History:
17-Jul-2020

Abstract

Python should guarantee that when tracing is turned on, "line" tracing
events are generated for all lines of code executed and only for lines
of code that are executed.

The f_lineno attribute of frame objects should always contain the
expected line number. During frame execution, the expected line number
is the line number of source code currently being executed. After a
frame has completed, either by returning or by raising an exception, the
expected line number is the line number of the last line of source that
was executed.

A side effect of ensuring correct line numbers, is that some bytecodes
will need to be marked as artificial, and not have a meaningful line
number. To assist tools, a new co_lines attribute will be added that
describes the mapping from bytecode to source.

Motivation

Users of sys.settrace and associated tools should be able to rely on
tracing events being generated for all lines of code, and only for
actual code. They should also be able to assume that the line number in
f_lineno is correct.

The current implementation mostly does this, but fails in a few cases.
This requires workarounds in tooling and is a nuisance for alternative
Python implementations.

Having this guarantee also benefits implementers of CPython in the long
term, as the current behaviour is not obvious and has some odd corner
cases.

Rationale

In order to guarantee that line events are generated when expected, the
co_lnotab attribute, in its current form, can no longer be the source of
truth for line number information.

Rather than attempt to fix the co_lnotab attribute, a new method
co_lines() will be added, which returns an iterator over bytecode
offsets and source code lines.

Ensuring that the bytecode is annotated correctly to enable accurate
line number information means that some bytecodes must be marked as
artificial, and not have a line number.

Some care must be taken not to break existing tooling. To minimize
breakage, the co_lnotab attribute will be retained, but lazily generated
on demand.

Specification

Line events and the f_lineno attribute should act as an experienced
Python user would expect in all cases.

Tracing

Tracing generates events for calls, returns, exceptions, lines of source
code executed, and, under some circumstances, instructions executed.

Only line events are covered by this PEP.

When tracing is turned on, line events will be generated when:

-   A new line of source code is reached.
-   A backwards jump occurs, even if it jumps to the same line, as may
    happen in list comprehensions.

Additionally, line events will never be generated for source code lines
that are not executed.

What is considered to be code for the purposes of tracing

All expressions and parts of expressions are considered to be executable
code.

In general, all statements are also considered to be executable code.
However, when a statement is spread over several lines, we must consider
which parts of a statement are considered to be executable code.

Statements are made up of keywords and expressions. Not all keywords
have a direct runtime effect, so not all keywords are considered to be
executable code. For example, else, is a necessary part of an if
statement, but there is no runtime effect associated with an else.

For the purposes of tracing, the following keywords will not be
considered to be executable code:

-   del -- The expression to be deleted is treated as the executable
    code.
-   else -- No runtime effect
-   finally -- No runtime effect
-   global -- Purely declarative
-   nonlocal -- Purely declarative

All other keywords are considered to be executable code.

Example event sequences

In the following examples, events are listed as "name", f_lineno pairs.

The code

    1.     global x
    2.     x = a

generates the following event:

    "line" 2

The code

    1.     try:
    2.        pass
    3.     finally:
    4.        pass

generates the following events:

    "line" 1
    "line" 2
    "line" 4

The code

    1.      for (
    2.          x) in [1]:
    3.          pass
    4.      return

generates the following events:

    "line" 2       # evaluate [1]
    "line" 1       # for
    "line" 2       # store to x
    "line" 3       # pass
    "line" 1       # for
    "line" 4       # return
    "return" 1

The f_lineno attribute

-   When a frame object is created, the f_lineno attribute will be set
    to the line at which the function or class is defined; that is the
    line on which the def or class keyword appears. For modules it will
    be set to zero.
-   The f_lineno attribute will be updated to match the line number
    about to be executed, even if tracing is turned off and no event is
    generated.

The new co_lines() method of code objects

The co_lines() method will return an iterator which yields tuples of
values, each representing the line number of a range of bytecodes. Each
tuple will consist of three values:

-   start -- The offset (inclusive) of the start of the bytecode range
-   end -- The offset (exclusive) of the end of the bytecode range
-   line -- The line number, or None if the bytecodes in the given range
    do not have a line number.

The sequence generated will have the following properties:

-   The first range in the sequence with have a start of 0
-   The (start, end) ranges will be non-decreasing and consecutive. That
    is, for any pair of tuples the start of the second will equal to the
    end of the first.
-   No range will be backwards, that is end >= start for all triples.
-   The final range in the sequence with have end equal to the size of
    the bytecode.
-   line will either be a positive integer, or None

Zero width ranges

Zero width range, that is ranges where start == end are allowed. Zero
width ranges are used for lines that are present in the source code, but
have been eliminated by the bytecode compiler.

The co_linetable attribute

The co_linetable attribute will hold the line number information. The
format is opaque, unspecified and may be changed without notice. The
attribute is public only to support creation of new code objects.

The co_lnotab attribute

Historically the co_lnotab attribute held a mapping from bytecode offset
to line number, but does not support bytecodes without a line number.
For backward compatibility, the co_lnotab bytes object will be lazily
created when needed. For ranges of bytecodes without a line number, the
line number of the previous bytecode range will be used.

Tools that parse the co_lnotab table should move to using the new
co_lines() method as soon as is practical.

Backwards Compatibility

The co_lnotab attribute will be deprecated in 3.10 and removed in 3.12.

Any tools that parse the co_lnotab attribute of code objects will need
to move to using co_lines() before 3.12 is released. Tools that use
sys.settrace will be unaffected, except in cases where the "line" events
they receive are more accurate.

Examples of code for which the sequence of trace events will change

In the following examples, events are listed as "name", f_lineno pairs.

pass statement in an if statement.

    0.  def spam(a):
    1.      if a:
    2.          eggs()
    3.      else:
    4.          pass

If a is True, then the sequence of events generated by Python 3.9 is:

    "line" 1
    "line" 2
    "line" 4
    "return" 4

From 3.10 the sequence will be:

    "line" 1
    "line" 2
    "return" 2

Multiple pass statements.

    0.  def bar():
    1.      pass
    2.      pass
    3.      pass

The sequence of events generated by Python 3.9 is:

    "line" 3
    "return" 3

From 3.10 the sequence will be:

    "line" 1
    "line" 2
    "line" 3
    "return" 3

C API

Access to the f_lineno attribute of frame objects through C API
functions is unchanged. f_lineno can be read by PyFrame_GetLineNumber.
f_lineno can only be set via PyObject_SetAttr and similar functions.

Accessing f_lineno directly through the underlying data structure is
forbidden.

Out of process debuggers and profilers

Out of process tools, such as py-spy[1], cannot use the C-API, and must
parse the line number table themselves. Although the line number table
format may change without warning, it will not change during a release
unless absolutely necessary for a bug fix.

To reduce the work required to implement these tools, the following C
struct and utility functions are provided. Note that these functions are
not part of the C-API, so will be need to be linked into any code that
needs to use them.

    typedef struct addressrange {
        int ar_start;
        int ar_end;
        int ar_line;
        struct _opaque opaque;
    } PyCodeAddressRange;

    void PyLineTable_InitAddressRange(char *linetable, Py_ssize_t length, int firstlineno, PyCodeAddressRange *range);
    int PyLineTable_NextAddressRange(PyCodeAddressRange *range);
    int PyLineTable_PreviousAddressRange(PyCodeAddressRange *range);

PyLineTable_InitAddressRange initializes the PyCodeAddressRange struct
from the line number table and first line number.

PyLineTable_NextAddressRange advances the range to the next entry,
returning non-zero if valid.

PyLineTable_PreviousAddressRange retreats the range to the previous
entry, returning non-zero if valid.

Note

The data in linetable is immutable, but its lifetime depends on its code
object. For reliable operation, linetable should be copied into a local
buffer before calling PyLineTable_InitAddressRange.

Although these functions are not part of C-API, they will provided by
all future versions of CPython. The PyLineTable_ functions do not call
into the C-API, so can be safely copied into any tool that needs to use
them. The PyCodeAddressRange struct will not be changed, but the _opaque
struct is not part of the specification and may change.

Note

The PyCodeAddressRange struct has changed from the original version of
this PEP, where the addition fields were defined, but were liable to
change.

For example, the following code prints out all the address ranges:

    void print_address_ranges(char *linetable, Py_ssize_t length, int firstlineno)
    {
        PyCodeAddressRange range;
        PyLineTable_InitAddressRange(linetable, length, firstlineno, &range);
        while (PyLineTable_NextAddressRange(&range)) {
            printf("Bytecodes from %d (inclusive) to %d (exclusive) ",
                   range.start, range.end);
            if (range.line < 0) {
                /* line < 0 means no line number */
                printf("have no line number\n");
            }
            else {
                printf("have line number %d\n", range.line);
            }
        }
    }

Performance Implications

In general, there should be no change in performance. When tracing,
programs should run a little faster as the new table format can be
designed with line number calculation speed in mind. Code with long
sequences of pass statements will probably become a bit slower.

Reference Implementation

https://github.com/markshannon/cpython/tree/new-linetable-format-version-2

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

References

[1] py-spy: Sampling profiler for Python programs
(https://github.com/benfred/py-spy)