PEP: 657
Title: Include Fine Grained Error Locations in Tracebacks
Version: $Revision$
Last-Modified: $Date$
Author: Pablo Galindo <pablogsal@python.org>, Batuhan Taskaya <batuhan@python.org>, Ammar Askar <ammar@ammaraskar.com>
Discussions-To: https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 08-May-2021
Python-Version: 3.11
Post-History:

Abstract

This PEP proposes adding a mapping from each bytecode instruction to the
end line number and the start and end column offsets of the source code
that generated it. This data will be used to improve the tracebacks
displayed by the CPython interpreter and thereby the debugging
experience. The PEP also proposes adding APIs that allow other tools
(such as coverage analysis tools, profilers, tracers, debuggers) to
consume this information from code objects.

Motivation

The primary motivation for this PEP is to improve the feedback presented
about the location of errors to aid with debugging.

Python currently keeps a mapping of bytecode to line numbers from
compilation. The interpreter uses this mapping to point to the source
line associated with an error. While this line-level granularity for
instructions is useful, a single line of Python code can compile into
dozens of bytecode operations making it hard to track which part of the
line caused the error.

Consider the following line of Python code:

    x['a']['b']['c']['d'] = 1

If any of the values in the dictionaries are None, the error shown is:

    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        x['a']['b']['c']['d'] = 1
    TypeError: 'NoneType' object is not subscriptable

From the traceback, it is impossible to determine which one of the
dictionaries had the None element that caused the error. Users often
have to attach a debugger or split up their expression to track down the
problem.

However, if the interpreter had a mapping of bytecode to column offsets
as well as line numbers, it could helpfully display:

    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        x['a']['b']['c']['d'] = 1
        ~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable

indicating to the user that the object x['a']['b'] must have been None.
This highlighting will occur for every frame in the traceback. For
instance, if a similar error is part of a complex function call chain,
the traceback would display the code associated with the current
instruction in every frame:

    Traceback (most recent call last):
      File "test.py", line 14, in <module>
        lel3(x)
        ^^^^^^^
      File "test.py", line 12, in lel3
        return lel2(x) / 23
               ^^^^^^^
      File "test.py", line 9, in lel2
        return 25 + lel(x) + lel(x)
                    ^^^^^^
      File "test.py", line 6, in lel
        return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                             ~~~~~~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable

This problem presents itself in the following situations.

-   When passing down multiple objects to function calls while accessing
    the same attribute in them. For instance, this error:

        Traceback (most recent call last):
          File "test.py", line 19, in <module>
            foo(a.name, b.name, c.name)
        AttributeError: 'NoneType' object has no attribute 'name'

    With the improvements in this PEP this would show:

        Traceback (most recent call last):
          File "test.py", line 19, in <module>
            foo(a.name, b.name, c.name)
                        ^^^^^^
        AttributeError: 'NoneType' object has no attribute 'name'

-   When dealing with lines with complex mathematical expressions,
    especially with libraries such as NumPy where arithmetic operations
    can fail based on the arguments. For example:

        Traceback (most recent call last):
          File "test.py", line 1, in <module>
            x = (a + b) @ (c + d)
        ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

    There is no clear indication of which operation failed: was it the
    addition on the left, the addition on the right, or the matrix
    multiplication in the middle? With this PEP the new error message
    would look like:

        Traceback (most recent call last):
          File "test.py", line 1, in <module>
            x = (a + b) @ (c + d)
                           ~~^~~
        ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

    This gives a much clearer error message that is easier to debug.

Debugging aside, this extra information would also be useful for code
coverage tools, enabling them to measure expression-level coverage
instead of just line-level coverage. For instance, given the following
line:

    x = foo() if bar() else baz()

coverage, profiling, or state analysis tools will highlight the full
line in both branches, making it impossible to tell which branch was
taken. This is a known problem in pycoverage.

Similar efforts to this PEP have taken place in other languages such as
Java in the form of JEP358. NullPointerExceptions in Java were similarly
nebulous when it came to lines with complicated expressions. A
NullPointerException would provide very little aid in finding the root
cause of an error. The implementation for JEP358 is fairly complex,
requiring walking back through the bytecode by using a control flow
graph analyzer and decompilation techniques to recover the source code
that led to the null pointer. Although the complexity of this solution
is high and requires maintenance for the decompiler every time Java
bytecode is changed, this improvement was deemed to be worth it for the
extra information provided for just one exception type.

Rationale

In order to identify the range of source code being executed when
exceptions are raised, this proposal requires adding new data for every
bytecode instruction. This will have an impact on the size of pyc files
on disk and the size of code objects in memory. The authors of this
proposal have chosen the data types in a way that tries to minimize this
impact. The proposed overhead is storing two uint8_t values (one for the
start offset and one for the end offset) and the end line information
for every bytecode instruction (encoded in the same way as the start line
is currently stored).

As an illustrative example to gauge the impact of this change, we have
calculated that including the start and end offsets will increase the
size of the standard library’s pyc files by about 22% (6.3MB), from
28.4MB to 34.7MB. The overhead in memory usage will be the same
(assuming the full standard library is loaded into the same program). We
believe that this is acceptable since the absolute overhead is small,
especially considering the storage size and memory capabilities of
modern computers. Additionally, the memory footprint of a Python program
is generally not dominated by code objects. To check this assumption we
have executed the test suites of several popular PyPI projects
(including NumPy, pytest, Django and Cython) as well as several
applications (Black, pylint, mypy executed over either mypy or the
standard library) and we found that code objects normally represent
3-6% of the average memory size of the program.
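You may want to gauge the overhead on your own codebase rather than on the whole standard library. One option is to marshal the same source in a fresh interpreter twice, once normally and once with the opt-out described under "Opt-out mechanism", and compare the sizes. This is only a sketch, with these assumptions:

- It requires Python 3.11+.
- It must be allowed to spawn `sys.executable`.
- `marshalled_size` is a hypothetical helper.
- It measures only the marshalled code object, not a full pyc file with its header.

```python
import os
import subprocess
import sys

# Child script: compile the source read from stdin and print the size in
# bytes of the marshalled code object.
_SIZE_CHILD = (
    "import marshal, sys\n"
    "code = compile(sys.stdin.read(), '<src>', 'exec')\n"
    "print(len(marshal.dumps(code)))\n"
)


def marshalled_size(source: str, no_debug_ranges: bool) -> int:
    """Size of `source` compiled and marshalled by a fresh interpreter (hypothetical)."""
    env = dict(os.environ)
    env.pop("PYTHONNODEBUGRANGES", None)  # keep the baseline independent of the caller
    flags = ["-X", "no_debug_ranges"] if no_debug_ranges else []
    result = subprocess.run(
        [sys.executable, *flags, "-c", _SIZE_CHILD],
        input=source, capture_output=True, text=True, check=True, env=env,
    )
    return int(result.stdout)


src = "total = compute_something(first_argument, second_argument) + other(third)\n" * 50
with_ranges = marshalled_size(src, no_debug_ranges=False)
without_ranges = marshalled_size(src, no_debug_ranges=True)
print(f"{with_ranges} -> {without_ranges} bytes "
      f"({(with_ranges - without_ranges) / with_ranges:.1%} smaller)")
```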

We understand that the extra cost of this information may not be
acceptable for some users, so we propose an opt-out mechanism which will
cause generated code objects to not have the extra information while
also allowing pyc files to not include the extra information.

Specification

In order to have enough information to correctly resolve the location
within a given line where an error was raised, a map linking bytecode
instructions to column offsets (start and end offset) and end line
numbers is needed. This is similar in fashion to how line numbers are
currently linked to bytecode instructions.

The following changes will be performed as part of the implementation of
this PEP:

-   The offset information will be exposed to Python via a new method
    in the code object class called co_positions() that will return a
    sequence of four-element tuples containing the full location of
    every instruction (start line, end line, start column offset and
    end column offset), with None in place of any value that is not
    available, for instance because the code object was created without
    the offset information.

-   One new C-API function:

        int PyCode_Addr2Location(
            PyCodeObject *co, int addrq,
            int *start_line, int *start_column,
            int *end_line, int *end_column);

    will be added so that the start line, end line, start column offset
    and end column offset can be obtained given the offset of a bytecode
    instruction. This function will set the values to 0 if the
    information is not available.

The internal storage, compression and encoding of the information is
left as an implementation detail and can be changed at any point as long
as the public API remains unchanged.

Offset semantics

These offsets are propagated by the compiler from the ones currently
stored in all AST nodes. The output of the public APIs (co_positions
and PyCode_Addr2Location) that deal with these attributes uses 0-indexed
offsets (just like the AST nodes), but the underlying implementation is
free to represent the actual data in whatever form is most efficient.
Unavailable information is indicated by None in the co_positions() API
and by -1 in the PyCode_Addr2Location API. Whether the information is
available depends on whether the offsets fall within the supported
range, as well as on the runtime flags of the interpreter configuration.

The AST nodes use int types to store these values. The current
implementation, however, uses uint8_t types as an implementation detail
to minimize storage impact. This decision allows offsets to go from 0 to
255, while larger offsets will be treated as missing (returning -1 from
PyCode_Addr2Location and None from co_positions()).

As specified previously, the underlying storage of the offsets should be
considered an implementation detail, as the public APIs to obtain these
values will return either C int types or Python int objects. This allows
better compression/encoding to be implemented in the future if bigger
ranges need to be supported. This PEP proposes to start with this
simpler version and defer improvements to future work.

Displaying tracebacks

When displaying tracebacks, the default exception hook will be modified
to query this information from the code objects and use it to display a
sequence of carets for every displayed line in the traceback if the
information is available. For instance:

    File "test.py", line 6, in lel
      return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                           ~~~~~~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable
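The marker line can be derived from the instruction's range plus the source text. `marker_line` below is a simplified, hypothetical helper. It is loosely in the spirit of what CPython's traceback module does, not a copy of it. It parses the highlighted segment with `ast` and, for subscripts and binary operators, draws `~` under the operands and `^` under the failing part. Any other expression gets plain carets. It assumes a single-line ASCII segment and skips only whitespace and parentheses when locating an operator.

```python
import ast


def marker_line(segment: str, start_col: int) -> str:
    """Build the ~/^ marker line for `segment`, which starts at `start_col`.

    Hypothetical sketch: assumes a single-line ASCII expression, namely the
    range reported for the failing instruction.
    """
    node = ast.parse(segment, mode="eval").body
    width = len(segment)
    if isinstance(node, ast.Subscript):
        # '~' under the object being subscripted, '^' under the [...] part.
        split = node.value.end_col_offset
        marks = "~" * split + "^" * (width - split)
    elif isinstance(node, ast.BinOp):
        # The AST does not record where the operator is, so scan the text
        # between the operands, skipping whitespace and parentheses.
        left_end, right_start = node.left.end_col_offset, node.right.col_offset
        between = segment[left_end:right_start]
        op_start = left_end + len(between) - len(between.lstrip(" )"))
        op_end = left_end + len(between.rstrip(" ("))
        marks = "~" * op_start + "^" * (op_end - op_start) + "~" * (width - op_end)
    else:
        marks = "^" * width
    return " " * start_col + marks


print("x['a']['b']['c']['d'] = 1")
print(marker_line("x['a']['b']['c']", 0))
```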

When displaying tracebacks, instruction offsets will be taken from the
traceback objects. This makes highlighting exceptions that are re-raised
work naturally without the need to store the new information in the
stack. For example, for this code:

    def foo(x):
        1 + 1/0 + 2

    def bar(x):
        try:
            1 + foo(x) + foo(x)
        except Exception as e:
            raise ValueError("oh no!") from e

    bar(bar(bar(2)))

The printed traceback would look like this:

    Traceback (most recent call last):
      File "test.py", line 6, in bar
        1 + foo(x) + foo(x)
            ^^^^^^
      File "test.py", line 2, in foo
        1 + 1/0 + 2
            ~^~
    ZeroDivisionError: division by zero

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        bar(bar(bar(2)))
                ^^^^^^
      File "test.py", line 8, in bar
        raise ValueError("oh no!") from e
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: oh no!

While this code:

    def foo(x):
        1 + 1/0 + 2

    def bar(x):
        try:
            1 + foo(x) + foo(x)
        except Exception:
            raise

    bar(bar(bar(2)))

Will be displayed as:

    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        bar(bar(bar(2)))
                ^^^^^^
      File "test.py", line 6, in bar
        1 + foo(x) + foo(x)
            ^^^^^^
      File "test.py", line 2, in foo
        1 + 1/0 + 2
            ~^~
    ZeroDivisionError: division by zero

To maintain the current behavior, only a single line will be displayed
in tracebacks. For instructions that span multiple lines (the end offset
and the start offset belong to different lines), the end line number
must be inspected to know whether the end offset applies to the same
line as the start offset.
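A renderer therefore has to compare the end line with the start line before trusting the end column. `caret_line` below is a hypothetical sketch. When an instruction continues onto a later line, it highlights up to the last non-blank character of the first line. That is one reasonable policy, assumed here rather than mandated by this PEP. When columns are unavailable, it draws nothing.

```python
from typing import Optional


def caret_line(line: str, lineno: int, end_lineno: Optional[int],
               col: Optional[int], end_col: Optional[int]) -> Optional[str]:
    """Return a caret line for `line`, or None if there is nothing to draw."""
    if col is None or end_col is None or end_lineno is None:
        return None  # no column information, e.g. the opt-out is active
    if end_lineno != lineno:
        # end_col refers to a later line, so it cannot be used here; stop at
        # the last non-blank character of the first line instead.
        end_col = len(line.rstrip())
    return " " * col + "^" * max(end_col - col, 1)


print("foo(a.name, b.name, c.name)")
print(caret_line("foo(a.name, b.name, c.name)", 1, 1, 12, 18))
```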

Opt-out mechanism

To offer an opt-out mechanism for those users that care about the
storage and memory overhead, and to allow third-party tools and other
programs that currently parse tracebacks to catch up, the following
methods will be provided to deactivate this feature:

-   A new environment variable: PYTHONNODEBUGRANGES.
-   A new -X command line option: python -Xno_debug_ranges.

If either of these methods is used, the Python compiler will not
populate code objects with the new information (None will be used
instead), and any unmarshalled code objects that contain the extra
information will have it stripped away and replaced with None.
Additionally, the traceback machinery will not show the extended
location information even if the information was present. These methods
allow users to:

-   Create smaller pyc files by using one of the two methods when said
    files are created.
-   Avoid loading the extra information from pyc files if those were
    created with the extra information in the first place.
-   Deactivate the extra information when displaying tracebacks (the
    caret characters indicating the location of the error).
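The effect of the opt-out can be observed from Python. The sketch compiles the same line in two child interpreters, one with PYTHONNODEBUGRANGES set and one without. It then inspects the column values reported by co_positions(). The assumptions are Python 3.11+ and permission to launch `sys.executable` as a subprocess. `positions_in_child` is a hypothetical helper.

```python
import json
import os
import subprocess
import sys

# Child script: compile stdin and print co_positions() as JSON (None -> null).
_POSITIONS_CHILD = (
    "import json, sys\n"
    "code = compile(sys.stdin.read(), '<src>', 'exec')\n"
    "print(json.dumps(list(code.co_positions())))\n"
)


def positions_in_child(source: str, opt_out: bool) -> list:
    """Return co_positions() of `source` as compiled by a fresh interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONNODEBUGRANGES", None)
    if opt_out:
        env["PYTHONNODEBUGRANGES"] = "1"  # the environment-variable form of the opt-out
    result = subprocess.run([sys.executable, "-c", _POSITIONS_CHILD], input=source,
                            capture_output=True, text=True, check=True, env=env)
    return json.loads(result.stdout)


src = "x = foo(a.name, b.name)\n"
default = positions_in_child(src, opt_out=False)
stripped = positions_in_child(src, opt_out=True)
print(any(col is not None for _, _, col, _ in default))  # expected: columns present
print(all(col is None for _, _, col, _ in stripped))     # expected: columns dropped
```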

Doing this has a very small performance hit as the interpreter state
needs to be fetched when code objects are created to look up the
configuration. Creating code objects is not a performance sensitive
operation so this should not be a concern.

Backwards Compatibility

The change is fully backwards compatible.

Reference Implementation

A reference implementation can be found in the implementation fork.

Rejected Ideas

Use a single caret instead of a range

It has been proposed to use a single caret instead of highlighting the
full range when reporting errors as a way to simplify the feature. We
have decided to not go this route for the following reasons:

-   Deriving the location of the caret is not straightforward using the
    current layout of the AST. This is because the AST nodes only record
    the start and end line numbers as well as the start and end column
    offsets. As the AST nodes do not preserve the original tokens (by
    design) deriving the exact location of some tokens is not possible
    without extra re-parsing. For instance, currently binary operators
    have nodes for the operands but the type of the operator is stored
    in an enumeration so its location cannot be derived from the node
    (this is just an example of how this problem manifests, and not the
    only one).

-   Deriving the ranges from AST nodes greatly simplifies the
    implementation and considerably reduces the maintenance cost and the
    possibility of errors. This is because ranges can always be obtained
    generically for any AST node, while any other custom information
    would need to be extracted differently from different types of
    nodes. Given how error-prone it was to compute locations manually
    back when this was done while generating the AST, we believe that a
    generic solution is a very important property to pursue.

-   Storing the information to highlight a single caret will be very
    limiting for tools such as coverage tools and profilers as well as
    for tools like IPython and IDEs that want to make use of this new
    feature. As this message from the author of "friendly-traceback"
    mentions, the reason is that without the full range (including end
    lines) these tools will find it very difficult to correctly
    highlight the relevant source code. For instance, for this code:

        something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)

    tools (such as coverage reporters) want to be able to highlight the
    totality of the call that is covered by the executed bytecode (let's
    say foo(a,b,c)) and not just a single character. Even if it is
    technically possible to re-parse and re-tokenize the source code to
    reconstruct the information, doing so is not reliable and would
    result in a much worse user experience.

-   Many users have reported that a single caret is much harder to read
    than a full range, and this motivated using ranges to highlight
    syntax errors, which was very well received. Additionally, it has
    been noted that users with vision problems can identify the ranges
    much more easily than a single caret character, which we believe is
    a great advantage of using ranges.
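The `ast` module can confirm two of the points above. Operator nodes carry no position attributes, whereas every expression node exposes a full range that can be sliced out generically. The snippet assumes Python 3.8+, which provides end offsets and `ast.get_source_segment`. It reuses the line from the coverage example above.

```python
import ast

SOURCE = "something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)"
tree = ast.parse(SOURCE)

# Every expression node records its start and end position, so the same
# generic call extracts the range for any node, here each function call.
calls = [ast.get_source_segment(SOURCE, node)
         for node in ast.walk(tree) if isinstance(node, ast.Call)]
print(calls)

# Binary operators are stored as bare operator objects without a location.
binop = ast.parse("a + b", mode="eval").body
print(type(binop.op).__name__, hasattr(binop.op, "col_offset"))  # Add False
```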

Have a configure flag to opt out

Having a configure flag to opt out of the overhead even when executing
Python in non-optimized mode may sound desirable, but it may cause
problems when reading pyc files that were created with a version of the
interpreter that was not compiled with the flag activated. This can lead
to crashes that would be very difficult to debug for regular users and
would make different pyc files incompatible with each other. As these
pyc files could be shipped as part of libraries or applications without
the original source, it is also not always possible to force
recompilation of said pyc files. For these reasons we have decided to
use runtime options instead (the PYTHONNODEBUGRANGES environment
variable and the -Xno_debug_ranges command line option) to opt out of
this behaviour.

Lazy loading of column information

One potential solution to reduce the memory usage of this feature is to
not load the column information from the pyc file when code is imported.
Only if an uncaught exception bubbles up or if a call to the C-API
functions is made will the column information be loaded from the pyc
file. This is similar to how we only read source lines to display them
in the traceback when an exception bubbles up. While this would indeed
lower memory usage, it also results in a far more complex implementation
requiring changes to the importing machinery to selectively ignore a
part of the code object. We consider this an interesting avenue to
explore, but ultimately we think it is out of scope for this particular
PEP. It also means that column information would not be available if
the user is not using pyc files or for code objects created dynamically
at runtime.

Implement compression

Although it would be possible to implement some form of compression over
the pyc files and the new data in code objects, we believe that this is
out of the scope of this proposal due to its larger impact (in the case
of pyc files) and the fact that we expect column offsets to not compress
well due to the lack of patterns in them (in case of the new data in
code objects).

Acknowledgments

Thanks to Carl Friedrich Bolz-Tereick for showing an initial prototype
of this idea for the PyPy interpreter and for the helpful discussion.


Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.