PEP: 737 Title: C API to format a type fully qualified name Author:
Victor Stinner <vstinner@python.org> Discussions-To:
https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872
Status: Final Type: Standards Track Created: 29-Nov-2023 Python-Version:
3.13 Post-History: 29-Nov-2023 Resolution:
https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872/60

Abstract

Add new convenient C APIs to format a type fully qualified name. No
longer format type names differently depending on how types are
implemented.

Recommend using the type fully qualified name in error messages and in
__repr__() methods in new C code. Recommend not truncating type names in
new C code.

Add %T, %#T, %N and %#N formats to PyUnicode_FromFormat() to format the
fully qualified, respectively, of an object type and of a type.

Make C code safer by avoiding borrowed reference which can lead to
crashes. The new C API is compatible with the limited C API.

Rationale

Standard library

In the Python standard library, formatting a type name or the type name
of an object is a common operation to format an error message and to
implement a __repr__() method. There are different ways to format a type
name which give different outputs.

Example with the datetime.timedelta type:

-   The type short name (type.__name__) and the type qualified name
    (type.__qualname__) are 'timedelta'.
-   The type module (type.__module__) is 'datetime'.
-   The type fully qualified name is 'datetime.timedelta'.
-   The type representation (repr(type)) contains the fully qualified
    name: <class 'datetime.timedelta'>.

Python code

In Python, type.__name__ gets the type short name, whereas
f"{type.__module__}.{type.__qualname__}" formats the type "fully
qualified name". Usually, type(obj) or obj.__class__ are used to get the
type of the object obj. Sometimes, the type name is put between quotes.

Examples:

-   raise TypeError("str expected, not %s" % type(value).__name__)
-   raise TypeError("can't serialize %s" % self.__class__.__name__)
-   name = "%s.%s" % (obj.__module__, obj.__qualname__)

Qualified names were added to types (type.__qualname__) in Python 3.3 by
PEP 3155 "Qualified name for classes and functions".

C code

In C, the most common way to format a type name is to get the
PyTypeObject.tp_name member of the type. Example:

    PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
                 Py_TYPE(globals)->tp_name);

The type "fully qualified name" is used in a few places:
PyErr_Display(), type.__repr__() implementation, and sys.unraisablehook
implementation.

Using Py_TYPE(obj)->tp_name is preferred since it is more convenient
than calling PyType_GetQualName() which requires Py_DECREF(). Moreover,
PyType_GetQualName() was only added recently, in Python 3.11.

Some functions use %R (repr(type)) to format a type name, the output
contains the type fully qualified name. Example:

    PyErr_Format(PyExc_TypeError,
                 "calling %R should have returned an instance "
                 "of BaseException, not %R",
                 type, Py_TYPE(value));

Using PyTypeObject.tp_name is inconsistent with Python

The PyTypeObject.tp_name member is different depending on the type
implementation:

-   Static types and heap types in C: tp_name is the type fully
    qualified name.
-   Python class: tp_name is the type short name (type.__name__).

So using Py_TYPE(obj)->tp_name to format an object type name gives a
different output depending if a type is implemented in C or in Python.

It goes against PEP 399 "Pure Python/C Accelerator Module Compatibility
Requirements" principles which recommends code behaves the same way if
written in Python or in C.

Example:

    $ python3.12
    >>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
    >>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
    >>> my_list = list(range(3))

    >>> my_list[c_obj]  # C type
    TypeError: list indices must be integers or slices, not datetime.date

    >>> my_list[py_obj]  # Python type
    TypeError: list indices must be integers or slices, not date

The error message contains the type fully qualified name (datetime.date)
if the type is implemented in C, or the type short name (date) if the
type is implemented in Python.

Limited C API

The Py_TYPE(obj)->tp_name code cannot be used with the limited C API,
since the PyTypeObject members are excluded from the limited C API.

The type name should be read using PyType_GetName(),
PyType_GetQualName() and PyType_GetModule() functions which are less
convenient to use.

Truncating type names in C

In 1998, when the PyErr_Format() function was added, the implementation
used a fixed buffer of 500 bytes. The function had the following
comment:

    /* Caller is responsible for limiting the format */

In 2001, the function was modified to allocate a dynamic buffer on the
heap. Too late, the practice of truncating type names, like using the
%.100s format, already became a habit, and developers forgot why type
names are truncated. In Python, type names are not truncated.

Truncating type names in C but not in Python goes against PEP 399 "Pure
Python/C Accelerator Module Compatibility Requirements" principles which
recommends code behaves the same way if written in Python or in C.

See the issue: Replace %.100s by %s in PyErr_Format(): the arbitrary
limit of 500 bytes is outdated (2011).

Specification

-   Add PyType_GetFullyQualifiedName() function.
-   Add PyType_GetModuleName() function.
-   Add formats to PyUnicode_FromFormat().
-   Recommend using the type fully qualified name in error messages and
    in __repr__() methods in new C code.
-   Recommend not truncating type names in new C code.

Add PyType_GetFullyQualifiedName() function

Add the PyType_GetFullyQualifiedName() function to get the fully
qualified name of a type: similar to
f"{type.__module__}.{type.__qualname__}", or type.__qualname__ if
type.__module__ is not a string or is equal to "builtins" or is equal to
"__main__".

API:

    PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)

On success, return a new reference to the string. On error, raise an
exception and return NULL.

Add PyType_GetModuleName() function

Add the PyType_GetModuleName() function to get the module name of a type
(type.__module__ string). API:

    PyObject* PyType_GetModuleName(PyTypeObject *type)

On success, return a new reference to the string. On error, raise an
exception and return NULL.

Add formats to PyUnicode_FromFormat()

Add the following formats to PyUnicode_FromFormat():

-   %N formats the fully qualified name of a type, similar to
    PyType_GetFullyQualifiedName(type); N stands for type Name.
-   %T formats the type fully qualified name of an object's type,
    similar to PyType_GetFullyQualifiedName(Py_TYPE(obj)); T stands for
    object Type.
-   %#N and %#T: the alternative form uses the colon separator (:),
    instead of the dot separator (.), between the module name and the
    qualified name.

For example, the existing code using tp_name:

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %.200s",
                 Py_TYPE(result)->tp_name);

can be replaced with the %T format:

    PyErr_Format(PyExc_TypeError,
                 "__format__ must return a str, not %T", result);

Advantages of the updated code:

-   Safer C code: avoid Py_TYPE() which returns a borrowed reference.
-   The PyTypeObject.tp_name member is no longer read explicitly: the
    code becomes compatible with the limited C API.
-   The formatted type name no longer depends on the type
    implementation.
-   The type name is no longer truncated.

Note: The %T format is used by time.strftime(), but not by printf().

Formats Summary

  C object   C type   Format
  ---------- -------- ---------------------------------------------
  %T         %N       Type fully qualified name.
  %#T        %#N      Type fully qualified name, colon separator.

Recommend using the type fully qualified name

The type fully qualified name is recommended in error messages and in
__repr__() methods in new C code.

In non-trivial applications, it is likely to have two types with the
same short name defined in two different modules, especially with
generic names. Using the fully qualified name helps identifying the type
in an unambiguous way.

Recommend not truncating type names

Type names should not be truncated in new C code. For example, the
%.100s format should be avoided: use the %s format instead (or %T format
in C).

Implementation

-   Pull request: Add type.__fully_qualified_name__ attribute.
-   Pull request: Add %T format to PyUnicode_FromFormat().

Backwards Compatibility

Changes proposed in this PEP are backward compatible.

Adding new C APIs has no effect on the backward compatibility. Existing
C APIs are left unchanged. No Python API is changed.

Replacing the type short name with the type fully qualified name is only
recommended in new C code. No longer truncating type names is only
recommended in new C code. Existing code should be left unchanged and so
remains backward compatible. There is no recommendation for Python code.

Rejected Ideas

Add type.__fully_qualified_name__ attribute

Add type.__fully_qualified_name__ read-only attribute, the fully
qualified name of a type: similar to
f"{type.__module__}.{type.__qualname__}", or type.__qualname__ if
type.__module__ is not a string or is equal to "builtins" or is equal to
"__main__".

The type.__repr__() is left unchanged, it only omits the module if the
module is equal to "builtins".

This change was rejected by the Steering Council:

  We can see the usefulness of the C API changes proposed by the PEP and
  would likely accept those changes as is.

  We see less justification for the Python level changes. We especially
  question the need for __fully_qualified_name__.

Thomas Wouters added:

  If there really is a desire for formatting types the exact same way
  the C API does it, a utility function would make more sense to me,
  personally, than type.__format__, but I think the SC could be
  persuaded given some concrete use-cases.

Add type.__format__() method

Add type.__format__() method with the following formats:

-   N formats the type fully qualified name
    (type.__fully_qualified_name__); N stands for Name.
-   #N (alternative form) formats the type fully qualified name using
    the colon (:) separator, instead of the dot separator (.), between
    the module name and the qualified name.

Examples using f-string:

    >>> import datetime
    >>> f"{datetime.timedelta:N}"  # fully qualified name
    'datetime.timedelta'
    >>> f"{datetime.timedelta:#N}" # fully qualified name, colon separator
    'datetime:timedelta'

The colon (:) separator used by the #N format eliminates guesswork when
you want to import the name, see pkgutil.resolve_name(),
python -m inspect command line interface, and setuptools entry points.

This change was rejected by the Steering Council.

Change str(type)

The type.__str__() method can be modified to format a type name
differently. For example, it can return the type fully qualified name.

The problem is that it's a backward incompatible change. For example,
enum, functools, optparse, pdb and xmlrpc.server modules of the standard
library have to be updated. test_dataclasses, test_descrtut and
test_cmd_line_script tests have to be updated as well.

See the pull request: type(str) returns the fully qualified name.

Add !t formatter to get an object type

Use f"{obj!t:T}" to format type(obj).__fully_qualified_name__, similar
to f"{type(obj):T}".

When the !t formatter was proposed in 2018, Eric Smith was strongly
opposed to this; Eric is the author of the f-string PEP 498 "Literal
String Interpolation".

Add formats to str % args

It was proposed to add formats to format a type name in str % arg. For
example, add the %T format to format a type fully qualified name.

Nowadays, f-strings are preferred for new code.

Other ways to format type names in C

The printf() function supports multiple size modifiers: hh (char), h
(short), l (long), ll (long long), z (size_t), t (ptrdiff_t) and j
(intmax_t). The PyUnicode_FromFormat() function supports most of them.

Proposed formats using h and hh length modifiers:

-   %hhT formats type.__name__.
-   %hT formats type.__qualname__.
-   %T formats type.__fully_qualified_name__.

Length modifiers are used to specify the C type of the argument, not to
change how an argument is formatted. The alternate form (#) changes how
an argument is formatted. Here the argument C type is always PyObject*.

Other proposed formats:

-   %Q
-   %t.
-   %lT formats type.__fully_qualified_name__.
-   %Tn formats type.__name__.
-   %Tq formats type.__qualname__.
-   %Tf formats type.__fully_qualified_name__.

Having more options to format type names can lead to inconsistencies
between different modules and make the API more error prone.

About the %t format, printf() now uses t as a length modifier for
ptrdiff_t argument.

The following APIs to be used to format a type:

  C API                    Python API          Format
  ------------------------ ------------------- ----------------------
  PyType_GetName()         type.__name__       Type short name.
  PyType_GetQualName()     type.__qualname__   Type qualified name.
  PyType_GetModuleName()   type.__module__     Type module name.

Use %T format with Py_TYPE(): pass a type

It was proposed to pass a type to the %T format, like:

    PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));

The Py_TYPE() functions returns a borrowed reference. Just to format an
error, using a borrowed reference to a type looks safe. In practice, it
can lead to crash. Example:

    import gc
    import my_cext

    class ClassA:
        pass

    def create_object():
         class ClassB:
              def __repr__(self):
                    self.__class__ = ClassA
                    gc.collect()
                    return "ClassB repr"
         return ClassB()

    obj = create_object()
    my_cext.func(obj)

where my_cext.func() is a C function which calls:

    PyErr_Format(PyExc_ValueError,
                 "Unexpected value %R of type %T",
                 obj, Py_TYPE(obj));

PyErr_Format() is called with a borrowed reference to ClassB. When
repr(obj) is called by the %R format, the last reference to ClassB is
removed and the class is deallocated. When the %T format is proceed,
Py_TYPE(obj) is already a dangling pointer and Python does crash.

Other proposed APIs to get a type fully qualified name

-   Add type.__fullyqualname__ attribute: name without underscore
    between words. Several dunders, including some of the most recently
    added ones, include an underscore in the word: __class_getitem__,
    __release_buffer__, __type_params__, __init_subclass__ and
    __text_signature__.
-   Add type.__fqn__ attribute: FQN name stands for Fully Qualified
    Name.
-   Add type.fully_qualified_name() method. Methods added to type are
    inherited by all types and so can affect existing code.
-   Add a function to the inspect module. Need to import the inspect
    module to use it.

Include the __main__ module in the type fully qualified name

Format type.__fully_qualified_name__ as
f"{type.__module__}.{type.__qualname__}", or type.__qualname__ if
type.__module__ is not a string or is equal to "builtins". Do not treat
the __main__ module differently: include it in the name.

Existing code such as type.__repr__(), collections.abc and unittest
modules format a type name with f'{obj.__module__}.{obj.__qualname__}'
and only omit the module part if the module is equal to builtins.

Only the traceback and pdb modules also omit the module if it's equal to
"builtins" or "__main__".

The type.__fully_qualified_name__ attribute omits the __main__ module to
produce shorter names for a common case: types defined in a script run
with python script.py. For debugging, the repr() function can be used on
a type, it includes the __main__ module in the type name. Or use
f"{type.__module__}.{type.__qualname__}" format to always include the
module name, even for the "builtins" module.

Example of script:

    class MyType:
        pass

    print(f"name: {MyType.__fully_qualified_name__}")
    print(f"repr: {repr(MyType)}")

Output:

    name: MyType
    repr: <class '__main__.MyType'>

Discussions

-   Discourse: PEP 737 – Unify type name formatting (2023).
-   Discourse: Enhance type name formatting when raising an exception:
    add %T format in C, and add type.__fullyqualname__ (2023).
-   Issue: PyUnicode_FromFormat(): Add %T format to format the type name
    of an object (2023).
-   Issue: C API: Investigate how the PyTypeObject members can be
    removed from the public C API (2023).
-   python-dev thread: bpo-34595: How to format a type name? (2018).
-   Issue: PyUnicode_FromFormat(): add %T format for an object type name
    (2018).
-   Issue: Replace %.100s by %s in PyErr_Format(): the arbitrary limit
    of 500 bytes is outdated (2011).

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.