PEP: 697 Title: Limited C API for Extending Opaque Types Author: Petr
Viktorin <encukou@gmail.com> Discussions-To:
https://discuss.python.org/t/19743 Status: Final Type: Standards Track
Content-Type: text/x-rst Created: 23-Aug-2022 Python-Version: 3.12
Post-History: 24-May-2022, 06-Oct-2022, Resolution:
https://discuss.python.org/t/19743/30

PyType_Spec.basicsize, PyObject_GetTypeData, Py_TPFLAGS_ITEMS_AT_END,
Py_RELATIVE_OFFSET, PyObject_GetItemData

Abstract

Add Limited C API support for extending some types with opaque data by
allowing code to only deal with data specific to a particular
(sub)class.

This mechanism is required to be usable with PyHeapTypeObject.

This PEP does not propose allowing to extend non-dynamically sized
variable sized objects such as tuple or int due to their different
memory layout and perceived lack of demand for doing so. This PEP leaves
room to do so in the future via the same mechanism if ever desired.

Motivation

The motivating problem this PEP solves is attaching C-level state to
custom types --- i.e. metaclasses (subclasses of python:type).

This is often needed in “wrappers” that expose another type system (e.g.
C++, Java, Rust) as Python classes. These typically need to attach
information about the “wrapped” non-Python class to the Python type
object.

This should be possible to do in the Limited API, so that the language
wrappers or code generators can be used to create Stable ABI extensions.
(See PEP 652 for the benefits of providing a stable ABI.)

Extending type is an instance of a more general problem: extending a
class while maintaining loose coupling – that is, not depending on the
memory layout used by the superclass. (That's a lot of jargon; see
Rationale for a concrete example of extending list.)

Rationale

Extending opaque types

In the Limited API, most structs are opaque: their size and memory
layout are not exposed, so they can be changed in new versions of
CPython (or alternate implementations of the C API).

This means that the usual subclassing pattern -- making the struct used
for instances of the base type be the first element of the struct used
for instances of the derived type -- does not work. To illustrate with
code, the example from the tutorial extends PyListObject (python:list)
using the following struct:

    typedef struct {
        PyListObject list;
        int state;
    } SubListObject;

This won't compile in the Limited API, since PyListObject is opaque (to
allow changes as features and optimizations are implemented).

Instead, this PEP proposes using a struct with only the state needed in
the subclass, that is:

    typedef struct {
        int state;
    } SubListState;

    // (or just `typedef int SubListState;` in this case)

The subclass can now be completely decoupled from the memory layout (and
size) of the superclass.

This is possible today. To use such a struct:

-   when creating the class, use
    PyListObject->tp_basicsize + sizeof(SubListState) as
    PyType_Spec.basicsize;
-   when accessing the data, use PyListObject->tp_basicsize as the
    offset into the instance (PyObject*).

However, this has disadvantages:

-   The base's basicsize may not be properly aligned, causing issues on
    some architectures if not mitigated. (These issues can be
    particularly nasty if alignment changes in a new release.)
-   PyTypeObject.tp_basicsize is not exposed in the Limited API, so
    extensions that support Limited API need to use
    PyObject_GetAttrString(obj, "__basicsize__"). This is cumbersome,
    and unsafe in edge cases (the Python attribute can be overridden).
-   Variable-size objects are not handled (see Extending variable-size
    objects below).

To make this easy (and even best practice for projects that choose loose
coupling over maximum performance), this PEP proposes an API to:

1.  During class creation, specify that SubListState should be
    “appended” to PyListObject, without passing any additional details
    about list. (The interpreter itself gets all necessary info, like
    tp_basicsize, from the base).

    This will be specified by a negative PyType_Spec.basicsize:
    -sizeof(SubListState).

2.  Given an instance, and the subclass PyTypeObject*, get a pointer to
    the SubListState. A new function, PyObject_GetTypeData, will be
    added for this.

The base class is not limited to PyListObject, of course: it can be used
to extend any base class whose instance struct is opaque, unstable
across releases, or not exposed at all -- including python:type
(PyHeapTypeObject) or third-party extensions (for example, NumPy
arrays[1]).

For cases where no additional state is needed, a zero basicsize will be
allowed: in that case, the base's tp_basicsize will be inherited. (This
currently works, but lacks explicit documentation and tests.)

The tp_basicsize of the new class will be set to the computed total
size, so code that inspects classes will continue working as before.

Extending variable-size objects

Additional considerations are needed to subclass
variable-sized objects <PyVarObject> while maintaining loose coupling:
the variable-sized data can collide with subclass data (SubListState in
the example above).

Currently, CPython doesn't provide a way to prevent such collisions. So,
the proposed mechanism of extending opaque classes (negative
base->tp_itemsize) will fail by default.

We could stop there, but since the motivating type --- PyHeapTypeObject
---is variable sized, we need a safe way to allow subclassing it. A bit
of background first:

Variable-size layouts

There are two main memory layouts for variable-sized objects.

In types such as int or tuple, the variable data is stored at a fixed
offset. If subclasses need additional space, it must be added after any
variable-sized data:

    PyTupleObject:
    ┌───────────────────┬───┬───┬╌╌╌╌┐
    │ PyObject_VAR_HEAD │var. data   │
    └───────────────────┴───┴───┴╌╌╌╌┘

    tuple subclass:
    ┌───────────────────┬───┬───┬╌╌╌╌┬─────────────┐
    │ PyObject_VAR_HEAD │var. data   │subclass data│
    └───────────────────┴───┴───┴╌╌╌╌┴─────────────┘

In other types, like PyHeapTypeObject, variable-sized data always lives
at the end of the instance's memory area:

    heap type:
    ┌───────────────────┬──────────────┬───┬───┬╌╌╌╌┐
    │ PyObject_VAR_HEAD │Heap type data│var. data   │
    └───────────────────┴──────────────┴───┴───┴╌╌╌╌┘

    type subclass:
    ┌───────────────────┬──────────────┬─────────────┬───┬───┬╌╌╌╌┐
    │ PyObject_VAR_HEAD │Heap type data│subclass data│var. data   │
    └───────────────────┴──────────────┴─────────────┴───┴───┴╌╌╌╌┘

The first layout enables fast access to the items array. The second
allows subclasses to ignore the variable-sized array (assuming they use
offsets from the start of the object to access their data).

Since this PEP focuses on PyHeapTypeObject, it proposes an API to allow
subclassing for the second variant. Support for the first can be added
later as an API-compatible change (though your PEP author doubts it'd be
worth the effort).

Extending classes with the PyHeapTypeObject-like layout

This PEP proposes a type flag, Py_TPFLAGS_ITEMS_AT_END, which will
indicate the PyHeapTypeObject-like layout. This can be set in two ways:

-   the superclass can set the flag, allowing subclass authors to not
    care about the fact that itemsize is involved, or
-   the new subclass sets the flag, asserting that the author knows the
    superclass is suitable (but perhaps hasn't been updated to use the
    flag yet).

This flag will be necessary to extend a variable-sized type using
negative basicsize.

An alternative to a flag would be to require subclass authors to know
that the base uses a compatible layout (e.g. from documentation). A past
version of this PEP proposed a new PyType_Slot for it. This turned out
to be hard to explain, and goes against the idea of decoupling the
subclass from the base layout.

The new flag will be used to allow safely extending variable-sized
types: creating a type with spec->basicsize < 0 and
base->tp_itemsize > 0 will require the flag.

Additionally, this PEP proposes a helper function to get the
variable-sized data of a given instance, if it uses the new
Py_TPFLAGS_ITEMS_AT_END flag. This hides the necessary pointer
arithmetic behind an API that can potentially be adapted to other
layouts in the future (including, potentially, a VM-managed layout).

Big picture

To make it easier to verify that all cases are covered, here's a
scary-looking big-picture decision tree.

Note

The individual cases are easier to explain in isolation (see the
reference implementation for draft docs).

-   spec->basicsize > 0: No change to the status quo. (The base class
    layout is known.)
-   spec->basicsize == 0: (Inheriting the basicsize)
    -   base->tp_itemsize == 0: The item size is set to
        spec->tp_itemsize. (No change to status quo.)
    -   base->tp_itemsize > 0: (Extending a variable-size class)
        -   spec->itemsize == 0: The item size is inherited. (No change
            to status quo.)
        -   spec->itemsize > 0: The item size is set. (This is hard to
            use safely, but it's CPython's current behavior.)
-   spec->basicsize < 0: (Extending the basicsize)
    -   base->tp_itemsize == 0: (Extending a fixed-size class)
        -   spec->itemsize == 0: The item size is set to 0.
        -   spec->itemsize > 0: Fail. (We'd need to add an ob_size,
            which is only possible for trivial types -- and the trivial
            layout must be known.)
    -   base->tp_itemsize > 0: (Extending a variable-size class)
        -   spec->itemsize == 0: (Inheriting the itemsize)
            -   Py_TPFLAGS_ITEMS_AT_END used: itemsize is inherited.
            -   Py_TPFLAGS_ITEMS_AT_END not used: Fail. (Possible
                conflict.)
        -   spec->itemsize > 0: Fail. (Changing/extending the item size
            can't be done safely.)

Setting spec->itemsize < 0 is always an error. This PEP does not propose
any mechanism to extend tp->itemsize rather than just inherit it.

Relative member offsets

One more piece of the puzzle is PyMemberDef.offset. Extensions that use
a subclass-specific struct (SubListState above) will get a way to
specify “relative” offsets (offsets based from this struct) rather than
“absolute” ones (based off the PyObject struct).

One way to do it would be to automatically assume “relative” offsets
when creating a class using the new API. However, this implicit
assumption would be too surprising.

To be more explicit, this PEP proposes a new flag for “relative”
offsets. At least initially, this flag will serve only as a check
against misuse (and a hint for reviewers). It must be present if used
with the new API, and must not be used otherwise.

Specification

In the code blocks below, only function headers are part of the
specification. Other code (the size/offset calculations) are details of
the initial CPython implementation, and subject to change.

Relative basicsize

The basicsize member of PyType_Spec will be allowed to be zero or
negative. In that case, its absolute value will specify how much extra
storage space instances of the new class require, in addition to the
basicsize of the base class. That is, the basicsize of the resulting
class will be:

    type->tp_basicsize = _align(base->tp_basicsize) + _align(-spec->basicsize);

where _align rounds up to a multiple of alignof(max_align_t).

When spec->basicsize is zero, basicsize will be inherited directly
instead, i.e. set to base->tp_basicsize without aligning. (This already
works; explicit tests and documentation will be added.)

On an instance, the memory area specific to a subclass -- that is, the
“extra space” that subclass reserves in addition its base -- will be
available through a new function, PyObject_GetTypeData. In CPython, this
function will be defined as:

    void *
    PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) {
        return (char *)obj + _align(cls->tp_base->tp_basicsize);
    }

Another function will be added to retrieve the size of this memory area:

    Py_ssize_t
    PyType_GetTypeDataSize(PyTypeObject *cls) {
        return cls->tp_basicsize - _align(cls->tp_base->tp_basicsize);
    }

The result may be higher than requested by -basicsize. It is safe to use
all of it (e.g. with memset).

The new *Get* functions come with an important caveat, which will be
pointed out in documentation: They may only be used for classes created
using negative PyType_Spec.basicsize. For other classes, their behavior
is undefined. (Note that this allows the above code to assume
cls->tp_base is not NULL.)

Inheriting itemsize

When spec->itemsize is zero, tp_itemsize will be inherited from the
base. (This already works; explicit tests and documentation will be
added.)

A new type flag, Py_TPFLAGS_ITEMS_AT_END, will be added. This flag can
only be set on types with non-zero tp_itemsize. It indicates that the
variable-sized portion of an instance is stored at the end of the
instance's memory.

The default metatype (PyType_Type) will set this flag.

A new function, PyObject_GetItemData, will be added to access the memory
reserved for variable-sized content of types with the new flag. In
CPython it will be defined as:

    void *
    PyObject_GetItemData(PyObject *obj) {
        if (!PyType_HasFeature(Py_TYPE(obj), Py_TPFLAGS_ITEMS_AT_END) {
            <fail with TypeError>
        }
        return (char *)obj + Py_TYPE(obj)->tp_basicsize;
    }

This function will initially not be added to the Limited API.

Extending a class with positive base->itemsize using negative
spec->basicsize will fail unless Py_TPFLAGS_ITEMS_AT_END is set, either
on the base or in spec->flags. (See Extending variable-size objects for
a full explanation.)

Extending a class with positive spec->itemsize using negative
spec->basicsize will fail.

Relative member offsets

In types defined using negative PyType_Spec.basicsize, the offsets of
members defined via Py_tp_members must be relative to the extra subclass
data, rather than the full PyObject struct. This will be indicated by a
new flag in PyMemberDef.flags: Py_RELATIVE_OFFSET.

In the initial implementation, the new flag will be redundant. It only
serves to make the offset's changed meaning clear, and to help avoid
mistakes. It will be an error to not use Py_RELATIVE_OFFSET with
negative basicsize, and it will be an error to use it in any other
context (i.e. direct or indirect calls to PyDescr_NewMember,
PyMember_GetOne, PyMember_SetOne).

CPython will adjust the offset and clear the Py_RELATIVE_OFFSET flag
when initializing a type. This means that:

-   the created type's tp_members will not match the input definition's
    Py_tp_members slot, and
-   any code that reads tp_members will not need to handle the flag.

List of new API

The following new functions/values are proposed.

These will be added to the Limited API/Stable ABI:

-   void * PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls)
-   Py_ssize_t PyType_GetTypeDataSize(PyTypeObject *cls)
-   Py_TPFLAGS_ITEMS_AT_END flag for PyTypeObject.tp_flags
-   Py_RELATIVE_OFFSET flag for PyMemberDef.flags

These will be added to the public C API only:

-   void *PyObject_GetItemData(PyObject *obj)

Backwards Compatibility

No backwards compatibility concerns are known.

Assumptions

The implementation assumes that an instance's memory between
type->tp_base->tp_basicsize and type->tp_basicsize offsets “belongs” to
type (except variable-length types). This is not documented explicitly,
but CPython up to version 3.11 relied on it when adding __dict__ to
subclasses, so it should be safe.

Security Implications

None known.

Endorsements

The author of pybind11 originally requested solving the issue (see point
2 in this list), and has been verifying the implementation.

Florian from the HPy project said that the API looks good in general.
(See below for a possible solution to performance concerns.)

How to Teach This

The initial implementation will include reference documentation and a
What's New entry, which should be enough for the target audience --
authors of C extension libraries.

Reference Implementation

A reference implementation is in the extend-opaque branch in the
encukou/cpython GitHub repo.

Possible Future Enhancements

Alignment & Performance

The proposed implementation may waste some space if instance structs
need smaller alignment than alignof(max_align_t). Also, dealing with
alignment makes the calculation slower than it could be if we could rely
on base->tp_basicsize being properly aligned for the subtype.

In other words, the proposed implementation focuses on safety and ease
of use, and trades space and time for it. If it turns out that this is a
problem, the implementation can be adjusted without breaking the API:

-   The offset to the type-specific buffer can be stored, so
    PyObject_GetTypeData effectively becomes
    (char *)obj + cls->ht_typedataoffset, possibly speeding things up at
    the cost of an extra pointer in the class.
-   Then, a new PyType_Slot can specify the desired alignment, to reduce
    space requirements for instances.

Other layouts for variable-size types

A flag like Py_TPFLAGS_ITEMS_AT_END could be added to signal the
“tuple-like” layout described in Extending variable-size objects, and
all mechanisms this PEP proposes could be adapted to support it. Other
layouts could be added as well. However, it seems there'd be very little
practical benefit, so it's just a theoretical possibility.

Rejected Ideas

Instead of a negative spec->basicsize, a new PyType_Spec flag could've
been added. The effect would be the same to any existing code accessing
these internals without up to date knowledge of the change as the
meaning of the field value is changing in this situation.

Footnotes

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] This PEP does not make it “safe” to subclass NumPy arrays
specifically. NumPy publishes an extensive list of caveats for
subclassing its arrays from Python, and extending in C might need a
similar list.