PEP: 733 Title: An Evaluation of Python's Public C API Author: Erlend
Egeberg Aasland <erlend@python.org>, Domenico Andreoli
<domenico.andreoli@linux.com>, Stefan Behnel <stefan_ml@behnel.de>, Carl
Friedrich Bolz-Tereick <cfbolz@gmx.de>, Simon Cross
<hodgestar@gmail.com>, Steve Dower <steve.dower@python.org>, Tim
Felgentreff <tim.felgentreff@oracle.com>, David Hewitt
<1939362+davidhewitt@users.noreply.github.com>, Shantanu Jain
<hauntsaninja at gmail.com>, Wenzel Jakob <wenzel.jakob@epfl.ch>, Irit
Katriel <irit@python.org>, Marc-Andre Lemburg <mal@lemburg.com>, Donghee
Na <donghee.na@python.org>, Karl Nelson <nelson85@llnl.gov>, Ronald
Oussoren <ronaldoussoren@mac.com>, Antoine Pitrou <solipsis@pitrou.net>,
Neil Schemenauer <nas@arctrix.com>, Mark Shannon <mark@hotpy.org>,
Stepan Sindelar <stepan.sindelar@oracle.com>, Gregory P. Smith
<greg@krypto.org>, Eric Snow <ericsnowcurrently@gmail.com>, Victor
Stinner <vstinner@python.org>, Guido van Rossum <guido@python.org>, Petr
Viktorin <encukou@gmail.com>, Carol Willing <willingc@gmail.com>,
William Woodruff <william@yossarian.net>, David Woods
<dw-git@d-woods.co.uk>, Jelle Zijlstra <jelle.zijlstra@gmail.com>,
Status: Draft Type: Informational Created: 16-Oct-2023 Post-History:
01-Nov-2023

Abstract

This informational PEP describes our shared view of the public C API.
The document defines:

-   purposes of the C API
-   stakeholders and their particular use cases and requirements
-   strengths of the C API
-   problems of the C API categorized into nine areas of weakness

This document does not propose solutions to any of the identified
problems. By creating a shared list of C API issues, this document will
help to guide continuing discussion about change proposals and to
identify evaluation criteria.

Introduction

Python's C API was not designed for the different purposes it currently
fulfills. It evolved from what was initially the internal API between
the C code of the interpreter and the Python language and libraries. In
its first incarnation, it was exposed to make it possible to embed
Python into C/C++ applications and to write extension modules in C/C++.
These capabilities were instrumental to the growth of Python's
ecosystem. Over the decades, the C API grew to provide different tiers
of stability, conventions changed, and new usage patterns have emerged,
such as bindings to languages other than C/C++. In the next few years,
new developments are expected to further test the C API, such as the
removal of the GIL and the development of a JIT compiler. However, this
growth was not supported by clearly documented guidelines, resulting in
inconsistent approaches to API design in different subsystems of
CPython. In addition, CPython is no longer the only implementation of
Python, and some of the design decisions made when it was, are difficult
for alternative implementations to work with [Issue 64]. In the
meantime, lessons were learned and mistakes in both the design and the
implementation of the C API were identified.

Evolving the C API is hard due to the combination of backwards
compatibility constraints and its inherent complexity, both technical
and social. Different types of users bring different, sometimes
conflicting, requirements. The tradeoff between stability and progress
is an ongoing, highly contentious topic of discussion when suggestions
are made for incremental improvements. Several proposals have been put
forward for improvement, redesign or replacement of the C API, each
representing a deep analysis of the problems. At the 2023 Language
Summit, three back-to-back sessions were devoted to different aspects of
the C API. There is general agreement that a new design can remedy the
problems that the C API has accumulated over the last 30 years, while at
the same time updating it for use cases that it was not originally
designed for.

However, there was also a sense at the Language Summit that we are
trying to discuss solutions without a clear common understanding of the
problems that we are trying to solve. We decided that we need to agree
on the current problems with the C API, before we are able to evaluate
any of the proposed solutions. We therefore created the capi-workgroup
repository on GitHub in order to collect everyone's ideas on that
question.

Over 60 different issues were created on that repository, each
describing a problem with the C API. We categorized them and identified
a number of recurring themes. The sections below mostly correspond to
these themes, and each contains a combined description of the issues
raised in that category, along with links to the individual issues. In
addition, we included a section that aims to identify the different
stakeholders of the C API, and the particular requirements that each of
them has.

C API Stakeholders

As mentioned in the introduction, the C API was originally created as
the internal interface between CPython's interpreter and the Python
layer. It was later exposed as a way for third-party developers to
extend and embed Python programs. Over the years, new types of
stakeholders emerged, with different requirements and areas of focus.
This section describes this complex state of affairs in terms of the
actions that different stakeholders need to perform through the C API.

Common Actions for All Stakeholders

There are actions which are generic, and required by all types of API
users:

-   Define functions and call them
-   Define new types
-   Create instances of builtin and user-defined types
-   Perform operations on object instances
-   Introspect objects, including types, instances, and functions
-   Raise and handle exceptions
-   Import modules
-   Access to Python's OS interface

The following sections look at the unique requirements of various
stakeholders.

Extension Writers

Extension writers are the traditional users of the C API. Their
requirements are the common actions listed above. They also commonly
need to:

-   Create new modules
-   Efficiently interface between modules at the C level

Authors of Embedded Python Applications

Applications with an embedded Python interpreter. Examples are Blender
and OBS.

They need to be able to:

-   Configure the interpreter (import paths, inittab, sys.argv, memory
    allocator, etc.).
-   Interact with the execution model and program lifetime, including
    clean interpreter shutdown and restart.
-   Represent complex data models in a way Python can use without having
    to create deep copies.
-   Provide and import frozen modules.
-   Run and manage multiple independent interpreters (in particular,
    when embedded in a library that wants to avoid global effects).

Python Implementations

Python implementations such as CPython, PyPy, GraalPy, IronPython,
RustPython, MicroPython, and Jython), may take very different approaches
for the implementation of different subsystems. They need:

-   The API to be abstract and hide implementation details.
-   A specification of the API, ideally with a test suite that ensures
    compatibility.
-   It would be nice to have an ABI that can be shared across Python
    implementations.

Alternative APIs and Binding Generators

There are several projects that implement alternatives to the C API,
which offer extension users advantanges over programming directly with
the C API. These APIs are implemented with the C API, and in some cases
by using CPython internals.

There are also libraries that create bindings between Python and other
object models, paradigms or languages.

There is overlap between these categories: binding generators usually
provide alternative APIs, and vice versa.

Examples are Cython, cffi, pybind11 and nanobind for C++, PyO3 for Rust,
Shiboken used by PySide for Qt, PyGObject for GTK, Pygolo for Go, JPype
for Java, PyJNIus for Android, PyObjC for Objective-C, SWIG for C/C++,
Python.NET for .NET (C#), HPy, Mypyc, Pythran and pythoncapi-compat.
CPython's DSL for parsing function arguments, the Argument Clinic, can
also be seen as belonging to this category of stakeholders.

Alternative APIs need minimal building blocks for accessing CPython
efficiently. They don't necessarily need an ergonomic API, because they
typically generate code that is not intended to be read by humans. But
they do need it to be comprehensive enough so that they can avoid
accessing internals, without sacrificing performance.

Binding generators often need to:

-   Create custom objects (e.g. function/module objects and traceback
    entries) that match the behavior of equivalent Python code as
    closely as possible.
-   Dynamically create objects which are static in traditional C
    extensions (e.g. classes/modules), and need CPython to manage their
    state and lifetime.
-   Dynamically adapt foreign objects (strings, GC'd containers), with
    low overhead.
-   Adapt external mechanisms, execution models and guarantees to the
    Python way (stackful coroutines, continuations,
    one-writer-or-multiple-readers semantics, virtual multiple
    inheritance, 1-based indexing, super-long inheritance chains,
    goroutines, channels, etc.).

These tools might also benefit from a choice between a more stable and a
faster (possibly lower-level) API. Their users could then decide whether
they can afford to regenerate the code often or trade some performance
for more stability and less maintenance work.

Strengths of the C API

While the bulk of this document is devoted to problems with the C API
that we would like to see fixed in any new design, it is also important
to point out the strengths of the C API, and to make sure that they are
preserved.

As mentioned in the introduction, the C API enabled the development and
growth of the Python ecosystem over the last three decades, while
evolving to support use cases that it was not originally designed for.
This track record in itself is indication of how effective and valuable
it has been.

A number of specific strengths were mentioned in the capi-workgroup
discussions. Heap types were identified as much safer and easier to use
than static types [Issue 4].

API functions that take a C string literal for lookups based on a Python
string are very convenient [Issue 30].

The limited API demonstrates that an API which hides implementation
details makes it easier to evolve Python [Issue 30].

C API problems

The remainder of this document summarizes and categorizes the problems
that were reported on the capi-workgroup repository. The issues are
grouped into several categories.

API Evolution and Maintenance

The difficulty of making changes in the C API is central to this report.
It is implicit in many of the issues we discuss here, particularly when
we need to decide whether an incremental bugfix can resolve the issue,
or whether it can only be addressed as part of an API redesign [Issue
44]. The benefit of each incremental change is often viewed as too small
to justify the disruption. Over time, this implies that every mistake we
make in an API's design or implementation remains with us indefinitely.

We can take two views on this issue. One is that this is a problem and
the solution needs to be baked into any new C API we design, in the form
of a process for incremental API evolution, which includes deprecation
and removal of API elements. The other possible approach is that this is
not a problem to be solved, but rather a feature of any API. In this
view, API evolution should not be incremental, but rather through large
redesigns, each of which learns from the mistakes of the past and is not
shackled by backwards compatibility requirements (in the meantime, new
API elements may be added, but nothing can ever be removed). A
compromise approach is somewhere between these two extremes, fixing
issues which are easy or important enough to tackle incrementally, and
leaving others alone.

The problem we have in CPython is that we don't have an agreed, official
approach to API evolution. Different members of the core team are
pulling in different directions and this is an ongoing source of
disagreements. Any new C API needs to come with a clear decision about
the model that its maintenance will follow, as well as the technical and
organizational processes by which this will work.

If the model does include provisions for incremental evolution of the
API, it will include processes for managing the impact of the change on
users [Issue 60], perhaps through introducing an external backwards
compatibility module [Issue 62], or a new API tier of "blessed"
functions [Issue 55].

API Specification and Abstraction

The C API does not have a formal specification, it is currently defined
as whatever the reference implementation (CPython) contains in a
particular version. The documentation acts as an incomplete description,
which is not sufficient for verifying the correctness of either the full
API, the limited API, or the stable ABI. As a result, the C API may
change significantly between releases without needing a more visible
specification update, and this leads to a number of problems.

Bindings for languages other than C/C++ must parse C code [Issue 7].
Some C language features are hard to handle in this way, because they
produce compiler-dependent output (such as enums) or require a C
preprocessor/compiler rather than just a parser (such as macros) [Issue
35].

Furthermore, C header files tend to expose more than what is intended to
be part of the public API [Issue 34]. In particular, implementation
details such as the precise memory layouts of internal data structures
can be exposed [Issue 22 and PEP 620]. This can make API evolution very
difficult, in particular when it occurs in the stable ABI as in the case
of ob_refcnt and ob_type, which are accessed via the reference counting
macros [Issue 45].

We identified a deeper issue in relation to the way that reference
counting is exposed. The way that C extensions are required to manage
references with calls to Py_INCREF and Py_DECREF is specific to
CPython's memory model, and is hard for alternative Python
implementations to emulate. [Issue 12].

Another set of problems arises from the fact that a PyObject* is exposed
in the C API as an actual pointer rather than a handle. The address of
an object serves as its ID and is used for comparison, and this
complicates matters for alternative Python implementations that move
objects during GC [Issue 37].

A separate issue is that object references are opaque to the runtime,
discoverable only through calls to tp_traverse/tp_clear, which have
their own purposes. If there was a way for the runtime to know the
structure of the object graph, and keep up with changes in it, this
would make it possible for alternative implementations to implement
different memory management schemes [Issue 33].

Object Reference Management

There does not exist a consistent naming convention for functions which
makes their reference semantics obvious, and this leads to error prone C
API functions, where they do not follow the typical behaviour. When a C
API function returns a PyObject*, the caller typically gains ownership
of a reference to the object. However, there are exceptions where a
function returns a "borrowed" reference, which the caller can access but
does not own a reference to. Similarly, functions typically do not
change the ownership of references to their arguments, but there are
exceptions where a function "steals" a reference, i.e., the ownership of
the reference is permanently transferred from the caller to the callee
by the call [Issue 8 and Issue 52]. The terminology used to describe
these situations in the documentation can also be improved [Issue 11].

A more radical change is necessary in the case of functions that return
"borrowed" references (such as PyList_GetItem) [Issue 5 and Issue 21] or
pointers to parts of the internal structure of an object (such as
PyBytes_AsString) [Issue 57]. In both cases, the reference/pointer is
valid for as long as the owning object holds the reference, but this
time is hard to reason about. Such functions should not exist in the API
without a mechanism that can make them safe.

For containers, the API is currently missing bulk operations on the
references of contained objects. This is particularly important for a
stable ABI where INCREF and DECREF cannot be macros, making bulk
operations expensive when implemented as a sequence of function calls
[Issue 15].

Type Definition and Object Creation

The C API has functions that make it possible to create incomplete or
inconsistent Python objects, such as PyTuple_New and PyUnicode_New. This
causes problems when the object is tracked by GC or its
tp_traverse/tp_clear functions are called. A related issue is with
functions such as PyTuple_SetItem which is used to modify a partially
initialized tuple (tuples are immutable once fully initialized) [Issue
56].

We identified a few issues with type definition APIs. For legacy
reasons, there is often a significant amount of code duplication between
tp_new and tp_vectorcall [Issue 24]. The type slot function should be
called indirectly, so that their signatures can change to include
context information [Issue 13]. Several aspects of the type definition
and creation process are not well defined, such as which stage of the
process is responsible for initializing and clearing different fields of
the type object [Issue 49].

Error Handling

Error handling in the C API is based on the error indicator which is
stored on the thread state (in global scope). The design intention was
that each API function returns a value indicating whether an error has
occurred (by convention, -1 or NULL). When the program knows that an
error occurred, it can fetch the exception object which is stored in the
error indicator. We identified a number of problems which are related to
error handling, pointing at APIs which are too easy to use incorrectly.

There are functions that do not report all errors that occur while they
execute. For example, PyDict_GetItem clears any errors that occur when
it calls the key's hash function, or while performing a lookup in the
dictionary [Issue 51].

Python code never executes with an in-flight exception (by definition),
and typically code using native functions should also be interrupted by
an error being raised. This is not checked in most C API functions, and
there are places in the interpreter where error handling code calls a C
API function while an exception is set. For example, see the call to
PyUnicode_FromString in the error handler of _PyErr_WriteUnraisableMsg
[Issue 2].

There are functions that do not return a value, so a caller is forced to
query the error indicator in order to identify whether an error has
occurred. An example is PyBuffer_Release [Issue 20]. There are other
functions which do have a return value, but this return value does not
unambiguously indicate whether an error has occurred. For example,
PyLong_AsLong returns -1 in case of error, or when the value of the
argument is indeed -1 [Issue 1]. In both cases, the API is error prone
because it is possible that the error indicator was already set before
the function was called, and the error is incorrectly attributed. The
fact that the error was not detected before the call is a bug in the
calling code, but the behaviour of the program in this case doesn't make
it easy to identify and debug the problem.

There are functions that take a PyObject* argument, with special meaning
when it is NULL. For example, if PyObject_SetAttr receives NULL as the
value to set, this means that the attribute should be cleared. This is
error prone because it could be that NULL indicates an error in the
construction of the value, and the program failed to check for this
error. The program will misinterpret the NULL to mean something
different than error [Issue 47].

API Tiers and Stability Guarantees

The different API tiers provide different tradeoffs of stability vs API
evolution, and sometimes performance.

The stable ABI was identified as an area that needs to be looked into.
At the moment it is incomplete and not widely adopted. At the same time,
its existence is making it hard to make changes to some implementation
details, because it exposes struct fields such as ob_refcnt, ob_type and
ob_size. There was some discussion about whether the stable ABI is worth
keeping. Arguments on both sides can be found in [Issue 4] and [Issue
9].

Alternatively, it was suggested that in order to be able to evolve the
stable ABI, we need a mechanism to support multiple versions of it in
the same Python binary. It was pointed out that versioning individual
functions within a single ABI version is not enough because it may be
necessary to evolve, together, a group of functions that interoperate
with each other [Issue 39].

The limited API was introduced in 3.2 as a blessed subset of the C API
which is recommended for users who would like to restrict themselves to
high quality APIs which are not likely to change often. The
Py_LIMITED_API flag allows users to restrict their program to older
versions of the limited API, but we now need the opposite option, to
exclude older versions. This would make it possible to evolve the
limited API by replacing flawed elements in it [Issue 54]. More
generally, in a redesign we should revisit the way that API tiers are
specified and consider designing a method that will unify the way we
currently select between the different tiers [Issue 59].

API elements whose names begin with an underscore are considered
private, essentially an API tier with no stability guarantees. However,
this was only clarified recently, in PEP 689. It is not clear what the
change policy should be with respect to such API elements that predate
PEP 689 [Issue 58].

There are API functions which have an unsafe (but fast) version as well
as a safe version which performs error checking (for example,
PyTuple_GET_ITEM vs PyTuple_GetItem). It may help to be able to group
them into their own tiers - the "unsafe API" tier and the "safe API"
tier [Issue 61].

Use of the C Language

A number of issues were raised with respect to the way that CPython uses
the C language. First there is the issue of which C dialect we use, and
how we test our compatibility with it, as well as API header
compatibility with C++ dialects [Issue 42].

Usage of const in the API is currently sparse, but it is not clear
whether this is something that we should consider changing [Issue 38].

We currently use the C types long and int, where fixed-width integers
such as int32_t and int64_t may now be better choices [Issue 27].

We are using C language features which are hard for other languages to
interact with, such as macros, variadic arguments, enums, bitfields, and
non-function symbols [Issue 35].

There are API functions that take a PyObject* arg which must be of a
more specific type (such as PyTuple_Size, which fails if its arg is not
a PyTupleObject*). It is an open question whether this is a good pattern
to have, or whether the API should expect the more specific type [Issue
31].

There are functions in the API that take concrete types, such as
PyDict_GetItemString which performs a dictionary lookup for a key
specified as a C string rather than PyObject*. At the same time, for
PyDict_ContainsString it is not considered appropriate to add a concrete
type alternative. The principle around this should be documented in the
guidelines [Issue 23].

Implementation Flaws

Below is a list of localized implementation flaws. Most of these can
probably be fixed incrementally, if we choose to do so. They should, in
any case, be avoided in any new API design.

There are functions that don't follow the convention of returning 0 for
success and -1 for failure. For example, PyArg_ParseTuple returns 0 for
success and non-zero for failure [Issue 25].

The macros Py_CLEAR and Py_SETREF access their arg more than once, so if
the arg is an expression with side effects, they are duplicated [Issue
3].

The meaning of Py_SIZE depends on the type and is not always reliable
[Issue 10].

Some API function do not have the same behaviour as their Python
equivalents. The behaviour of PyIter_Next is different from tp_iternext.
[Issue 29]. The behaviour of PySet_Contains is different from
set.__contains__ [Issue 6].

The fact that PyArg_ParseTupleAndKeywords takes a non-const char* array
as argument makes it more difficult to use [Issue 28].

Python.h does not expose the whole API. Some headers (like marshal.h)
are not included from Python.h. [Issue 43].

Naming

PyLong and PyUnicode use names which no longer match the Python types
they represent (int/str). This could be fixed in a new API [Issue 14].

There are identifiers in the API which are lacking a Py/_Py prefix
[Issue 46].

Missing Functionality

This section consists of a list of feature requests, i.e., functionality
that was identified as missing in the current C API.

Debug Mode

A debug mode that can be activated without recompilation and which
activates various checks that can help detect various types of errors
[Issue 36].

Introspection

There aren't currently reliable introspection capabilities for objects
defined in C in the same way as there are for Python objects [Issue 32].

Efficient type checking for heap types [Issue 17].

Improved Interaction with Other Languages

Interfacing with other GC based languages, and integrating their GC with
Python's GC [Issue 19].

Inject foreign stack frames to the traceback [Issue 18].

Concrete strings that can be used in other languages [Issue 16].

References

1.  Python/C API Reference Manual
2.  2023 Language Summit Blog Post: Three Talks on the C API
3.  capi-workgroup on GitHub
4.  Irit's Core Sprint 2023 slides about C API workgroup
5.  Petr's Core Sprint 2023 slides
6.  HPy team's Core Sprint 2023 slides for Things to Learn from HPy
7.  Victor's slides of Core Sprint 2023 Python C API talk
8.  The Python's stability promise — Cristián Maureira-Fredes, PySide
    maintainer
9.  Report on the issues PySide had 5 years ago when switching to the
    stable ABI

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.