PEP: 688 Title: Making the buffer protocol accessible in Python Author:
Jelle Zijlstra <jelle.zijlstra@gmail.com> Discussions-To:
https://discuss.python.org/t/19756 Status: Final Type: Standards Track
Topic: Typing Created: 23-Apr-2022 Python-Version: 3.12 Post-History:
23-Apr-2022, 25-Apr-2022, 06-Oct-2022, 26-Oct-2022 Resolution:
07-Mar-2023

python:python-buffer-protocol

Abstract

This PEP proposes a Python-level API for the buffer protocol, which is
currently accessible only to C code. This allows type checkers to
evaluate whether objects implement the protocol.

Motivation

The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object—the buffer protocol introduced in PEP
3118. Functions that accept binary data are usually written to handle
any object implementing the buffer protocol. For example, at the time of
writing, there are around 130 functions in CPython using the Argument
Clinic Py_buffer type, which accepts the buffer protocol.

Currently, there is no way for Python code to inspect whether an object
supports the buffer protocol. Moreover, the static type system does not
provide a type annotation to represent the protocol. This is a common
problem when writing type annotations for code that accepts generic
buffers.

Similarly, it is impossible for a class written in Python to support the
buffer protocol. A buffer class in Python would give users the ability
to easily wrap a C buffer object, or to test the behavior of an API that
consumes the buffer protocol. Granted, this is not a particularly common
need. However, there has been a CPython feature request for supporting
buffer classes written in Python that has been open since 2012.

Rationale

Current options

There are two known workarounds for annotating buffer types in the type
system, but neither is adequate.

First, the current workaround for buffer types in typeshed is a type
alias that lists well-known buffer types in the standard library, such
as bytes, bytearray, memoryview, and array.array. This approach works
for the standard library, but it does not extend to third-party buffer
types.

Second, the documentation for typing.ByteString currently states:

  This type represents the types bytes, bytearray, and memoryview of
  byte sequences.

  As a shorthand for this type, bytes can be used to annotate arguments
  of any of the types mentioned above.

Although this sentence has been in the documentation since 2015, the use
of bytes to include these other types is not specified in any of the
typing PEPs. Furthermore, this mechanism has a number of problems. It
does not include all possible buffer types, and it makes the bytes type
ambiguous in type annotations. After all, there are many operations that
are valid on bytes objects, but not on memoryview objects, and it is
perfectly possible for a function to accept bytes but not memoryview
objects. A mypy user reports that this shortcut has caused significant
problems for the psycopg project.

Kinds of buffers

The C buffer protocol supports many options, affecting strides,
contiguity, and support for writing to the buffer. Some of these options
would be useful in the type system. For example, typeshed currently
provides separate type aliases for writable and read-only buffers.

However, in the C buffer protocol, most of these options cannot be
queried directly on the type object. The only way to figure out whether
an object supports a particular flag is to actually ask for the buffer.
For some types, such as memoryview, the supported flags depend on the
instance. As a result, it would be difficult to represent support for
these flags in the type system.

Specification

Python-level buffer protocol

We propose to add two Python-level special methods, __buffer__ and
__release_buffer__. Python classes that implement these methods are
usable as buffers from C code. Conversely, classes implemented in C that
support the buffer protocol acquire synthesized methods accessible from
Python code.

The __buffer__ method is called to create a buffer from a Python object,
for example by the memoryview() constructor. It corresponds to the
bf_getbuffer C slot. The Python signature for this method is
def __buffer__(self, flags: int, /) -> memoryview: .... The method must
return a memoryview object. If the bf_getbuffer slot is invoked on a
Python class with a __buffer__ method, the interpreter extracts the
underlying Py_buffer from the memoryview returned by the method and
returns it to the C caller. Similarly, if Python code calls the
__buffer__ method on an instance of a C class that implements
bf_getbuffer, the returned buffer is wrapped in a memoryview for
consumption by Python code.

The __release_buffer__ method should be called when a caller no longer
needs the buffer returned by __buffer__. It corresponds to the
bf_releasebuffer C slot. This is an optional part of the buffer
protocol. The Python signature for this method is
def __release_buffer__(self, buffer: memoryview, /) -> None: .... The
buffer to be released is wrapped in a memoryview. When this method is
invoked through CPython's buffer API (for example, through calling
memoryview.release on a memoryview returned by __buffer__), the passed
memoryview is the same object as was returned by __buffer__. It is also
possible to call __release_buffer__ on a C class that implements
bf_releasebuffer.

If __release_buffer__ exists on an object, Python code that calls
__buffer__ directly on the object must call __release_buffer__ on the
same object when it is done with the buffer. Otherwise, resources used
by the object may not be reclaimed. Similarly, it is a programming error
to call __release_buffer__ without a previous call to __buffer__, or to
call it multiple times for a single call to __buffer__. For objects that
implement the C buffer protocol, calls to __release_buffer__ where the
argument is not a memoryview wrapping the same object will raise an
exception. After a valid call to __release_buffer__, the memoryview is
invalidated (as if its release() method had been called), and any
subsequent calls to __release_buffer__ with the same memoryview will
raise an exception. The interpreter will ensure that misuse of the
Python API will not break invariants at the C level -- for example, it
will not cause memory safety violations.

inspect.BufferFlags

To help implementations of __buffer__, we add inspect.BufferFlags, a
subclass of enum.IntFlag. This enum contains all flags defined in the C
buffer protocol. For example, inspect.BufferFlags.SIMPLE has the same
value as the PyBUF_SIMPLE constant.

collections.abc.Buffer

We add a new abstract base classes, collections.abc.Buffer, which
requires the __buffer__ method. This class is intended primarily for use
in type annotations:

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers

It can also be used in isinstance and issubclass checks:

    >>> from collections.abc import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False

In the typeshed stub files, the class should be defined as a Protocol,
following the precedent of other simple ABCs in collections.abc such as
collections.abc.Iterable or collections.abc.Sized.

Example

The following is an example of a Python class that implements the buffer
protocol:

    import contextlib
    import inspect

    class MyBuffer:
        def __init__(self, data: bytes):
            self.data = bytearray(data)
            self.view = None

        def __buffer__(self, flags: int) -> memoryview:
            if flags != inspect.BufferFlags.FULL_RO:
                raise TypeError("Only BufferFlags.FULL_RO supported")
            if self.view is not None:
                raise RuntimeError("Buffer already held")
            self.view = memoryview(self.data)
            return self.view

        def __release_buffer__(self, view: memoryview) -> None:
            assert self.view is view  # guaranteed to be true
            self.view.release()
            self.view = None

        def extend(self, b: bytes) -> None:
            if self.view is not None:
                raise RuntimeError("Cannot extend held buffer")
            self.data.extend(b)

    buffer = MyBuffer(b"capybara")
    with memoryview(buffer) as view:
        view[0] = ord("C")

        with contextlib.suppress(RuntimeError):
            buffer.extend(b"!")  # raises RuntimeError

    buffer.extend(b"!")  # ok, buffer is no longer held

    with memoryview(buffer) as view:
        assert view.tobytes() == b"Capybara!"

Equivalent for older Python versions

New typing features are usually backported to older Python versions in
the typing_extensions package. Because the buffer protocol is currently
accessible only in C, this PEP cannot be fully implemented in a
pure-Python package like typing_extensions. As a temporary workaround,
an abstract base class typing_extensions.Buffer will be provided for
Python versions that do not have collections.abc.Buffer available.

After this PEP is implemented, inheriting from collections.abc.Buffer
will not be necessary to indicate that an object supports the buffer
protocol. However, in older Python versions, it will be necessary to
explicitly inherit from typing_extensions.Buffer to indicate to type
checkers that a class supports the buffer protocol, since objects
supporting the buffer protocol will not have a __buffer__ method. It is
expected that this will happen primarily in stub files, because buffer
classes are necessarily implemented in C code, which cannot have types
defined inline. For runtime uses, the ABC.register API can be used to
register buffer classes with typing_extensions.Buffer.

No special meaning for bytes

The special case stating that bytes may be used as a shorthand for other
ByteString types will be removed from the typing documentation. With
collections.abc.Buffer available as an alternative, there will be no
good reason to allow bytes as a shorthand. Type checkers currently
implementing this behavior should deprecate and eventually remove it.

Backwards Compatibility

__buffer__ and __release_buffer__ attributes

As the runtime changes in this PEP only add new functionality, there are
few backwards compatibility concerns.

However, code that uses a __buffer__ or __release_buffer__ attribute for
other purposes may be affected. While all dunders are technically
reserved for the language, it is still good practice to ensure that a
new dunder does not interfere with too much existing code, especially
widely used packages. A survey of publicly accessible code found:

-   PyPy supports a __buffer__ method with compatible semantics to those
    proposed in this PEP. A PyPy core developer expressed his support
    for this PEP.
-   pyzmq implements a PyPy-compatible __buffer__ method.
-   mpi4py defines a SupportsBuffer protocol that would be equivalent to
    this PEP's collections.abc.Buffer.
-   NumPy used to have an undocumented behavior where it would access a
    __buffer__ attribute (not method) to get an object's buffer. This
    was removed in 2019 for NumPy 1.17. The behavior would have last
    worked in NumPy 1.16, which only supported Python 3.7 and older.
    Python 3.7 will have reached its end of life by the time this PEP is
    expected to be implemented.

Thus, this PEP's use of the __buffer__ method will improve
interoperability with PyPy and not interfere with the current versions
of any major Python packages.

No publicly accessible code uses the name __release_buffer__.

Removal of the bytes special case

Separately, the recommendation to remove the special behavior for bytes
in type checkers does have a backwards compatibility impact on their
users. An experiment with mypy shows that several major open source
projects that use it for type checking will see new errors if the bytes
promotion is removed. Many of these errors can be fixed by improving the
stubs in typeshed, as has already been done for the builtins, binascii,
pickle, and re modules. A review of all usage of bytes types in typeshed
is in progress. Overall, the change improves type safety and makes the
type system more consistent, so we believe the migration cost is worth
it.

How to Teach This

We will add notes pointing to collections.abc.Buffer in appropriate
places in the documentation, such as typing.readthedocs.io and the mypy
cheat sheet. Type checkers may provide additional pointers in their
error messages. For example, when they encounter a buffer object being
passed to a function that is annotated to only accept bytes, the error
message could include a note suggesting the use of
collections.abc.Buffer instead.

Reference Implementation

An implementation of this PEP is available in the author's fork.

Rejected Ideas

types.Buffer

An earlier version of this PEP proposed adding a new types.Buffer type
with an __instancecheck__ implemented in C so that isinstance() checks
can be used to check whether a type implements the buffer protocol. This
avoids the complexity of exposing the full buffer protocol to Python
code, while still allowing the type system to check for the buffer
protocol.

However, that approach does not compose well with the rest of the type
system, because types.Buffer would be a nominal type, not a structural
one. For example, there would be no way to represent "an object that
supports both the buffer protocol and __len__". With the current
proposal, __buffer__ is like any other special method, so a Protocol can
be defined combining it with another method.

More generally, no other part of Python works like the proposed
types.Buffer. The current proposal is more consistent with the rest of
the language, where C-level slots usually have corresponding
Python-level special methods.

Keep bytearray compatible with bytes

It has been suggested to remove the special case where memoryview is
always compatible with bytes, but keep it for bytearray, because the two
types have very similar interfaces. However, several standard library
functions (e.g., re.compile, socket.getaddrinfo, and most functions
accepting path-like arguments) accept bytes but not bytearray. In most
codebases, bytearray is also not a very common type. We prefer to have
users spell out accepted types explicitly (or use Protocol from PEP 544
if only a specific set of methods is required). This aspect of the
proposal was specifically discussed on the typing-sig mailing list,
without any strong disagreement from the typing community.

Distinguish between mutable and immutable buffers

The most frequently used distinction within buffer types is whether or
not the buffer is mutable. Some functions accept only mutable buffers
(e.g., bytearray, some memoryview objects), others accept all buffers.

An earlier version of this PEP proposed using the presence of the
bf_releasebuffer slot to determine whether a buffer type is mutable.
This rule holds for most standard library buffer types, but the
relationship between mutability and the presence of this slot is not
absolute. For example, numpy arrays are mutable but do not have this
slot.

The current buffer protocol does not provide any way to reliably
determine whether a buffer type represents a mutable or immutable
buffer. Therefore, this PEP does not add type system support for this
distinction. The question can be revisited in the future if the buffer
protocol is enhanced to provide static introspection support. A sketch
for such a mechanism exists.

Acknowledgments

Many people have provided useful feedback on drafts of this PEP. Petr
Viktorin has been particularly helpful in improving my understanding of
the subtleties of the buffer protocol.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.