PEP 688 – Making the buffer protocol accessible in Python
- Jelle Zijlstra <jelle.zijlstra at gmail.com>
- Discourse thread
- Standards Track
- 23-Apr-2022, 25-Apr-2022, 06-Oct-2022, 26-Oct-2022
- Discourse message
Table of Contents
- Backwards Compatibility
- How to Teach This
- Reference Implementation
- Rejected Ideas
This PEP proposes a Python-level API for the buffer protocol, which is currently accessible only to C code. This allows type checkers to evaluate whether objects implement the protocol.
The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object—the buffer protocol
introduced in PEP 3118.
Functions that accept binary data are usually written to handle any
object implementing the buffer protocol. For example, at the time of writing,
there are around 130 functions in CPython using the Argument Clinic
Py_buffer type, which accepts the buffer protocol.
Currently, there is no way for Python code to inspect whether an object supports the buffer protocol. Moreover, the static type system does not provide a type annotation to represent the protocol. This is a common problem when writing type annotations for code that accepts generic buffers.
Similarly, it is impossible for a class written in Python to support the buffer protocol. A buffer class in Python would give users the ability to easily wrap a C buffer object, or to test the behavior of an API that consumes the buffer protocol. Granted, this is not a particularly common need. However, there has been a CPython feature request for supporting buffer classes written in Python that has been open since 2012.
There are two known workarounds for annotating buffer types in the type system, but neither is adequate.
First, the current workaround
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
approach works for the standard library, but it does not extend to
third-party buffer types.
Second, the documentation
typing.ByteString currently states:
This type represents the types
memoryviewof byte sequences.
As a shorthand for this type,
bytescan be used to annotate arguments of any of the types mentioned above.
Although this sentence has been in the documentation
the use of
bytes to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
bytes type ambiguous in type annotations. After all,
there are many operations that are valid on
bytes objects, but
memoryview objects, and it is perfectly possible for
a function to accept
bytes but not
A mypy user
that this shortcut has caused significant problems for the
Kinds of buffers
The C buffer protocol supports many options, affecting strides, contiguity, and support for writing to the buffer. Some of these options would be useful in the type system. For example, typeshed currently provides separate type aliases for writable and read-only buffers.
However, in the C buffer protocol, most of these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a particular flag is to actually
ask for the buffer. For some types, such as
the supported flags depend on the instance. As a result, it would
be difficult to represent support for these flags in the type system.
Python-level buffer protocol
We propose to add two Python-level special methods,
classes that implement these methods are usable as buffers from C
code. Conversely, classes implemented in C that support the
buffer protocol acquire synthesized methods accessible from Python
__buffer__ method is called to create a buffer from a Python
object, for example by the
It corresponds to the
bf_getbuffer C slot.
The Python signature for this method is
def __buffer__(self, flags: int, /) -> memoryview: .... The method
must return a
memoryview object. If the
is invoked on a Python class with a
the interpreter extracts the underlying
Py_buffer from the
memoryview returned by the method
and returns it to the C caller. Similarly, if Python code calls the
__buffer__ method on an instance of a C class that
bf_getbuffer, the returned buffer is wrapped in a
memoryview for consumption by Python code.
__release_buffer__ method should be called when a caller no
longer needs the buffer returned by
__buffer__. It corresponds to the
bf_releasebuffer C slot. This is an
optional part of the buffer protocol.
The Python signature for this method is
def __release_buffer__(self, buffer: memoryview, /) -> None: ....
The buffer to be released is wrapped in a
memoryview. When this
method is invoked through CPython’s buffer API (for example, through
memoryview.release on a
memoryview returned by
__buffer__), the passed
memoryview is the same object
as was returned by
__buffer__. It is
also possible to call
__release_buffer__ on a C class that
__release_buffer__ exists on an object,
Python code that calls
__buffer__ directly on the object must
__release_buffer__ on the same object when it is done
with the buffer. Otherwise, resources used by the object may
not be reclaimed. Similarly, it is a programming error
__release_buffer__ without a previous call to
__buffer__, or to call it multiple times for a single call
__buffer__. For objects that implement the C buffer protocol,
__release_buffer__ where the argument is not a
memoryview wrapping the same object will raise an exception.
After a valid call to
is invalidated (as if its
release() method had been called),
and any subsequent calls to
__release_buffer__ with the same
memoryview will raise an exception.
The interpreter will ensure that misuse
of the Python API will not break invariants at the C level – for
example, it will not cause memory safety violations.
To help implementations of
__buffer__, we add
a subclass of
enum.IntFlag. This enum contains all flags defined in the
C buffer protocol. For example,
inspect.BufferFlags.SIMPLE has the same
value as the
We add a new abstract base classes,
which requires the
This class is intended primarily for use in type annotations:
def need_buffer(b: Buffer) -> memoryview:
need_buffer(b"xy") # ok
need_buffer("xy") # rejected by static type checkers
It can also be used in
>>> from collections.abc import Buffer
>>> isinstance(b"xy", Buffer)
>>> issubclass(bytes, Buffer)
>>> issubclass(memoryview, Buffer)
>>> isinstance("xy", Buffer)
>>> issubclass(str, Buffer)
In the typeshed stub files, the class should be defined as a
following the precedent of other simple ABCs in
collections.abc such as
The following is an example of a Python class that implements the buffer protocol:
def __init__(self, data: bytes):
self.data = bytearray(data)
self.view = None
def __buffer__(self, flags: int) -> memoryview:
if flags != inspect.BufferFlags.FULL_RO:
raise TypeError("Only BufferFlags.FULL_RO supported")
if self.view is not None:
raise RuntimeError("Buffer already held")
self.view = memoryview(self.data)
def __release_buffer__(self, view: memoryview) -> None:
assert self.view is view # guaranteed to be true
self.view = None
def extend(self, b: bytes) -> None:
if self.view is not None:
raise RuntimeError("Cannot extend held buffer")
buffer = MyBuffer(b"capybara")
with memoryview(buffer) as view:
view = ord("C")
buffer.extend(b"!") # raises RuntimeError
buffer.extend(b"!") # ok, buffer is no longer held
with memoryview(buffer) as view:
assert view.tobytes() == b"Capybara!"
Equivalent for older Python versions
New typing features are usually backported to older Python versions
in the typing_extensions
package. Because the buffer protocol
is currently accessible only in C, this PEP cannot be fully implemented
in a pure-Python package like
typing_extensions. As a temporary
workaround, an abstract base class
will be provided for Python versions
that do not have
After this PEP is implemented, inheriting from
not be necessary to indicate that an object supports the buffer protocol.
However, in older Python versions, it will be necessary to explicitly
typing_extensions.Buffer to indicate to type checkers that
a class supports the buffer protocol, since objects supporting the buffer
protocol will not have a
__buffer__ method. It is expected that this
will happen primarily in stub files, because buffer classes are necessarily
implemented in C code, which cannot have types defined inline.
For runtime uses, the
ABC.register API can be used to register
buffer classes with
No special meaning for
The special case stating that
bytes may be used as a shorthand
ByteString types will be removed from the
collections.abc.Buffer available as an alternative, there will be no good
reason to allow
bytes as a shorthand.
Type checkers currently implementing this behavior
should deprecate and eventually remove it.
As the runtime changes in this PEP only add new functionality, there are few backwards compatibility concerns.
However, code that uses a
__release_buffer__ attribute for
other purposes may be affected. While all dunders are technically reserved for the
language, it is still good practice to ensure that a new dunder does not
interfere with too much existing code, especially widely used packages. A survey
of publicly accessible code found:
- PyPy supports
__buffer__method with compatible semantics to those proposed in this PEP. A PyPy core developer expressed his support for this PEP.
- pyzmq implements
- mpi4py defines
SupportsBufferprotocol that would be equivalent to this PEP’s
- NumPy used to have an undocumented behavior where it would access a
__buffer__attribute (not method) to get an object’s buffer. This was removed in 2019 for NumPy 1.17. The behavior would have last worked in NumPy 1.16, which only supported Python 3.7 and older. Python 3.7 will have reached its end of life by the time this PEP is expected to be implemented.
Thus, this PEP’s use of the
__buffer__ method will improve interoperability with
PyPy and not interfere with the current versions of any major Python packages.
No publicly accessible code uses the name
Removal of the
bytes special case
Separately, the recommendation to remove the special behavior for
bytes in type checkers does have a backwards compatibility
impact on their users. An experiment
with mypy shows that several major open source projects that use it
for type checking will see new errors if the
is removed. Many of these errors can be fixed by improving
the stubs in typeshed, as has already been done for the
A review of all
bytes types in typeshed is in progress.
Overall, the change improves type safety and makes the type system
more consistent, so we believe the migration cost is worth it.
How to Teach This
We will add notes pointing to
collections.abc.Buffer in appropriate places in the
documentation, such as typing.readthedocs.io
and the mypy cheat sheet.
Type checkers may provide additional pointers in their error messages. For example,
when they encounter a buffer object being passed to a function that
is annotated to only accept
bytes, the error message could include a note suggesting
the use of
An implementation of this PEP is available in the author’s fork.
An earlier version of this PEP proposed adding a new
types.Buffer type with
__instancecheck__ implemented in C so that
isinstance() checks can be
used to check whether a type implements the buffer protocol. This avoids the
complexity of exposing the full buffer protocol to Python code, while still
allowing the type system to check for the buffer protocol.
However, that approach
does not compose well with the rest of the type system, because
would be a nominal type, not a structural one. For example, there would be no way
to represent “an object that supports both the buffer protocol and
the current proposal,
__buffer__ is like any other special method, so a
Protocol can be defined combining it with another method.
More generally, no other part of Python works like the proposed
The current proposal is more consistent with the rest of the language, where
C-level slots usually have corresponding Python-level special methods.
bytearray compatible with
It has been suggested to remove the special case where
always compatible with
bytes, but keep it for
the two types have very similar interfaces. However, several standard
library functions (e.g.,
socket.getaddrinfo, and most
functions accepting path-like arguments) accept
bytes but not
bytearray. In most codebases,
bytearray is also
not a very common type. We prefer to have users spell out accepted types
explicitly (or use
Protocol from PEP 544 if only a specific set of
methods is required). This aspect of the proposal was specifically
on the typing-sig mailing list, without any strong disagreement from the
Distinguish between mutable and immutable buffers
The most frequently used distinction within buffer types is
whether or not the buffer is mutable. Some functions accept only
mutable buffers (e.g.,
others accept all buffers.
An earlier version of this PEP proposed using the presence of the
bf_releasebuffer slot to determine whether a buffer type is mutable.
This rule holds for most standard library buffer types, but the relationship
between mutability and the presence of this slot is not absolute. For
numpy arrays are mutable but do not have this slot.
The current buffer protocol does not provide any way to reliably determine whether a buffer type represents a mutable or immutable buffer. Therefore, this PEP does not add type system support for this distinction. The question can be revisited in the future if the buffer protocol is enhanced to provide static introspection support. A sketch for such a mechanism exists.
Many people have provided useful feedback on drafts of this PEP. Petr Viktorin has been particularly helpful in improving my understanding of the subtleties of the buffer protocol.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Last modified: 2024-02-16 16:11:53 GMT