PEP 782 – Add PyBytesWriter C API
- Author: Victor Stinner <vstinner at python.org>
- Discussions-To: Discourse thread
- Status: Draft
- Type: Standards Track
- Created: 27-Mar-2025
- Python-Version: 3.14
- Post-History: 18-Feb-2025
Abstract
Add a new PyBytesWriter C API to create bytes objects.
Soft deprecate the PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs. These APIs treat an immutable bytes object as a mutable object. They remain available and maintained and do not emit deprecation warnings, but they are no longer recommended when writing new code.
Rationale
Disallow creation of incomplete/inconsistent objects
Creating a Python bytes object using PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() treats an immutable bytes object as mutable. It goes against the principle that bytes objects are immutable. It also creates an incomplete or “invalid” object since bytes are not initialized. In Python, a bytes object should always have its bytes fully initialized.
Inefficient allocation strategy
When creating a bytes string whose output size is unknown, one strategy is to allocate a short buffer and extend it (to the exact size) each time a larger write is needed.
This strategy is inefficient because it requires enlarging the buffer multiple times. It is more efficient to overallocate the buffer the first time that a larger write is needed. This reduces the number of expensive realloc() operations, which can imply a memory copy.
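The cost difference can be illustrated with a small standalone C sketch. It does not use the Python C API. The helper names count_reallocs_exact() and count_reallocs_overalloc(), the initial capacity of 16 bytes and the doubling policy are all assumptions chosen for illustration; they are not the policy used by CPython. Both functions append total bytes one byte at a time and count how many times the buffer has to be reallocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Append `total` bytes one at a time, growing the buffer to the exact
   size needed on every write. Return the number of realloc() calls,
   or -1 on memory allocation failure. */
long count_reallocs_exact(size_t total)
{
    char *buf = NULL;
    size_t size = 0;
    long reallocs = 0;

    for (size_t i = 0; i < total; i++) {
        char *tmp = realloc(buf, size + 1);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
        reallocs++;
        buf[size++] = 'x';
    }
    free(buf);
    return reallocs;
}

/* Same workload, but double the capacity whenever the buffer is full.
   Hypothetical policy, for illustration only. */
long count_reallocs_overalloc(size_t total)
{
    char *buf = NULL;
    size_t size = 0;
    size_t capacity = 0;
    long reallocs = 0;

    for (size_t i = 0; i < total; i++) {
        if (size == capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 16;
            char *tmp = realloc(buf, new_capacity);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            capacity = new_capacity;
            reallocs++;
        }
        assert(size < capacity);
        buf[size++] = 'x';
    }
    free(buf);
    return reallocs;
}
```

With this toy policy, writing 1000 bytes needs one reallocation per byte in the exact-size variant, while the overallocating variant only reallocates when the capacity is exhausted (16, 32, 64, ... bytes).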
Specification
API
type PyBytesWriter
    A Python bytes writer instance created by PyBytesWriter_Create().
    The instance must be destroyed by PyBytesWriter_Finish() or PyBytesWriter_Discard().
Create, Finish, Discard
PyBytesWriter *PyBytesWriter_Create(Py_ssize_t size)
    Create a PyBytesWriter to write size bytes.
    If size is greater than zero, allocate size bytes and set the writer size to size. The caller is responsible for writing size bytes using PyBytesWriter_GetData().
    On error, set an exception and return NULL.
    size must be positive or zero.
PyObject *PyBytesWriter_Finish(PyBytesWriter *writer)
    Finish a PyBytesWriter created by PyBytesWriter_Create().
    On success, return a Python bytes object. On error, set an exception and return NULL.
    The writer instance is invalid after the call in any case.
PyObject *PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size)
    Similar to PyBytesWriter_Finish(), but resize the writer to size bytes before creating the bytes object.
PyObject *PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf)
    Similar to PyBytesWriter_Finish(), but resize the writer using the buf pointer before creating the bytes object.
    Set an exception and return NULL if the buf pointer is outside the internal buffer bounds.
    Function pseudo-code:
        Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer);
        return PyBytesWriter_FinishWithSize(writer, size);
void PyBytesWriter_Discard(PyBytesWriter *writer)
    Discard a PyBytesWriter created by PyBytesWriter_Create().
    Do nothing if writer is NULL.
    The writer instance is invalid after the call.
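The pointer-to-size conversion in the PyBytesWriter_FinishWithPointer() pseudo-code can be modelled without the Python C API. The sketch below is a toy: the toy_writer structure and toy_size_from_pointer() are hypothetical names, the real PyBytesWriter layout is private, and treating the writer size as the upper bound of the valid range is an assumption (the specification only says "internal buffer bounds").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a writer: a fixed buffer and the writer size.
   This is NOT the PyBytesWriter structure. */
typedef struct {
    char data[32];
    ptrdiff_t size;   /* number of bytes exposed to the caller */
} toy_writer;

/* Convert a write position into a final size, following the pseudo-code
   size = (char*)buf - (char*)data.
   Return -1 if buf lies outside [data, data + size]. */
ptrdiff_t toy_size_from_pointer(const toy_writer *w, const void *buf)
{
    assert(w != NULL && buf != NULL);
    /* Compare as integers: relational comparison of pointers into
       different objects is undefined behavior in C. */
    uintptr_t start = (uintptr_t)w->data;
    uintptr_t end = start + (uintptr_t)w->size;
    uintptr_t pos = (uintptr_t)buf;
    if (pos < start || pos > end) {
        return -1;
    }
    return (ptrdiff_t)(pos - start);
}
```

A caller that wrote 3 bytes and advanced its pointer by 3 gets a final size of 3; a pointer past the writer size is rejected, which corresponds to the error case described above.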
High-level API
int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)
    Grow the writer internal buffer by size bytes, write size bytes of bytes at the writer end, and add size to the writer size.
    If size is equal to -1, call strlen(bytes) to get the string length.
    On success, return 0. On error, set an exception and return -1.
int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)
    Similar to PyBytes_FromFormat(), but write the output directly at the writer end. Grow the writer internal buffer on demand. Then add the written size to the writer size.
    On success, return 0. On error, set an exception and return -1.
Getters
Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer)
    Get the writer size.
void *PyBytesWriter_GetData(PyBytesWriter *writer)
    Get the writer data: start of the internal buffer.
    The pointer is valid until PyBytesWriter_Finish() or PyBytesWriter_Discard() is called on writer.
Low-level API
int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size)
    Resize the writer to size bytes. It can be used to enlarge or to shrink the writer.
    Newly allocated bytes are left uninitialized.
    On success, return 0. On error, set an exception and return -1.
    size must be positive or zero.
int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow)
    Resize the writer by adding grow bytes to the current writer size.
    Newly allocated bytes are left uninitialized.
    On success, return 0. On error, set an exception and return -1.
    grow can be negative to shrink the writer.
void *PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf)
    Similar to PyBytesWriter_Grow(), but also update the buf pointer.
    The buf pointer is moved if the internal buffer is moved in memory. The relative position of buf within the internal buffer is left unchanged.
    On error, set an exception and return NULL.
    buf must not be NULL.
    Function pseudo-code:
        Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer);
        if (PyBytesWriter_Grow(writer, size) < 0) {
            return NULL;
        }
        return (char*)PyBytesWriter_GetData(writer) + pos;
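The reason PyBytesWriter_GrowAndUpdatePointer() recomputes the pointer is that growing a buffer may move it in memory, leaving any previously saved pointer dangling. The following standalone sketch models this with plain realloc(). The toy_writer type and the toy_grow() and toy_grow_and_update_pointer() functions are hypothetical stand-ins, not part of the proposed API, and the toy does not overallocate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy writer: a heap buffer and its size (not the real PyBytesWriter). */
typedef struct {
    char *data;
    size_t size;
} toy_writer;

/* Grow the buffer by `grow` bytes. Return 0 on success, -1 on failure.
   On failure the writer is left unchanged. */
int toy_grow(toy_writer *w, size_t grow)
{
    char *tmp = realloc(w->data, w->size + grow);
    if (tmp == NULL) {
        return -1;
    }
    w->data = tmp;   /* the block may have moved */
    w->size += grow;
    return 0;
}

/* Grow the buffer and return the pointer that designates the same
   relative position as `buf` in the (possibly moved) buffer. */
void *toy_grow_and_update_pointer(toy_writer *w, size_t grow, void *buf)
{
    assert(buf != NULL);
    /* Save the offset first: after realloc() the old pointer
       must no longer be used. */
    size_t pos = (size_t)((char *)buf - w->data);
    if (toy_grow(w, grow) < 0) {
        return NULL;
    }
    return w->data + pos;
}
```

The caller only ever keeps the pointer returned by the last grow call, so it never dereferences an address inside a buffer that has been moved.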
Overallocation
PyBytesWriter_Resize() and PyBytesWriter_Grow() overallocate the internal buffer to reduce the number of realloc() calls and so reduce memory copies.
PyBytesWriter_Finish() trims overallocations: it shrinks the internal buffer to the exact size when creating the final bytes object.
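The distinction between the writer size and the allocated size can be sketched in plain C. This is only a model of the behavior described above: the toy_writer structure, the toy_resize() and toy_finish() functions and the 25% overallocation ratio are assumptions made for illustration, and the reference implementation may use a different layout and ratio.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy writer that tracks the requested size and the real capacity. */
typedef struct {
    char *data;
    size_t size;        /* size requested by the caller */
    size_t allocated;   /* capacity of the underlying block */
} toy_writer;

/* Resize to `size` bytes. Reallocate only if the capacity is too small,
   and then reserve 25% extra room (hypothetical ratio).
   Return 0 on success, -1 on memory allocation failure. */
int toy_resize(toy_writer *w, size_t size)
{
    if (size > w->allocated) {
        size_t new_allocated = size + size / 4;
        char *tmp = realloc(w->data, new_allocated);
        if (tmp == NULL) {
            return -1;
        }
        w->data = tmp;
        w->allocated = new_allocated;
    }
    w->size = size;
    return 0;
}

/* Trim the overallocation and hand the buffer over to the caller,
   who must free() it. The writer is reset and must not be reused
   without a new toy_resize() call. */
char *toy_finish(toy_writer *w, size_t *out_size)
{
    char *result = w->data;
    if (w->size > 0 && w->allocated > w->size) {
        char *tmp = realloc(w->data, w->size);
        if (tmp != NULL) {
            result = tmp;
        }
        /* If shrinking fails, the original block is still valid. */
    }
    *out_size = w->size;
    w->data = NULL;
    w->size = 0;
    w->allocated = 0;
    return result;
}
```

In this model, resizing from 100 to 120 bytes does not call realloc() because the first resize already reserved 125 bytes; the extra room is only given back when the writer is finished.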
Thread safety
The API is not thread safe: a writer should only be used by a single thread at a time.
Soft deprecations
Soft deprecate the PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs. These APIs treat an immutable bytes object as a mutable object. They remain available and maintained and do not emit deprecation warnings, but they are no longer recommended when writing new code.
PyBytes_FromStringAndSize(str, size) is not soft deprecated. Only calls with NULL str are soft deprecated.
Examples
High-level API
Create the bytes string b"Hello World!":
    PyObject* hello_world(void)
    {
        // Start with an empty writer: the size is not known in advance
        PyBytesWriter *writer = PyBytesWriter_Create(0);
        if (writer == NULL) {
            goto error;
        }
        // size=-1: the length is computed with strlen()
        if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) {
            goto error;
        }
        if (PyBytesWriter_Format(writer, " %s!", "World") < 0) {
            goto error;
        }
        return PyBytesWriter_Finish(writer);
    error:
        // PyBytesWriter_Discard() does nothing if writer is NULL
        PyBytesWriter_Discard(writer);
        return NULL;
    }
Create the bytes string “abc”
Example creating the bytes string b"abc", with a fixed size of 3 bytes:
    PyObject* create_abc(void)
    {
        PyBytesWriter *writer = PyBytesWriter_Create(3);
        if (writer == NULL) {
            return NULL;
        }
        char *str = PyBytesWriter_GetData(writer);
        memcpy(str, "abc", 3);
        return PyBytesWriter_Finish(writer);
    }
GrowAndUpdatePointer() example
Example using a pointer to write bytes and to track the written size.
Create the bytes string b"Hello World":
    PyObject* grow_example(void)
    {
        // Allocate 10 bytes
        PyBytesWriter *writer = PyBytesWriter_Create(10);
        if (writer == NULL) {
            return NULL;
        }
        // Write some bytes
        char *buf = PyBytesWriter_GetData(writer);
        memcpy(buf, "Hello ", strlen("Hello "));
        buf += strlen("Hello ");
        // Allocate 10 more bytes
        buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf);
        if (buf == NULL) {
            PyBytesWriter_Discard(writer);
            return NULL;
        }
        // Write more bytes
        memcpy(buf, "World", strlen("World"));
        buf += strlen("World");
        // Truncate the string at 'buf' position
        // and create a bytes object
        return PyBytesWriter_FinishWithPointer(writer, buf);
    }
Update PyBytes_FromStringAndSize() code
Example of code using the soft deprecated PyBytes_FromStringAndSize(NULL, size) API:
    PyObject *result = PyBytes_FromStringAndSize(NULL, num_bytes);
    if (result == NULL) {
        return NULL;
    }
    if (copy_bytes(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
        Py_CLEAR(result);
    }
    return result;
It can now be updated to:
    PyBytesWriter *writer = PyBytesWriter_Create(num_bytes);
    if (writer == NULL) {
        return NULL;
    }
    if (copy_bytes(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
        PyBytesWriter_Discard(writer);
        return NULL;
    }
    return PyBytesWriter_Finish(writer);
Update _PyBytes_Resize() code
Example of code using the soft deprecated _PyBytes_Resize() API:
    PyObject *v = PyBytes_FromStringAndSize(NULL, size);
    if (v == NULL) {
        return NULL;
    }
    char *p = PyBytes_AS_STRING(v);
    // ... fill bytes into 'p' ...
    if (_PyBytes_Resize(&v, (p - PyBytes_AS_STRING(v)))) {
        return NULL;
    }
    return v;
It can now be updated to:
    PyBytesWriter *writer = PyBytesWriter_Create(size);
    if (writer == NULL) {
        return NULL;
    }
    char *p = PyBytesWriter_GetData(writer);
    // ... fill bytes into 'p' ...
    return PyBytesWriter_FinishWithPointer(writer, p);
Reference Implementation
Notes on the CPython reference implementation which are not part of the Specification:
- The implementation internally allocates a bytes object, so PyBytesWriter_Finish() just returns the object without having to copy memory.
- For strings up to 256 bytes, a small internal raw buffer of bytes is used. It avoids having to resize a bytes object, which is inefficient. At the end, PyBytesWriter_Finish() creates the bytes object from this small buffer.
- A free list is used to reduce the cost of allocating a PyBytesWriter on the heap memory.
Backwards Compatibility
There is no impact on backward compatibility: only new APIs are added.
The PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs are soft deprecated. No new warning is emitted when these functions are used, and they are not planned for removal.
Prior Discussions
- March 2025: Third public API attempt, using size rather than pointers:
- February 2025: Second public API attempt:
- July 2024: First public API attempt:
  - C API Working Group decision: Add PyBytes_Writer() API (August 2024)
  - Pull request gh-121726: first public API attempt (July 2024)
- March 2016: Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce strings in CPython: Article on the original private _PyBytesWriter C API.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0782.rst
Last modified: 2025-03-31 18:34:24 GMT