Following system colour scheme Selected dark colour scheme Selected light colour scheme

Python Enhancement Proposals

PEP 782 – Add PyBytesWriter C API

Author:
Victor Stinner <vstinner at python.org>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
27-Mar-2025
Python-Version:
3.14
Post-History:
18-Feb-2025

Table of Contents

Abstract

Add a new PyBytesWriter C API to create bytes objects.

Soft deprecate PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs. These APIs treat an immutable bytes object as a mutable object. They remain available and maintained, don’t emit deprecation warning, but are no longer recommended when writing new code.

Rationale

Disallow creation of incomplete/inconsistent objects

Creating a Python bytes object using PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() treats an immutable bytes object as mutable. It goes against the principle that bytes objects are immutable. It also creates an incomplete or “invalid” object since bytes are not initialized. In Python, a bytes object should always have its bytes fully initialized.

Inefficient allocation strategy

When creating a bytes string and the output size is unknown, one strategy is to allocate a short buffer and extend it (to the exact size) each time a larger write is needed.

This strategy is inefficient because it requires to enlarge the buffer multiple timess. It’s more efficient to overallocate the buffer the first time that a larger write is needed. It reduces the number of expensive realloc() operations which can imply a memory copy.

Specification

API

type PyBytesWriter
A Python bytes writer instance created by PyBytesWriter_Create().

The instance must be destroyed by PyBytesWriter_Finish() or PyBytesWriter_Discard().

Create, Finish, Discard

PyBytesWriter *PyBytesWriter_Create(Py_ssize_t size)
Create a PyBytesWriter to write size bytes.

If size is greater than zero, allocate size bytes, and set the writer size to size. The caller is responsible to write size bytes using PyBytesWriter_GetData().

On error, set an exception and return NULL.

size must be positive or zero.

PyObject *PyBytesWriter_Finish(PyBytesWriter *writer)
Finish a PyBytesWriter created by PyBytesWriter_Create().

On success, return a Python bytes object. On error, set an exception and return NULL.

The writer instance is invalid after the call in any case.

PyObject *PyBytesWriter_FinishWithSize(PyBytesWriter *writer, Py_ssize_t size)
Similar to PyBytesWriter_Finish(), but resize the writer to size bytes before creating the bytes object.
PyObject *PyBytesWriter_FinishWithPointer(PyBytesWriter *writer, void *buf)
Similar to PyBytesWriter_Finish(), but resize the writer using buf pointer before creating the bytes object.

Set an exception and return NULL if buf pointer is outside the internal buffer bounds.

Function pseudo-code:

Py_ssize_t size = (char*)buf - (char*)PyBytesWriter_GetData(writer);
return PyBytesWriter_FinishWithSize(writer, size);
void PyBytesWriter_Discard(PyBytesWriter *writer)
Discard a PyBytesWriter created by PyBytesWriter_Create().

Do nothing if writer is NULL.

The writer instance is invalid after the call.

High-level API

int PyBytesWriter_WriteBytes(PyBytesWriter *writer, const void *bytes, Py_ssize_t size)
Grow the writer internal buffer by size bytes, write size bytes of bytes at the writer end, and add size to the writer size.

If size is equal to -1, call strlen(bytes) to get the string length.

On success, return 0. On error, set an exception and return -1.

int PyBytesWriter_Format(PyBytesWriter *writer, const char *format, ...)
Similar to PyBytes_FromFormat(), but write the output directly at the writer end. Grow the writer internal buffer on demand. Then add the written size to the writer size.

On success, return 0. On error, set an exception and return -1.

Getters

Py_ssize_t PyBytesWriter_GetSize(PyBytesWriter *writer)
Get the writer size.
void *PyBytesWriter_GetData(PyBytesWriter *writer)
Get the writer data: start of the internal buffer.

The pointer is valid until PyBytesWriter_Finish() or PyBytesWriter_Discard() is called on writer.

Low-level API

int PyBytesWriter_Resize(PyBytesWriter *writer, Py_ssize_t size)
Resize the writer to size bytes. It can be used to enlarge or to shrink the writer.

Newly allocated bytes are left uninitialized.

On success, return 0. On error, set an exception and return -1.

size must be positive or zero.

int PyBytesWriter_Grow(PyBytesWriter *writer, Py_ssize_t grow)
Resize the writer by adding grow bytes to the current writer size.

Newly allocated bytes are left uninitialized.

On success, return 0. On error, set an exception and return -1.

size can be negative to shrink the writer.

void *PyBytesWriter_GrowAndUpdatePointer(PyBytesWriter *writer, Py_ssize_t size, void *buf)
Similar to PyBytesWriter_Grow(), but update also the buf pointer.

The buf pointer is moved if the internal buffer is moved in memory. The buf relative position within the internal buffer is left unchanged.

On error, set an exception and return NULL.

buf must not be NULL.

Function pseudo-code:

Py_ssize_t pos = (char*)buf - (char*)PyBytesWriter_GetData(writer);
if (PyBytesWriter_Grow(writer, size) < 0) {
    return NULL;
}
return (char*)PyBytesWriter_GetData(writer) + pos;

Overallocation

PyBytesWriter_Resize() and PyBytesWriter_Grow() overallocate the internal buffer to reduce the number of realloc() calls and so reduce memory copies.

PyBytesWriter_Finish() trims overallocations: it shrinks the internal buffer to the exact size when creating the final bytes object.

Thread safety

The API is not thread safe: a writer should only be used by a single thread at the same time.

Soft deprecations

Soft deprecate PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs. These APIs treat an immutable bytes object as a mutable object. They remain available and maintained, don’t emit deprecation warning, but are no longer recommended when writing new code.

PyBytes_FromStringAndSize(str, size) is not soft deprecated. Only calls with NULL str are soft deprecated.

Examples

High-level API

Create the bytes string b"Hello World!":

PyObject* hello_world(void)
{
    PyBytesWriter *writer = PyBytesWriter_Create(0);
    if (writer == NULL) {
        goto error;
    }
    if (PyBytesWriter_WriteBytes(writer, "Hello", -1) < 0) {
        goto error;
    }
    if (PyBytesWriter_Format(writer, " %s!", "World") < 0) {
        goto error;
    }
    return PyBytesWriter_Finish(writer);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

Create the bytes string “abc”

Example creating the bytes string b"abc", with a fixed size of 3 bytes:

PyObject* create_abc(void)
{
    PyBytesWriter *writer = PyBytesWriter_Create(3);
    if (writer == NULL) {
        return NULL;
    }

    char *str = PyBytesWriter_GetData(writer);
    memcpy(str, "abc", 3);
    return PyBytesWriter_Finish(writer);
}

GrowAndUpdatePointer() example

Example using a pointer to write bytes and to track the written size.

Create the bytes string b"Hello World":

PyObject* grow_example(void)
{
    // Allocate 10 bytes
    PyBytesWriter *writer = PyBytesWriter_Create(10);
    if (writer == NULL) {
        return NULL;
    }

    // Write some bytes
    char *buf = PyBytesWriter_GetData(writer);
    memcpy(buf, "Hello ", strlen("Hello "));
    buf += strlen("Hello ");

    // Allocate 10 more bytes
    buf = PyBytesWriter_GrowAndUpdatePointer(writer, 10, buf);
    if (buf == NULL) {
        PyBytesWriter_Discard(writer);
        return NULL;
    }

    // Write more bytes
    memcpy(buf, "World", strlen("World"));
    buf += strlen("World");

    // Truncate the string at 'buf' position
    // and create a bytes object
    return PyBytesWriter_FinishWithPointer(writer, buf);
}

Update PyBytes_FromStringAndSize() code

Example of code using the soft deprecated PyBytes_FromStringAndSize(NULL, size) API:

PyObject *result = PyBytes_FromStringAndSize(NULL, num_bytes);
if (result == NULL) {
    return NULL;
}
if (copy_bytes(PyBytes_AS_STRING(result), start, num_bytes) < 0) {
    Py_CLEAR(result);
}
return result;

It can now be updated to:

PyBytesWriter *writer = PyBytesWriter_Create(num_bytes);
if (writer == NULL) {
    return NULL;
}
if (copy_bytes(PyBytesWriter_GetData(writer), start, num_bytes) < 0) {
    PyBytesWriter_Discard(writer);
    return NULL;
}
return PyBytesWriter_Finish(writer);

Update _PyBytes_Resize() code

Example of code using the soft deprecated _PyBytes_Resize() API:

PyObject *v = PyBytes_FromStringAndSize(NULL, size);
if (v == NULL) {
    return NULL;
}
char *p = PyBytes_AS_STRING(v);

// ... fill bytes into 'p' ...

if (_PyBytes_Resize(&v, (p - PyBytes_AS_STRING(v)))) {
    return NULL;
}
return v;

It can now be updated to:

PyBytesWriter *writer = PyBytesWriter_Create(size);
if (writer == NULL) {
    return NULL;
}
char *p = PyBytesWriter_GetData(writer);

// ... fill bytes into 'p' ...

return PyBytesWriter_FinishWithPointer(writer, p);

Reference Implementation

Pull request gh-131681.

Notes on the CPython reference implementation which are not part of the Specification:

  • The implementation allocates internally a bytes object, so PyBytesWriter_Finish() just returns the object without having to copy memory.
  • For strings up to 256 bytes, a small internal raw buffer of bytes is used. It avoids having to resize a bytes object which is inefficient. At the end, PyBytesWriter_Finish() creates the bytes object from this small buffer.
  • A free list is used to reduce the cost of allocating a PyBytesWriter on the heap memory.

Backwards Compatibility

There is no impact on the backward compatibility, only new APIs are added.

PyBytes_FromStringAndSize(NULL, size) and _PyBytes_Resize() APIs are soft deprecated. No new warnings is emitted when these functions are used and they are not planned for removal.

Prior Discussions


Source: https://github.com/python/peps/blob/main/peps/pep-0782.rst

Last modified: 2025-03-31 18:34:24 GMT