PEP: 3118
Title: Revising the buffer protocol
Version: $Revision$
Last-Modified: $Date$
Author: Travis Oliphant <oliphant@ee.byu.edu>,
        Carl Banks <pythondev@aerojockey.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Aug-2006
Python-Version: 3.0
Post-History:

Abstract

This PEP proposes re-designing the buffer interface (PyBufferProcs
function pointers) to improve the way Python allows memory sharing in
Python 3.0.

In particular, it is proposed that the character buffer portion of the
API be eliminated and the multiple-segment portion be re-designed in
conjunction with allowing for strided memory to be shared. In addition,
the new buffer interface will allow the sharing of any multi-dimensional
nature of the memory and what data-format the memory contains.

This interface will allow any extension module to either create objects
that share memory or create algorithms that use and manipulate raw
memory from arbitrary objects that export the interface.

Rationale

The Python 2.X buffer protocol allows different Python types to exchange
a pointer to a sequence of internal buffers. This functionality is
extremely useful for sharing large segments of memory between different
high-level objects, but it is too limited and has issues:

1.  There is the little-used "sequence-of-segments" option
    (bf_getsegcount) that is not well motivated.

2.  There is the apparently redundant character-buffer option
    (bf_getcharbuffer).

3.  There is no way for a consumer to tell the buffer-API-exporting
    object it is "finished" with its view of the memory and therefore no
    way for the exporting object to be sure that it is safe to
    reallocate the pointer to the memory that it owns (for example, the
    array object reallocating its memory after sharing it with the
    buffer object which held the original pointer led to the infamous
    buffer-object problem).

4.  Memory is just a pointer with a length. There is no way to describe
    what is "in" the memory (float, int, C-structure, etc.).

5.  There is no shape information provided for the memory. But, several
    array-like Python types could make use of a standard way to describe
    the shape-interpretation of the memory (wxPython, GTK, pyQT, CVXOPT,
    PyVox, Audio and Video Libraries, ctypes, NumPy, data-base
    interfaces, etc.)

6.  There is no way to share discontiguous memory (except through the
    sequence of segments notion).

    There are two widely used libraries that use the concept of
    discontiguous memory: PIL and NumPy. Their view of discontiguous
    arrays is different, though. The proposed buffer interface allows
    sharing of either memory model. Exporters will typically use only
    one approach and consumers may choose to support discontiguous
    arrays of each type however they choose.

    NumPy uses the notion of constant striding in each dimension as its
    basic concept of an array. With this concept, a simple sub-region of
    a larger array can be described without copying the data. Thus,
    stride information is the additional information that must be
    shared.

    The PIL uses a more opaque memory representation. Sometimes an image
    is contained in a contiguous segment of memory, but sometimes it is
    contained in an array of pointers to the contiguous segments
    (usually lines) of the image. The PIL is where the idea of multiple
    buffer segments in the original buffer interface came from.

    NumPy's strided memory model is used more often in computational
    libraries and because it is so simple it makes sense to support
    memory sharing using this model. The PIL memory model is sometimes
    used in C-code where a 2-d array can then be accessed using double
    pointer indirection: e.g. image[i][j].

    The buffer interface should allow the object to export either of
    these memory models. Consumers are free to either require contiguous
    memory or write code to handle one or both of these memory models.
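
As a rough illustration of the two memory models, the sketch below uses
present-day Python objects rather than anything defined in this PEP. A sliced
memoryview stands in for a strided view of one contiguous block, and a list of
separately allocated bytearray rows stands in for PIL-style pointer
indirection. All variable names are illustrative assumptions.

```python
# Strided model: one contiguous block, described by an offset and a stride.
block = bytearray(range(12))
every_third = memoryview(block)[1::3]     # elements 1, 4, 7, 10; no copy is made

strided_stride = every_third.strides      # bytes to skip between elements
every_third[0] = 99                       # writes through to block[1]

# Indirect model: a table of references to separately allocated lines.
lines = [bytearray(4) for _ in range(3)]  # three independent allocations
lines[2][1] = 7                           # "double indirection": image[i][j]
```

In the strided case the consumer needs only the start pointer and the stride;
in the indirect case it must follow one reference per line before indexing.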

Proposal Overview

-   Eliminate the char-buffer and multiple-segment sections of the
    buffer-protocol.
-   Unify the read/write versions of getting the buffer.
-   Add a new function to the interface that should be called when the
    consumer object is "done" with the memory area.
-   Add a new variable to allow the interface to describe what is in
    memory (unifying what is currently done now in struct and array)
-   Add a new variable to allow the protocol to share shape information
-   Add a new variable for sharing stride information
-   Add a new mechanism for sharing arrays that must be accessed using
    pointer indirection.
-   Fix all objects in the core and the standard library to conform to
    the new interface
-   Extend the struct module to handle more format specifiers
-   Extend the buffer object into a new memory object which places a
    Python veneer around the buffer interface.
-   Add a few functions to make it easy to copy contiguous data in and
    out of objects supporting the buffer interface.

Specification

While the new specification allows for complicated memory sharing,
simple contiguous buffers of bytes can still be obtained from an object.
In fact, the new protocol allows a standard mechanism for doing this
even if the original object is not represented as a contiguous chunk of
memory.

The easiest way to obtain a simple contiguous chunk of memory is to use
the provided C-API.

Change the PyBufferProcs structure to:

    typedef struct {
         getbufferproc bf_getbuffer;
         releasebufferproc bf_releasebuffer;
    } PyBufferProcs;

Both of these routines are optional for a type object.

    typedef int (*getbufferproc)(PyObject *obj, Py_buffer *view, int flags)

This function returns 0 on success and -1 on failure (and raises an
error). The first argument is the "exporting" object. The second
argument is the address of a bufferinfo structure. Both arguments must
never be NULL.

The third argument indicates what kind of buffer the consumer is
prepared to deal with and therefore what kind of buffer the exporter is
allowed to return. The new buffer interface allows for much more
complicated memory sharing possibilities. Some consumers may not be able
to handle all the complexity but may want to see if the exporter will
let them take a simpler view to its memory.

In addition, some exporters may not be able to share memory in every
possible way and may need to raise errors to signal to some consumers
that something is just not possible. These errors should be
PyExc_BufferError unless there is another error that is actually causing
the problem. The exporter can use flags information to simplify how much
of the Py_buffer structure is filled in with non-default values and/or
raise an error if the object can't support a simpler view of its memory.

The exporter should always fill in all elements of the buffer structure
(with defaults or NULLs if nothing else is requested). The
PyBuffer_FillInfo function can be used for simple cases.

Access flags

Some flags are useful for requesting a specific kind of memory segment,
while others indicate to the exporter what kind of information the
consumer can deal with. If certain information is not asked for by the
consumer, but the exporter cannot share its memory without that
information, then a PyExc_BufferError should be raised.

PyBUF_SIMPLE

  This is the default flag state (0). The returned buffer may or may not
  have writable memory. The format will be assumed to be unsigned bytes.
  This is a "stand-alone" flag constant. It never needs to be |'d to the
  others. The exporter will raise an error if it cannot provide such a
  contiguous buffer of bytes.

PyBUF_WRITABLE

  The returned buffer must be writable. If it is not writable, then
  raise an error.

PyBUF_FORMAT

  The returned buffer must have true format information if this flag is
  provided. This would be used when the consumer is going to be checking
  for what 'kind' of data is actually stored. An exporter should always
  be able to provide this information if requested. If format is not
  explicitly requested then the format must be returned as NULL (which
  means "B", or unsigned bytes).

PyBUF_ND

  The returned buffer must provide shape information. The memory will be
  assumed C-style contiguous (last dimension varies the fastest). The
  exporter may raise an error if it cannot provide this kind of
  contiguous buffer. If this is not given then shape will be NULL.

PyBUF_STRIDES (implies PyBUF_ND)

  The returned buffer must provide strides information (i.e. the strides
  cannot be NULL). This would be used when the consumer can handle
  strided, discontiguous arrays. Handling strides automatically assumes
  you can handle shape. The exporter may raise an error if it cannot
  provide a strided-only representation of the data (i.e. without the
  suboffsets).

PyBUF_C_CONTIGUOUS
PyBUF_F_CONTIGUOUS
PyBUF_ANY_CONTIGUOUS

  These flags indicate that the returned buffer must be respectively,
  C-contiguous (last dimension varies the fastest), Fortran contiguous
  (first dimension varies the fastest) or either one. All of these flags
  imply PyBUF_STRIDES and guarantee that the strides buffer info
  structure will be filled in correctly.

PyBUF_INDIRECT (implies PyBUF_STRIDES)

  The returned buffer must have suboffsets information (which can be
  NULL if no suboffsets are needed). This would be used when the
  consumer can handle indirect array referencing implied by these
  suboffsets.

Specialized combinations of flags for specific kinds of memory sharing:

  Multi-dimensional (but contiguous)

    PyBUF_CONTIG (PyBUF_ND | PyBUF_WRITABLE)
    PyBUF_CONTIG_RO (PyBUF_ND)

  Multi-dimensional using strides but aligned

    PyBUF_STRIDED (PyBUF_STRIDES | PyBUF_WRITABLE)
    PyBUF_STRIDED_RO (PyBUF_STRIDES)

  Multi-dimensional using strides and not necessarily aligned

    PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT)
    PyBUF_RECORDS_RO (PyBUF_STRIDES | PyBUF_FORMAT)

  Multi-dimensional using sub-offsets

    PyBUF_FULL (PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT)
    PyBUF_FULL_RO (PyBUF_INDIRECT | PyBUF_FORMAT)

Thus, the consumer simply wanting a contiguous chunk of bytes from the
object would use PyBUF_SIMPLE, while a consumer that understands how to
make use of the most complicated cases could use PyBUF_FULL.
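
The way the flags compose can be sketched with plain integers. The numeric
values below are an assumption (they follow the values used in CPython's
headers); this text fixes only the relationships between the flags, such as
PyBUF_STRIDES implying PyBUF_ND. The helper name `requests` is hypothetical.

```python
# Numeric values are an assumption (they follow CPython's headers);
# only the relationships between the flags come from the text above.
PyBUF_SIMPLE = 0
PyBUF_WRITABLE = 0x0001
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND          # implies PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES    # implies PyBUF_STRIDES

PyBUF_CONTIG = PyBUF_ND | PyBUF_WRITABLE
PyBUF_STRIDED_RO = PyBUF_STRIDES
PyBUF_RECORDS = PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT
PyBUF_FULL = PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT


def requests(flags, wanted):
    """Return True if every bit of `wanted` is set in `flags`."""
    return (flags & wanted) == wanted
```

Because the "implies" relationships are encoded in the bit patterns, an
exporter can test for PyBUF_ND alone and still recognise a PyBUF_FULL request.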

The format information is only guaranteed to be non-NULL if PyBUF_FORMAT
is in the flag argument, otherwise it is expected the consumer will
assume unsigned bytes.

There is a C-API that simple exporting objects can use to fill-in the
buffer info structure correctly according to the provided flags if a
contiguous chunk of "unsigned bytes" is all that can be exported.

The Py_buffer struct

The bufferinfo structure is:

    typedef struct bufferinfo {
         void *buf;
         Py_ssize_t len;
         int readonly;
         const char *format;
         int ndim;
         Py_ssize_t *shape;
         Py_ssize_t *strides;
         Py_ssize_t *suboffsets;
         Py_ssize_t itemsize;
         void *internal;
    } Py_buffer;

Before calling the bf_getbuffer function, the bufferinfo structure may
contain arbitrary values, but the buf field must be NULL when requesting
a new buffer. Upon return from bf_getbuffer, the bufferinfo structure is
filled in with relevant information about the buffer. This same
bufferinfo structure must be passed to bf_releasebuffer (if available)
when the consumer is done with the memory. The caller is responsible for
keeping a reference to obj until releasebuffer is called (i.e. the call
to bf_getbuffer does not alter the reference count of obj).

The members of the bufferinfo structure are:

buf

    a pointer to the start of the memory for the object

len

    the total bytes of memory the object uses. This should be the same
    as the product of the entries of the shape array multiplied by the
    number of bytes per item of memory.

readonly

    an integer variable to hold whether or not the memory is readonly. 1
    means the memory is readonly, zero means the memory is writable.

format

    a NULL-terminated format-string (following the struct-style syntax
    including extensions) indicating what is in each element of memory.
    The number of elements is len / itemsize, where itemsize is the
    number of bytes implied by the format. This can be NULL which
    implies standard unsigned bytes ("B").

ndim

    a variable storing the number of dimensions the memory represents.
    Must be >=0. A value of 0 means that shape and strides and
    suboffsets must be NULL (i.e. the memory represents a scalar).

shape

    an array of Py_ssize_t of length ndim indicating the shape of the
    memory as an N-D array. Note that
    (shape[0] * ... * shape[ndim-1]) * itemsize = len. If ndim is
    0 (indicating a scalar), then this must be NULL.

strides

    a pointer to an array of Py_ssize_t of length ndim (or NULL if ndim
    is 0) indicating the number of bytes to skip to get to the next
    element in each dimension. If this is not requested by the caller
    (PyBUF_STRIDES is not set), then this should be set to NULL, which
    indicates a C-style contiguous array, or a PyExc_BufferError should
    be raised if this is not possible.

suboffsets

    a pointer to an array of Py_ssize_t of length ndim. If these
    suboffset numbers are >=0, then the value stored along the indicated
    dimension is a pointer and the suboffset value dictates how many
    bytes to add to the pointer after de-referencing. A suboffset value
    that is negative indicates that no de-referencing should occur
    (striding in a contiguous memory block). If all suboffsets are
    negative (i.e. no de-referencing is needed), then this must be NULL
    (the default value). If this is not requested by the caller
    (PyBUF_INDIRECT is not set), then this should be set to NULL or a
    PyExc_BufferError raised if this is not possible.

    For clarity, here is a function that returns a pointer to the
    element in an N-D array pointed to by an N-dimensional index when
    there are both non-NULL strides and suboffsets:

        void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                               Py_ssize_t *suboffsets, Py_ssize_t *indices) {
            char *pointer = (char*)buf;
            int i;
            for (i = 0; i < ndim; i++) {
                pointer += strides[i] * indices[i];
                if (suboffsets[i] >=0 ) {
                    pointer = *((char**)pointer) + suboffsets[i];
                }
            }
            return (void*)pointer;
        }

    Notice the suboffset is added "after" the dereferencing occurs. Thus
    slicing in the ith dimension would add to the suboffsets in the
    (i-1)st dimension. Slicing in the first dimension would change the
    location of the starting pointer directly (i.e. buf would be
    modified).

itemsize

    This is storage for the itemsize (in bytes) of each element of the
    shared memory. It is technically unnecessary, as it can be obtained
    using PyBuffer_SizeFromFormat. However, an exporter may know this
    information without parsing the format string, and it is necessary
    to know the itemsize for proper interpretation of striding.
    Therefore, storing it is more convenient and faster.

internal

    This is for use internally by the exporting object. For example,
    this might be re-cast as an integer by the exporter and used to
    store flags about whether or not the shape, strides, and suboffsets
    arrays must be freed when the buffer is released. The consumer
    should never alter this value.
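
The get_item_pointer logic given in the suboffsets entry can be tried out from
Python with ctypes. This is only a sketch under stated assumptions: the rows,
the pointer table, and the helper name `get_item_address` are all made up for
illustration, and addresses are handled as plain integers.

```python
import ctypes

# Three separately allocated rows of four bytes, as in a PIL-style image.
rows = [(ctypes.c_ubyte * 4)(*range(r * 10, r * 10 + 4)) for r in range(3)]

# `table` plays the role of buf: a contiguous array of pointers to the rows.
table = (ctypes.c_void_p * 3)(*[ctypes.addressof(r) for r in rows])

strides = (ctypes.sizeof(ctypes.c_void_p), 1)
suboffsets = (0, -1)   # dereference after dimension 0, never after dimension 1


def get_item_address(buf, strides, suboffsets, indices):
    """Python transcription of get_item_pointer; returns an integer address."""
    pointer = buf
    for stride, suboffset, index in zip(strides, suboffsets, indices):
        pointer += stride * index
        if suboffset >= 0:
            # Follow the stored pointer, then add the suboffset.
            pointer = ctypes.c_void_p.from_address(pointer).value + suboffset
    return pointer


address = get_item_address(ctypes.addressof(table), strides, suboffsets, (2, 1))
value = ctypes.c_ubyte.from_address(address).value
```

Here the stride in the first dimension steps through the pointer table, and
the non-negative suboffset tells the consumer to dereference before applying
the stride of the second dimension.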

The exporter is responsible for making sure that any memory pointed to
by buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called. If the exporter wants to be able to change an
object's shape, strides, and/or suboffsets before releasebuffer is
called then it should allocate those arrays when getbuffer is called
(pointing to them in the buffer-info structure provided) and free them
when releasebuffer is called.

Releasing the buffer

The same bufferinfo struct should be used in the release-buffer
interface call. The caller is responsible for the memory of the
Py_buffer structure itself.

    typedef void (*releasebufferproc)(PyObject *obj, Py_buffer *view)

Callers of getbufferproc must make sure that this function is called
when memory previously acquired from the object is no longer needed. The
exporter of the interface must make sure that any memory pointed to in
the bufferinfo structure remains valid until releasebuffer is called.

If the bf_releasebuffer function is not provided (i.e. it is NULL), then
it does not ever need to be called.

Exporters will need to define a bf_releasebuffer function if they can
re-allocate their memory, strides, shape, suboffsets, or format
variables which they might share through the struct bufferinfo. Several
mechanisms could be used to keep track of how many getbuffer calls have
been made and shared. Either a single variable could be used to keep
track of how many "views" have been exported, or a linked-list of
bufferinfo structures filled in could be maintained in each object.

All that is specifically required by the exporter, however, is to ensure
that any memory shared through the bufferinfo structure remains valid
until releasebuffer is called on the bufferinfo structure exporting that
memory.
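
The effect of this contract can be observed from Python in current CPython
(an observation about that implementation, not a requirement stated here): a
bytearray refuses to resize while a memoryview of it is outstanding, and
allows it again once the view has been released.

```python
data = bytearray(b"abc")
view = memoryview(data)          # acquires a buffer from the exporter

try:
    data.extend(b"def")          # would reallocate memory that is still shared
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()                   # the consumer is "done" with the memory
data.extend(b"def")              # now the exporter may reallocate safely
```

This is the single-counter strategy mentioned above: the exporter only needs
to know whether any views are still outstanding.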

New C-API calls are proposed

    int PyObject_CheckBuffer(PyObject *obj)

Return 1 if the getbuffer function is available otherwise 0.
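
A Python-level analogue of this check, sketched under the assumption that
probing with memoryview is acceptable, is shown below. The helper name
`supports_buffer` is hypothetical. Note that current CPython reports a missing
buffer interface with TypeError when constructing a memoryview.

```python
def supports_buffer(obj):
    """Return True if `obj` can export a buffer (probed via memoryview)."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False
```

The `with` block releases the probe view immediately, so the exporter is not
left with an outstanding export.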

    int PyObject_GetBuffer(PyObject *obj, Py_buffer *view,
                           int flags)

This is a C-API version of the getbuffer function call. It checks to
make sure object has the required function pointer and issues the call.
Returns -1 and raises an error on failure and returns 0 on success.

    void PyBuffer_Release(PyObject *obj, Py_buffer *view)

This is a C-API version of the releasebuffer function call. It checks to
make sure the object has the required function pointer and issues the
call. This function always succeeds even if there is no releasebuffer
function for the object.

    PyObject *PyObject_GetMemoryView(PyObject *obj)

Return a memory-view object from an object that defines the buffer
interface.

A memory-view object is an extended buffer object that could replace the
buffer object (but doesn't have to as that could be kept as a simple 1-d
memory-view object). Its C-structure is :

    typedef struct {
        PyObject_HEAD
        PyObject *base;
        Py_buffer view;
    } PyMemoryViewObject;

This is functionally similar to the current buffer object except a
reference to base is kept and the memory view is not re-grabbed. Thus,
this memory view object holds on to the memory of base until it is
deleted.

This memory-view object will support multi-dimensional slicing and be
the first object provided with Python to do so. Slices of the
memory-view object are other memory-view objects with the same base but
with a different view of the base object.

When an "element" from the memory-view is returned it is always a bytes
object whose format should be interpreted by the format attribute of the
memoryview object. The struct module can be used to "decode" the bytes
in Python if desired. Or the contents can be passed to a NumPy array or
other object consuming the buffer protocol.

The Python name will be

__builtin__.memoryview

Methods:

-   __getitem__ (will support multi-dimensional slicing)
-   __setitem__ (will support multi-dimensional slicing)
-   tobytes (obtain a new bytes object holding a copy of the memory)
-   tolist (obtain a "nested" list of the memory. Everything is
    interpreted into standard Python objects as the struct module unpack
    would do -- in fact it uses struct.unpack to accomplish it)

Attributes (taken from the memory of the base object):

-   format
-   itemsize
-   shape
-   strides
-   suboffsets
-   readonly
-   ndim
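
The attributes can be inspected on the memoryview that ships with current
Python 3. The sketch below assumes `memoryview.cast` (a later addition, not
described in this text) to obtain a 2-D view, and checks the relationships
between len, shape, strides and itemsize described for the buffer structure.

```python
import struct

raw = bytearray(struct.pack("6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0))
view = memoryview(raw).cast("d", shape=(2, 3))   # 2 x 3 array of C doubles

# len (nbytes in Python) is the product of the shape times the itemsize.
expected_nbytes = view.shape[0] * view.shape[1] * view.itemsize

# Byte offset of element [1][2] computed from the strides.
offset = 1 * view.strides[0] + 2 * view.strides[1]
(element,) = struct.unpack_from("d", raw, offset)
```

The strides are expressed in bytes, which is why the offset can be passed
directly to struct.unpack_from on the underlying bytearray.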

    Py_ssize_t PyBuffer_SizeFromFormat(const char *)

Return the implied itemsize of the data-format area from a struct-style
description.
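
For format strings the existing struct module already understands, the
closest Python-level analogue is struct.calcsize. The sketch below uses
standard sizes so the results do not depend on the platform; the extended
syntax proposed in this PEP is not accepted by struct.calcsize.

```python
import struct

# Standard sizes ("<" prefix) so the results do not depend on the platform.
size_double = struct.calcsize("<d")
size_rgb = struct.calcsize("<BBB")
size_record = struct.calcsize("<iHBB")   # int, unsigned short, two bytes
```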

    PyObject * PyMemoryView_GetContiguous(PyObject *obj,  int buffertype,
                                          char fortran)

Return a memoryview object to a contiguous chunk of memory represented
by obj. If a copy must be made (because the memory pointed to by obj is
not contiguous), then a new bytes object will be created and become the
base object for the returned memory view object.

The buffertype argument can be PyBUF_READ, PyBUF_WRITE, or
PyBUF_UPDATEIFCOPY to determine whether the returned buffer should be
readable, writable, or set to update the original buffer if a copy must
be made. If buffertype is PyBUF_WRITE and the buffer is not contiguous
an error will be raised. In this circumstance, the user can use
PyBUF_UPDATEIFCOPY to ensure that a writable temporary contiguous buffer
is returned. The contents of this contiguous buffer will be copied back
into the original object after the memoryview object is deleted as long
as the original object is writable. If this is not allowed by the
original object, then a BufferError is raised.

If the object is multi-dimensional, then if fortran is 'F', the first
dimension of the underlying array will vary the fastest in the buffer.
If fortran is 'C', then the last dimension will vary the fastest
(C-style contiguous). If fortran is 'A', then it does not matter and you
will get whatever the object decides is more efficient. If a copy is
made, then the memory must be freed by calling PyMem_Free.

You receive a new reference to the memoryview object.
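
A similar idea is available from Python through memoryview.tobytes, which
returns a contiguous copy. The sketch below assumes Python 3.8 or later (where
tobytes accepts an `order` argument) and is an analogue of the behaviour
described here, not the C function itself.

```python
data = bytes(range(6))
grid = memoryview(data).cast("B", shape=(2, 3))   # rows [0, 1, 2] and [3, 4, 5]

c_copy = grid.tobytes(order="C")   # last dimension varies fastest
f_copy = grid.tobytes(order="F")   # first dimension varies fastest

# A strided 1-D view is not contiguous, so tobytes() has to copy.
sparse = memoryview(data)[::2]
packed = sparse.tobytes()
```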

    int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
                              char fortran)

Copy len bytes of data pointed to by the contiguous chunk of memory
pointed to by buf into the buffer exported by obj. Return 0 on success
and return -1 and raise an error on failure. If the object does not have
a writable buffer, then an error is raised. If fortran is 'F', then if
the object is multi-dimensional, then the data will be copied into the
array in Fortran-style (first dimension varies the fastest). If fortran
is 'C', then the data will be copied into the array in C-style (last
dimension varies the fastest). If fortran is 'A', then it does not
matter and the copy will be made in whatever way is more efficient.

    int PyObject_CopyData(PyObject *dest, PyObject *src)

Copy the data from the src buffer into the buffer of dest.

These last three C-API calls allow a standard way of getting data in and
out of Python objects into contiguous memory areas no matter how it is
actually stored. These calls use the extended buffer interface to
perform their work.
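
From Python, copying into an exporter's buffer can be approximated with slice
assignment on a memoryview. This is a sketch of the behaviour, assuming
current CPython, where writing to a read-only buffer raises TypeError.

```python
dest = bytearray(4)
dest_view = memoryview(dest)
dest_view[:] = b"\x01\x02\x03\x04"    # copy into a writable exporter

readonly_view = memoryview(b"abcd")   # bytes exports a read-only buffer
try:
    readonly_view[:] = b"wxyz"
    wrote_readonly = True
except TypeError:
    wrote_readonly = False
```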

    int PyBuffer_IsContiguous(Py_buffer *view, char fortran)

Return 1 if the memory defined by the view object is C-style (fortran =
'C') or Fortran-style (fortran = 'F') contiguous or either one (fortran
= 'A'). Return 0 otherwise.
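
The memoryview in current Python 3 exposes the same three questions as
attributes (c_contiguous, f_contiguous and contiguous). The sketch below
records them for a flat view, a 2-D C-ordered view, and a strided view; the
dictionary name `checks` is illustrative only.

```python
flat = memoryview(bytearray(6))
grid = flat.cast("B", shape=(2, 3))   # C-contiguous 2-D view
sparse = flat[::2]                    # every other byte: not contiguous

checks = {
    "flat": (flat.c_contiguous, flat.f_contiguous, flat.contiguous),
    "grid": (grid.c_contiguous, grid.f_contiguous, grid.contiguous),
    "sparse": (sparse.c_contiguous, sparse.f_contiguous, sparse.contiguous),
}
```

A 1-D contiguous buffer counts as both C-style and Fortran-style, since the
two orderings only differ when there is more than one dimension.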

    void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape,
                                        Py_ssize_t *strides, Py_ssize_t itemsize,
                                        char fortran)

Fill the strides array with byte-strides of a contiguous (C-style if
fortran is 'C' or Fortran-style if fortran is 'F') array of the given
shape with the given number of bytes per element.
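
The stride computation itself is short. The following is a minimal Python
sketch of the idea, not the C implementation; the function name
`contiguous_strides` and its `order` parameter are hypothetical stand-ins for
the C arguments.

```python
def contiguous_strides(shape, itemsize, order="C"):
    """Return byte strides for a contiguous array of the given shape."""
    strides = [0] * len(shape)
    step = itemsize
    # C order: the last dimension is innermost; Fortran order: the first.
    dims = reversed(range(len(shape))) if order == "C" else range(len(shape))
    for dim in dims:
        strides[dim] = step
        step *= shape[dim]
    return tuple(strides)
```

For a shape of (2, 3, 4) with 8-byte items, C order yields strides that shrink
from left to right, while Fortran order yields strides that grow.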

    int PyBuffer_FillInfo(Py_buffer *view, void *buf,
                          Py_ssize_t len, int readonly, int infoflags)

Fills in a buffer-info structure correctly for an exporter that can only
share a contiguous chunk of memory of "unsigned bytes" of the given
length. Returns 0 on success and -1 (raising an error) on failure.
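
How such a helper might honour the flags can be sketched in Python. This is an
illustration under assumptions, not the C implementation: the function name
`fill_info`, the dictionary layout, and the numeric flag values (taken from
CPython's headers) are not specified by this text.

```python
# Flag values are an assumption (they follow CPython's headers).
PyBUF_WRITABLE = 0x0001
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND


def fill_info(length, readonly, flags):
    """Describe `length` unsigned bytes, filling only what `flags` asks for."""
    if (flags & PyBUF_WRITABLE) and readonly:
        raise BufferError("object is not writable")
    wants_nd = (flags & PyBUF_ND) == PyBUF_ND
    wants_strides = (flags & PyBUF_STRIDES) == PyBUF_STRIDES
    return {
        "len": length,
        "readonly": bool(readonly),
        "itemsize": 1,
        "format": "B" if flags & PyBUF_FORMAT else None,
        "ndim": 1,
        "shape": (length,) if wants_nd else None,
        "strides": (1,) if wants_strides else None,
        "suboffsets": None,
    }


try:
    fill_info(10, readonly=True, flags=PyBUF_WRITABLE)
    refused = False
except BufferError:
    refused = True
```

Fields the consumer did not request are left as None, mirroring the rule that
unrequested members are set to NULL.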

    PyExc_BufferError

A new error object for returning buffer errors which arise because an
exporter cannot provide the kind of buffer that a consumer expects. This
will also be raised when a consumer requests a buffer from an object
that does not provide the protocol.

Additions to the struct string-syntax

The struct string-syntax is missing some characters to fully implement
data-format descriptions already available elsewhere (in ctypes and
NumPy for example). The Python 2.5 specification is at
http://docs.python.org/library/struct.html.

Here are the proposed additions:

+------------------+--------------------------------------------------+
| Character        | Description                                      |
+==================+==================================================+
| 't'              | bit (number before states how many bits)         |
+------------------+--------------------------------------------------+
| '?'              | platform _Bool type                              |
+------------------+--------------------------------------------------+
| 'g'              | long double                                      |
+------------------+--------------------------------------------------+
| 'c'              | ucs-1 (latin-1) encoding                         |
+------------------+--------------------------------------------------+
| 'u'              | ucs-2                                            |
+------------------+--------------------------------------------------+
| 'w'              | ucs-4                                            |
+------------------+--------------------------------------------------+
| 'O'              | pointer to Python Object                         |
+------------------+--------------------------------------------------+
| 'Z'              | complex (whatever the next specifier is)         |
+------------------+--------------------------------------------------+
| '&'              | specific pointer (prefix before another          |
|                  | character)                                       |
+------------------+--------------------------------------------------+
| 'T{}'            | structure (detailed layout inside {})            |
+------------------+--------------------------------------------------+
| '(k1,k2,...,kn)' | multi-dimensional array of whatever follows      |
+------------------+--------------------------------------------------+
| ':name:'         | optional name of the preceding element           |
+------------------+--------------------------------------------------+
| 'X{}'            | pointer to a function (optional function         |
|                  | signature inside {} with any return value        |
|                  | preceded by -> and placed at the end)            |
+------------------+--------------------------------------------------+

The struct module will be changed to understand these as well and return
appropriate Python objects on unpacking. Unpacking a long-double will
return a decimal object or a ctypes long-double. Unpacking 'u' or 'w'
will return Python unicode. Unpacking a multi-dimensional array will
return a list (of lists if >1d). Unpacking a pointer will return a
ctypes pointer object. Unpacking a function pointer will return a ctypes
call-object (perhaps). Unpacking a bit will return a Python Bool.
White-space in the struct-string syntax will be ignored if it isn't
already. Unpacking a named-object will return some kind of
named-tuple-like object that acts like a tuple but whose entries can
also be accessed by name. Unpacking a nested structure will return a
nested tuple.
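
The named and nested results described above can be approximated today with
the existing struct module plus collections.namedtuple. This is only a sketch:
the struct module does not parse the ':name:' or 'T{}' syntax, so the names
and the nesting are applied by hand, and the type names `Pixel`, `Sub` and
`Record` are illustrative.

```python
import struct
from collections import namedtuple

# 'B:r: B:g: B:b:' in the proposed syntax; plain struct only knows 'BBB'.
Pixel = namedtuple("Pixel", ["r", "g", "b"])
pixel = Pixel._make(struct.unpack("BBB", b"\x10\x20\x30"))

# 'i:ival: T{H:sval: B:bval: B:cval:}:sub:' flattened to '<iHBB', then
# regrouped by hand into the nested form the text describes.
Sub = namedtuple("Sub", ["sval", "bval", "cval"])
Record = namedtuple("Record", ["ival", "sub"])
packed = struct.pack("<iHBB", -5, 513, 7, 9)
ival, sval, bval, cval = struct.unpack("<iHBB", packed)
record = Record(ival, Sub(sval, bval, cval))
```

The result behaves like a tuple but also allows access by name, for example
`record.sub.bval`.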

Endian-specification ('!', '@','=','>','<', '^') is also allowed inside
the string so that it can change if needed. The previously-specified
endian string is in force until changed. The default endian is '@' which
means native data-types and alignment. If un-aligned, native data-types
are requested, then the endian specification is '^'.
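
The byte-order prefixes that the struct module already supports can be
compared directly. The sketch below covers only '>', '<', '=' and '@' at the
start of a format string; the '^' code and mid-string changes proposed here
are not part of the existing struct module.

```python
import struct

big = struct.pack(">i", 1)       # most significant byte first
little = struct.pack("<i", 1)    # least significant byte first

# '=' uses standard sizes with no padding; '@' adds native alignment padding.
packed_size = struct.calcsize("=bi")
native_size = struct.calcsize("@bi")
```

The native size depends on the platform's alignment rules, so only the
unpadded size is fixed across machines.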

According to the struct-module, a number can precede a character code to
specify how many of that type there are. The (k1,k2,...,kn) extension
also allows specifying if the data is supposed to be viewed as a
(C-style contiguous, last-dimension varies the fastest)
multi-dimensional array of a particular format.

Functions should be added to ctypes to create a ctypes object from a
struct description, and add long-double, and ucs-2 to ctypes.

Examples of Data-Format Descriptions

Here are some examples of C-structures and how they would be represented
using the struct-style syntax.

<named> is the constructor for a named-tuple (not-specified yet).

float

    'd' <--> Python float

complex double

    'Zd' <--> Python complex

RGB Pixel data

    'BBB' <--> (int, int, int)
    'B:r: B:g: B:b:' <--> <named>((int, int, int), ('r','g','b'))

Mixed endian (weird but possible)

    '>i:big: <i:little:' <--> <named>((int, int), ('big', 'little'))

Nested structure

    struct {
        int ival;
        struct {
            unsigned short sval;
            unsigned char bval;
            unsigned char cval;
        } sub;
    }
    """i:ival:
       T{
          H:sval:
          B:bval:
          B:cval:
        }:sub:
    """

Nested array

    struct {
        int ival;
        double data[16*4];
    }
    """i:ival:
       (16,4)d:data:
    """

Note that in the last example, the C-structure compared against is
intentionally a 1-d array and not a 2-d array data[16][4]. The reason
for this is to avoid the confusions between static multi-dimensional
arrays in C (which are laid out contiguously) and dynamic
multi-dimensional arrays which use the same syntax to access elements,
data[0][1], but whose memory is not necessarily contiguous. The
struct-syntax always uses contiguous memory and the multi-dimensional
character is information about the memory to be communicated by the
exporter.

In other words, the struct-syntax description does not have to match the
C-syntax exactly as long as it describes the same memory layout. The
fact that a C-compiler would think of the memory as a 1-d array of
doubles is irrelevant to the fact that the exporter wanted to
communicate to the consumer that this field of the memory should be
thought of as a 2-d array where a new dimension is considered after
every 4 elements.
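
The same reinterpretation can be tried with the memoryview in current
Python 3, assuming its `cast` method (a later addition, not described in this
text): 64 contiguous doubles are viewed as a 16 x 4 array without moving any
data. The variable names are illustrative.

```python
import struct

# double data[16*4], viewed as (16, 4) as in the '(16,4)d:data:' example.
flat = [float(n) for n in range(16 * 4)]
raw = struct.pack("64d", *flat)

grid = memoryview(raw).cast("d", shape=(16, 4))
rows = grid.tolist()

# Element [i][j] of the 2-D view is element i*4 + j of the 1-D C array.
i, j = 5, 2
```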

Code to be affected

All objects and modules in Python that export or consume the old buffer
interface will be modified. Here is a partial list.

-   buffer object
-   bytes object
-   string object
-   unicode object
-   array module
-   struct module
-   mmap module
-   ctypes module

Anything else using the buffer API.

Issues and Details

It is intended that this PEP will be back-ported to Python 2.6 by adding
the C-API and the two functions to the existing buffer protocol.

Previous versions of this PEP proposed a read/write locking scheme, but
it was later perceived as a) too complicated for common simple use cases
that do not require any locking and b) too simple for use cases that
required concurrent read/write access to a buffer with changing,
short-living locks. It is therefore left to users to implement their own
specific locking scheme around buffer objects if they require consistent
views across concurrent read/write access. A future PEP may be proposed
which includes a separate locking API after some experience with these
user-schemes is obtained.

The sharing of strided memory and suboffsets is new and can be seen as a
modification of the multiple-segment interface. It is motivated by NumPy
and the PIL. NumPy objects should be able to share their strided memory
with code that understands how to manage strided memory because strided
memory is very common when interfacing with compute libraries.

Also, with this approach it should be possible to write generic code
that works with both kinds of memory without copying.

Memory management of the format string, the shape array, the strides
array, and the suboffsets array in the bufferinfo structure is always
the responsibility of the exporting object. The consumer should not set
these pointers to any other memory or try to free them.

Several ideas were discussed and rejected:

  Having a "releaser" object whose release-buffer was called. This was
  deemed unacceptable because it caused the protocol to be asymmetric
  (you called release on something different than you "got" the buffer
  from). It also complicated the protocol without providing a real
  benefit.

  Passing all the struct variables separately into the function. This
  had the advantage that it allowed one to pass NULL for variables that
  were not of interest, but it also made the function call more
  cumbersome. The flags variable gives consumers the same ability to be
  "simple" in how they call the protocol.

Code

The authors of the PEP promise to contribute and maintain the code for
this proposal but will welcome any help.

Examples

Ex. 1

This example shows how an image object that uses contiguous lines might
expose its buffer:

    struct rgba {
        unsigned char r, g, b, a;
    };

    struct ImageObject {
        PyObject_HEAD
        ...
        struct rgba** lines;
        Py_ssize_t height;
        Py_ssize_t width;
        Py_ssize_t shape_array[2];
        Py_ssize_t stride_array[2];
        Py_ssize_t view_count;
    };

"lines" points to a malloced 1-D array of (struct rgba*). Each pointer
in that block points to a separately malloced array of (struct rgba).

In order to access, say, the red value of the pixel at x=30, y=50, you'd
use "lines[50][30].r".

So what does ImageObject's getbuffer do? Leaving error checking out:

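    int Image_getbuffer(PyObject *self, Py_buffer *view, int flags) {

        /* Dimension 0 is an array of pointers that must be dereferenced
           (suboffset 0); dimension 1 is plain strided memory (-1). */
        static Py_ssize_t suboffsets[2] = { 0, -1};
        struct ImageObject *img = (struct ImageObject *)self;

        view->buf = img->lines;
        /* len is the total size in bytes, not the number of pixels */
        view->len = img->height * img->width * sizeof(struct rgba);
        view->readonly = 0;
        view->itemsize = sizeof(struct rgba);
        view->ndim = 2;
        img->shape_array[0] = img->height;
        img->shape_array[1] = img->width;
        view->shape = img->shape_array;
        img->stride_array[0] = sizeof(struct rgba*);
        img->stride_array[1] = sizeof(struct rgba);
        view->strides = img->stride_array;
        view->suboffsets = suboffsets;

        /* Track outstanding views so the lines are not reallocated */
        img->view_count++;

        return 0;
    }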


    void Image_releasebuffer(PyObject *self, Py_buffer *view) {
        struct ImageObject *img = (struct ImageObject *)self;

        img->view_count--;
    }
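The view count kept by the two functions above is what lets an exporter decide whether it is safe to reallocate its memory. The following is a minimal sketch of that check, in plain C without the Python headers so that it can be compiled on its own. `struct image`, `image_acquire`, `image_release`, `image_resize` and `demo` are hypothetical names and are not part of this PEP.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ImageObject: only the fields needed here. */
struct image {
    unsigned char *pixels;
    size_t size;
    long view_count;     /* number of outstanding buffer views */
};

static void image_acquire(struct image *img) { img->view_count++; }

static void image_release(struct image *img)
{
    assert(img->view_count > 0);
    img->view_count--;
}

/* Resize the pixel storage. Returns 0 on success, -1 if a consumer still
   holds a view (realloc could move the memory out from under it) or if
   the allocation fails. */
static int image_resize(struct image *img, size_t new_size)
{
    unsigned char *p;

    if (img->view_count > 0)
        return -1;
    p = realloc(img->pixels, new_size);
    if (p == NULL)
        return -1;
    img->pixels = p;
    img->size = new_size;
    return 0;
}

/* Walk through a resize, an acquire, a refused resize, a release and a
   second resize. Returns 1 when every step behaves as described. */
static int demo(void)
{
    struct image img = {NULL, 0, 0};
    int ok = image_resize(&img, 16) == 0;

    image_acquire(&img);
    ok = ok && image_resize(&img, 32) == -1 && img.size == 16;
    image_release(&img);
    ok = ok && image_resize(&img, 32) == 0 && img.size == 32;
    free(img.pixels);
    return ok;
}
```

A real exporter would report the refusal through a Python exception rather than a bare -1; the sketch only shows where the check belongs.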

Ex. 2

This example shows how an object that wants to expose a contiguous chunk
of memory (which will never be re-allocated while the object is alive)
would do that.

    int myobject_getbuffer(PyObject *self, Py_buffer *view, int flags) {

        void *buf;
        Py_ssize_t len;
        int readonly=0;

        buf = /* Point to buffer */
        len = /* Set to size of buffer */
        readonly = /* Set to 1 if readonly */

        return PyBuffer_FillInfo(view, buf, len, readonly, flags);
    }

    /* No releasebuffer is necessary because the memory will never
       be re-allocated
    */

Ex. 3

A consumer that only wants a simple contiguous chunk of bytes from a
Python object, obj, would do the following:

    Py_buffer view;

    if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
         /* error return */
    }

    /* Now, view.buf is the pointer to memory
            view.len is the length
            view.readonly is whether or not the memory is read-only.
     */


    /* Once the information is no longer needed, release the view.
       The release call returns nothing, so there is no error to check. */

    PyBuffer_Release(obj, &view);

Ex. 4

A consumer that wants to be able to use any object's memory but is
writing an algorithm that only handles contiguous memory could do the
following:

    void *buf;
    Py_ssize_t len;
    char *format;
    int copy;

    copy = PyObject_GetContiguous(obj, &buf, &len, &format, 0, 'A');
    if (copy < 0) {
       /* error return */
    }

    /* process memory pointed to by buf if format is correct */

    /* Optional: if, after processing, we want to copy data from buf
       back into the object, we could do the following */

    if (PyObject_CopyToObject(obj, buf, len, 'A') < 0) {
           /*        error return */
    }

    /* Make sure that if a copy was made, the memory is freed */
    if (copy == 1) PyMem_Free(buf);

Copyright

This PEP is placed in the public domain.