PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Discussions-To: https://discuss.python.org/t/42001
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Mar-2014
Python-Version: 3.13
Post-History: 30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016,
              01-Sep-2016, 13-Apr-2021, 03-Nov-2021, 27-Dec-2023

Abstract

This PEP proposes small adjustments to the APIs of the bytes and
bytearray types to make it easier to operate entirely in the binary
domain:

-   Add fromsize alternative constructor
-   Add fromint alternative constructor
-   Add getbyte byte retrieval method
-   Add iterbytes alternative iterator

Rationale

During the initial development of the Python 3 language specification,
the core bytes type for arbitrary binary data started as the mutable
type that is now referred to as bytearray. Other aspects of operating in
the binary domain in Python have also evolved over the course of the
Python 3 series, for example with PEP 461.

Motivation

With Python 3 and the split between str and bytes, one small but
important area of programming became slightly more difficult, and much
more painful -- wire format protocols.

This area of programming is characterized by a mixture of binary data
and ASCII compatible segments of text (aka ASCII-encoded text). The
addition of the new constructors, methods, and iterators will aid both
in writing new wire format code, and in porting any remaining Python 2
wire format code.

Common use-cases include dbf and pdf file formats, email formats, and
FTP and HTTP communications, among many others.

Proposals

Addition of explicit "count and byte initialised sequence" constructors

To replace the discouraged behavior of creating zero-filled bytes-like
objects from the basic constructors (i.e. bytes(1) --> b'\x00'), this
PEP proposes the addition of an explicit fromsize alternative
constructor as a class method on both bytes and bytearray whose first
argument is the count, and whose second argument is the fill byte to use
(defaults to \x00):

    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.fromsize(5, b'\x0a')
    b'\n\n\n\n\n'
    >>> bytearray.fromsize(5, fill=b'\x0a')
    bytearray(b'\n\n\n\n\n')

fromsize will behave just as the current constructors behave when passed
a single integer, while allowing for non-zero fill values when needed.
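Until such a constructor exists, the described behaviour can be approximated in current Python. The following is a minimal sketch only: the free function ``fromsize`` is a hypothetical stand-in for the proposed class method, and the checks on the fill value and on negative counts are assumptions rather than behaviour specified by this PEP.

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for the proposed ``cls.fromsize(count, fill)``.

    ``cls`` is expected to be ``bytes`` or ``bytearray``.
    """
    # Assumption: the fill value must be exactly one byte long.
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    # Mirror the existing constructors, which reject negative counts.
    if count < 0:
        raise ValueError("negative count")
    # Sequence repetition builds the filled object of the requested type.
    return cls(fill) * count


zeros = fromsize(bytes, 3)                  # same result as bytes(3)
newlines = fromsize(bytearray, 5, b"\x0a")  # a bytearray of five 0x0a bytes
```

With the default fill, the result matches what ``bytes(3)`` and ``bytearray(3)`` produce today.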

Addition of explicit "single byte" constructors

As binary counterparts to the text chr function, this PEP proposes the
addition of an explicit fromint alternative constructor as a class
method on both bytes and bytearray:

    >>> bytes.fromint(65)
    b'A'
    >>> bytearray.fromint(65)
    bytearray(b'A')

These methods will only accept integers in the range 0 to 255
(inclusive):

    >>> bytes.fromint(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: integer must be in range(0, 256)

    >>> bytes.fromint(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ord builtin will be updated to explicitly note
that bytes.fromint is the primary inverse operation for binary data,
while chr is the inverse operation for text data, and that
bytearray.fromint also exists.

Behaviorally, bytes.fromint(x) will be equivalent to the current
bytes([x]) (and similarly for bytearray). The new spelling is expected
to be easier to discover and easier to read (especially when used in
conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like map.
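As an illustration of the ``map`` use case, the sketch below uses a hypothetical module-level ``fromint`` helper built on the existing ``bytes([x])`` spelling, since the proposed class method is not available in current Python.

```python
def fromint(value):
    """Hypothetical stand-in for the proposed ``bytes.fromint``."""
    # bytes([value]) already enforces the 0..255 range and the integer type.
    return bytes([value])


codes = [72, 105, 33]
chunks = list(map(fromint, codes))  # [b'H', b'i', b'!']
message = b"".join(chunks)          # b'Hi!'

# ord() is the inverse operation for a length 1 bytes object.
round_trip = ord(fromint(65))       # 65
```

A named callable can be passed to ``map`` directly, whereas the ``bytes([x])`` spelling needs a ``lambda`` or a comprehension.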

These new methods intentionally do NOT offer the same level of general
integer support as the existing int.to_bytes conversion method, which
allows arbitrarily large integers to be converted to arbitrarily long
bytes objects. The restriction to only accept non-negative integers that
fit in a single byte means that no byte order information is needed, and
there is no need to handle negative numbers. The documentation of the
new methods will refer readers to int.to_bytes for use cases where
handling of arbitrary integers is needed.
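For comparison, the following short example uses only existing APIs to show the extra information that ``int.to_bytes`` requires once a value no longer fits in a single non-negative byte. The variable names are illustrative only.

```python
# A value in range(0, 256) needs neither a byte order nor a sign convention.
single = bytes([200])                                      # b'\xc8'

# Larger values need an explicit length and byte order.
big = (1024).to_bytes(2, byteorder="big")                  # b'\x04\x00'
little = (1024).to_bytes(2, byteorder="little")            # b'\x00\x04'

# Negative values additionally need signed=True (two's complement).
negative = (-2).to_bytes(1, byteorder="big", signed=True)  # b'\xfe'
```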

Addition of "getbyte" method to retrieve a single byte

This PEP proposes that bytes and bytearray gain the method getbyte, which
will always return a length 1 bytes object (rather than the integer that
indexing returns):

    >>> b'abc'.getbyte(0)
    b'a'

If the requested index is out of range, IndexError is raised:

    >>> b'abc'.getbyte(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range
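The difference from the spellings available today can be sketched as follows. The free function ``getbyte`` is a hypothetical stand-in for the proposed method. Its handling of negative indices simply follows ordinary indexing, which is an assumption and not something this PEP specifies.

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed ``data.getbyte(index)``."""
    # data[index] raises IndexError when out of range; wrapping the integer
    # in a list yields a length 1 bytes object, even for a bytearray input.
    return bytes([data[index]])


data = b"abc"
as_int = data[0]         # 97: indexing returns an integer
as_slice = data[0:1]     # b'a': slicing returns bytes
past_end = data[9:10]    # b'': slicing silently returns an empty result
first = getbyte(data, 0) # b'a'
```

Unlike the slicing workaround, an out-of-range index raises ``IndexError`` instead of quietly producing an empty object.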

Addition of optimised iterator methods that produce bytes objects

This PEP proposes that bytes and bytearray gain an optimised iterbytes
method that produces length 1 bytes objects rather than integers:

    for x in data.iterbytes():
        # x is a length 1 bytes object, rather than an integer
        ...

For example:

    >>> tuple(b"ABC".iterbytes())
    (b'A', b'B', b'C')
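For reference, the same results can be produced in current Python, without the optimisation the proposed method would provide. This is a sketch only, and the generator name ``iterbytes`` is a hypothetical stand-in for the proposed method.

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    # Plain iteration yields integers, so each one is wrapped back into
    # a length 1 bytes object.
    for value in data:
        yield bytes([value])


as_ints = tuple(b"ABC")                   # (65, 66, 67)
as_bytes = tuple(iterbytes(b"ABC"))       # (b'A', b'B', b'C')

# An equivalent one-liner: zip() yields 1-tuples of integers, and bytes()
# accepts any iterable of integers in range(0, 256).
via_zip = tuple(map(bytes, zip(b"ABC")))  # (b'A', b'B', b'C')
```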

Design discussion

Why not rely on sequence repetition to create zero-initialised sequences?

Zero-initialised sequences can be created via sequence repetition:

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the bytearray type was originally
designed, and the decision was made to add explicit support for it in
the type constructor. The immutable bytes type then inherited that
feature when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing
the spelling as users sometimes find the current behavior of the binary
sequence constructors surprising. In particular, there's a reasonable
case to be made that bytes(x) (where x is an integer) should behave like
the bytes.fromint(x) proposal in this PEP. Providing both behaviors as
separate class methods avoids that ambiguity.
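The ambiguity is easy to demonstrate with the existing constructors. The variable names below are illustrative only.

```python
# Passing an integer creates a zero-filled sequence of that length...
zero_filled = bytes(3)        # b'\x00\x00\x00'

# ...while a reader might expect the single byte with that value, which
# currently has to be spelled with a one-element list.
single_byte = bytes([3])      # b'\x03'

# bytearray behaves the same way.
zero_array = bytearray(3)     # bytearray(b'\x00\x00\x00')
single_array = bytearray([3]) # bytearray(b'\x03')
```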

Current Workarounds

After nearly a decade, there seems to be no consensus on the best
workarounds for byte iteration, as demonstrated by the discussion "Get
single-byte bytes objects from bytes objects" (see References).

Omitting the originally proposed builtin function

When submitted to the Steering Council, this PEP proposed the
introduction of a bchr builtin (with the same behaviour as
bytes.fromint), recreating the ord/chr/unichr trio from Python 2 under a
different naming scheme (ord/bchr/chr).

The SC indicated they didn't think this functionality was needed often
enough to justify offering two ways of doing the same thing, especially
when one of those ways was a new builtin function. That part of the
proposal was therefore dropped as being redundant with the bytes.fromint
alternative constructor.

Developers who use this method frequently will instead have the option
to define their own bchr = bytes.fromint aliases.

Scope limitation: memoryview

Updating memoryview with the new item retrieval methods is outside the
scope of this PEP.

References

-   Initial March 2014 discussion thread on python-ideas
-   Guido's initial feedback in that thread
-   Issue proposing moving zero-initialised sequences to a dedicated API
-   Issue proposing to use calloc() for zero-initialised binary
    sequences
-   August 2014 discussion thread on python-dev
-   June 2016 discussion thread on python-dev
-   Get single-byte bytes objects from bytes objects

Copyright

This document has been placed in the public domain.