PEP: 460 Title: Add binary interpolation and formatting Version:
$Revision$ Last-Modified: $Date$ Author: Antoine Pitrou
<solipsis@pitrou.net> Status: Withdrawn Type: Standards Track
Content-Type: text/x-rst Created: 06-Jan-2014 Python-Version: 3.5

Abstract

This PEP proposes to add minimal formatting operations to bytes and
bytearray objects. The proposed additions are:

-   bytes % ... and bytearray % ... for percent-formatting, similar in
    syntax to percent-formatting on str objects (accepting a single
    object, a tuple or a dict).
-   bytes.format(...) and bytearray.format(...) for a formatting similar
    in syntax to str.format() (accepting positional as well as keyword
    arguments).
-   bytes.format_map(...) and bytearray.format_map(...) for an API
    similar to str.format_map(...), with the same formatting syntax and
    semantics as bytes.format() and bytearray.format().

Rationale

In Python 2, str % args and str.format(args) allow the formatting and
interpolation of bytestrings. This feature has commonly been used for
the assembling of protocol messages when protocols are known to use a
fixed encoding.

Python 3 generally mandates that text be stored and manipulated as
unicode (i.e. str objects, not bytes). In some cases, though, it makes
sense to manipulate bytes objects directly. Typical usage is binary
network protocols, where you can want to interpolate and assemble
several bytes object (some of them literals, some of them compute) to
produce complete protocol messages. For example, protocols such as HTTP
or SIP have headers with ASCII names and opaque "textual" values using a
varying and/or sometimes ill-defined encoding. Moreover, those headers
can be followed by a binary body... which can be chunked and decorated
with ASCII headers and trailers!

While there are reasonably efficient ways to accumulate binary data
(such as using a bytearray object, the bytes.join method or even
io.BytesIO), none of them leads to the kind of readable and intuitive
code that is produced by a %-formatted or {}-formatted template and a
formatting operation.

Binary formatting features

Supported features

In this proposal, percent-formatting for bytes and bytearray supports
the following features:

-   Looking up formatting arguments by position as well as by name
    (i.e., %s as well as %(name)s).
-   %s will try to get a Py_buffer on the given value, and fallback on
    calling __bytes__. The resulting binary data is inserted at the
    given point in the string. This is expected to work with bytes,
    bytearray and memoryview objects (as well as a couple others such as
    pathlib's path objects).
-   %c will accept an integer between 0 and 255, and insert a byte of
    the given value.

Braces-formatting for bytes and bytearray supports the following
features:

-   All the kinds of argument lookup supported by str.format() (explicit
    positional lookup, auto-incremented positional lookup, keyword
    lookup, attribute lookup, etc.)
-   Insertion of binary data when no modifier or layout is specified
    (e.g. {}, {0}, {name}). This has the same semantics as %s for
    percent-formatting (see above).
-   The c modifier will accept an integer between 0 and 255, and insert
    a byte of the given value (same as %c above).

Unsupported features

All other features present in formatting of str objects (either through
the percent operator or the str.format() method) are unsupported. Those
features imply treating the recipient of the operator or method as text,
which goes counter to the text / bytes separation (for example,
accepting %d as a format code would imply that the bytes object really
is an ASCII-compatible text string).

Amongst those unsupported features are not only most type-specific
format codes, but also the various layout specifiers such as padding or
alignment. Besides, str objects are not acceptable as arguments to the
formatting operations, even when using e.g. the %s format code.

__format__ isn't called.

Criticisms

-   The development cost and maintenance cost.
-   In 3.3 encoding to ASCII or latin-1 is as fast as memcpy (but it
    still creates a separate object).
-   Developers will have to work around the lack of binary formatting
    anyway, if they want to support Python 3.4 and earlier.
-   bytes.join() is consistently faster than format to join bytes
    strings (XXX is it?).
-   Formatting functions could be implemented in a third party module,
    rather than added to builtin types.

Other proposals

A new type datatype

It was proposed to create a new datatype specialized for "network
programming". The authors of this PEP believe this is
counter-productive. Python 3 already has several major types dedicated
to manipulation of binary data: bytes, bytearray, memoryview,
io.BytesIO.

Adding yet another type would make things more confusing for users, and
interoperability between libraries more painful (also potentially
sub-optimal, due to the necessary conversions).

Moreover, not one type would be needed, but two: one immutable type (to
allow for hashing), and one mutable type (as efficient accumulation is
often necessary when working with network messages).

Resolution

This PEP is made obsolete by the acceptance of PEP 461, which introduces
a more extended formatting language for bytes objects in conjunction
with the modulo operator.

References

-   Issue #3982: support .format for bytes
-   Mercurial project
-   Twisted project
-   Documentation of Python 2 formatting (str % args)
-   Documentation of Python 2 formatting (str.format)

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End: