PEP 786 – Precision and modulo-precision flag format specifiers for integer fields
- Author:
- Jay Berry <email at jb2170.com>
- Sponsor:
- Alyssa Coghlan <ncoghlan at gmail.com>
- Discussions-To:
- Discourse thread
- Status:
- Draft
- Type:
- Standards Track
- Created:
- 04-Apr-2025
- Python-Version:
- 3.15
- Post-History:
- 14-Feb-2025, 09-Apr-2026
Abstract
This PEP proposes implementing the standard format specifiers . and z
of PEP 3101 for integer fields as “precision” and “modulo-precision”
respectively. Both are presented together in this PEP because the rejected
alternative implementations entail intertwined combinations of the two.
. (“precision”) shall format an integer to a specified minimum number of
digits, identical to the behavior of old-style % formatting. This shall be
implemented for all integer presentation types except 'c'.
z (“modulo-precision”) shall be permitted as an optional “modulo” flag
when formatting an integer with precision and one of the binary, octal, or
hexadecimal presentation types (bases that are powers of two). This first
reduces the integer into range(base ** precision) using the % operator.
The result is a predictable two’s complement style formatting with the exact
number of digits equal to the precision.
This PEP amends the clause of PEP 3101 which states “The precision is ignored for integer conversions”.
Rationale
When string formatting integers in binary, octal, and hexadecimal, one often
desires the resulting string to contain a guaranteed minimum number of digits.
For unsigned integers of known machine-width bounds (for example, 8-bit bytes)
this is often also the exact number of digits in the result. This has
previously been implemented in the old-style % formatting using the
. “precision” format specifier, closely related to that of the C
programming language.
>>> "0x%.2x" % 15
'0x0f' # two hex digits, ideal for displaying an unsigned byte
>>> "0o%.3o" % 18
'0o022' # three octal digits, ideal for displaying a umask or file permissions
When PEP 3101 new-style formatting was first introduced, used in
str.format and f-strings, the format specification was
simple enough that the behavior of “precision” could be trivially emulated with
the width format specifier. Precision therefore was left unimplemented and
forbidden for int fields. However, as new format specifiers have been
added over time, their interactions with width have noticeably diverged
its behavior from that of precision, and so the readmission of precision
as its own format specifier, ., is sufficiently warranted.
The width format specifier guarantees a minimum length of the entire
replacement field, not just the number of digits in a formatted integer.
For example, the wonderful # specifier that prepends the prefix of the
corresponding presentation type consumes from width:
>>> x = 12
>>> f"0x{x:02x}" # manually specifying '0x' prefix
'0x0c' # two hex digits :)
>>> f"{x:#02x}" # use '#' format specifier to output '0x' automatically
'0xc' # only one hex digit :(
>>> f"{x:#08b}"
'0b001100' # we wanted 8 bits, not 6 :(
One could attempt to argue that since the length of a prefix is known to always be 2, it can be accounted for manually by adding 2 to the desired number of digits. Consider however the following demonstrations of why this is a bad idea:
- By correcting the second example to f"{x:#04x}", at a glance this looks like it may produce four hex digits, but it only produces two. This is bad for readability. 4 is thus too much of a 'magic number', and trying to counter that by being overly explicit with f"{x:#0{2+2}x}" looks ridiculous.
- In the future it is possible that a type specifier may be added with a prefix not of length 2, meaning the programmer has to calculate the prefix length, rather than Python's internal string formatting code handling that automatically.
- Things get more complicated when using the sign format specifier: f"{x: #0{1+2+2}x}" is required to produce ' 0x0c'.
- Things get even more complicated when introducing a grouping_option, for example formatting an integer into k 'word' segments joined by _: with x = 3735928559 and k = 2, f"{x: #0{1+2+4*k+(k - 1)}_x}" is required to produce ' 0xdead_beef'. Surely this would be easier to write with precision, as f"{x: #_.8x}"?
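For concreteness, the last example runs on current Python today, where every length component of width must be tallied by hand (the variable names below are illustrative only):

```python
# Runnable today: emulating precision via width means hand-computing
# the total replacement-field length.
x = 3735928559  # 0xdeadbeef
k = 2           # number of 4-digit hex 'words'

# 1 sign column + 2 prefix chars + 4*k digits + (k - 1) underscores
width = 1 + 2 + 4 * k + (k - 1)

s = f"{x: #0{width}_x}"
print(repr(s))  # ' 0xdead_beef'
```

Under this PEP the same string would be written f"{x: #_.8x}", with no length arithmetic.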
It is clear at this point that the reduction of complexity that would be
provided by precision’s implementation for int fields would be beneficial
to any user. Nor does this proposal demand a new special-case behavior
exclusively for int fields: the precision token . is
already implemented as prescribed in PEP 3101 for str data to truncate
the field’s length, and for float data to ensure that there are a fixed
number of digits after the decimal point, e.g. f"{0.1+0.2: .4f}" producing
' 0.3000'. Thus no new tokens need adding to the format specification
because of this proposal, maintaining its modest size.
For the sake of completeness, and for lack of any reasonable objection, we
propose that precision shall also work in decimal, base 10. Explicitly, the integer
presentation types laid out in PEP 3101 that are permitted to implement
precision are 'b', 'd', 'o', 'x', 'X', 'n',
and '' (None). The only presentation type not permitted is
c (‘character’), whose purpose is to format an integer to a single Unicode
character, or an appropriate replacement for non-printable characters, for
which it does not make sense to implement precision. In the event that new
integer presentation types are added in the future, such as 'B' and 'O'
which mutatis-mutandis could provide the same behavior as 'X' (that is a
capitalized prefix and digits), their addition should appropriately consider
whether precision should be implemented or not. In the case of 'B' and 'O'
as described here it would be correct to implement precision. A ValueError
shall be raised when an attempt is made to use precision with an invalid
integer presentation type.
Precision For Negative Numbers
So far in this PEP we have cautiously avoided talking about the formatting of negative numbers with precision, which we shall now discuss.
Short Verdict
We desire two behaviors, which motivates the implementation of a flag z to
toggle on the latter’s behavior:
- For precision without the z flag, a negative integer x shall be formatted with a negative sign and the digits of -x's formatting. This is the same friendly behavior as old-style % formatting. For example, f"{-12:#.2x}" shall produce '-0x0c', equivalent to "%#.2x" % -12.
- For precision with the z flag, r = x % base ** n is first taken when formatting f"{x:z.{n}{base_char}}", and r is passed on to precision, the resulting string being equivalent to f"{r:.{n}{base_char}}". Because r is in range(base ** n), the number of digits will always be exactly n, resulting in a predictable two's complement style formatting, which is useful to the end user in environments that deal with machine-width oriented integers such as struct. For example, in formatting f"{-1:z#.2x}", -1 is reduced modulo 256 via 255 = -1 % 256, the resulting string being equivalent to f"{255:#.2x}", which is '0xff'.
The z flag shall only be implemented for presentation types corresponding to bases that are powers of two, specifically at present binary, octal, and hexadecimal. Whilst reduction of integers modulo powers of ten is computationally possible, a 'ten's complement' has no demand, and so the z flag is unimplemented for decimal presentation types. The z flag shall work for all integers, not just negatives.
The syntax choice of z is again out of respect for maintaining the modest size of the format specification. z was introduced to the format specification in PEP 682 as a flag for normalizing negative zero to positive zero for the float and Decimal types. It is currently unimplemented for the int type, and since integers never have a 'negative zero' situation it seems uncontroversial to repurpose z, again as a flag. If one squints hard enough, the z looks like a 2 for two's complement!
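Since neither behavior exists yet, both can be emulated on current Python for hexadecimal fields; phex and zhex below are hypothetical helpers written for this sketch, not part of the proposal:

```python
# Emulation of the two proposed behaviors for hexadecimal fields.

def phex(x: int, n: int) -> str:
    """Like the proposed f"{x:#.{n}x}": at least n digits, sign kept."""
    sign = "-" if x < 0 else ""
    return f"{sign}0x{abs(x):0{n}x}"

def zhex(x: int, n: int) -> str:
    """Like the proposed f"{x:z#.{n}x}": exactly n digits, modulo 16**n."""
    return f"0x{x % 16 ** n:0{n}x}"

print(phex(-12, 2))  # -0x0c
print(zhex(-1, 2))   # 0xff
print(zhex(255, 2))  # 0xff
```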
Long Introspection
We first present some observations about the binary representations of signed integers in two’s complement. This leads us to a couple of alternative formulations of formatting negative numbers.
Observe that one can always extend a signed number's binary representation by repeating the leading digit as a prefix:
45 (8-bit) 00101101
45 (9-bit) 000101101
-19 (8-bit) 11101101
-19 (9-bit) 111101101
For non-negative numbers this is obvious. For negative numbers this is because
the erstwhile leading column of an n-bit representation goes from having a
value of -2 ** (n-1), to +2 ** (n-1), with a new n+1th column of
value -2 ** n prefixed on, the overall sum unaffected.
This is what C’s printf does, working with powers of two as the numbers of digits:
printf("%#hhb\n", -19); // 0b11101101
printf("%#hho\n", -19); // 0355
printf("%#hhx\n", -19); // 0xed
printf("%#b\n", -19); // 0b11111111111111111111111111101101
printf("%#o\n", -19); // 037777777755
printf("%#x\n", -19); // 0xffffffed
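The same fixed-width results can be reproduced on current Python by reducing modulo 2**width before formatting (note that Python spells the octal prefix 0o where C prints a bare 0):

```python
# Two's complement rendering at a chosen bit width via explicit reduction.
x = -19
print(f"0b{x % 2**8:b}")   # 0b11101101, like C's %#hhb
print(f"0o{x % 2**8:o}")   # 0o355, where C prints 0355
print(f"0x{x % 2**8:x}")   # 0xed, like C's %#hhx
print(f"0x{x % 2**32:x}")  # 0xffffffed, like C's %#x on a 32-bit int
```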
Conversely it should be clear that one can losslessly truncate a signed number’s
binary representation to have only one leading 0 if it is non-negative, and
one leading 1 if it is negative:
45 (8-bit) 00101101
45 (7-bit) 0101101
-19 (8-bit) 11101101
-19 (7-bit) 1101101
If one were to truncate another digit off of these examples, then both would
end up as 101101, 45 being indistinguishable from -19 when using only 6 binary
digits because they are both the same modulo 2 ** 6 = 64. Therefore to
losslessly and unambiguously represent a signed integer x as a binary string
which is rendered to the end user, we have a de facto ‘minimal width’ representation
convention, using n digits, where n is the smallest integer such that
x is in range(-2 ** (n-1), 2 ** (n-1)).
For rendering octal and hexadecimal strings one has to extend the definition of
the ‘minimal width’ representation convention to be sufficiently unambiguous.
383’s minimal width binary string is 0101111111, and -129’s is 101111111,
a suffix of the former’s. A naive, incorrect, implementation of hexadecimal
string formatting would render both as '0x17f' by padding both binary
representations to 000101111111. The method was correct to desire a number
of binary digits (12) that is divisible by the number of bits in the base
(4 bits in base 16) so that the binary representation can be segmented up into
(hex) digits, but it was incorrect in padding; the method should have instead
extended as we have observed previously, 383 extended to 000101111111,
and -129 extended to 111101111111, whence 383 is rendered as '0x17f'
and -129 as '0xf7f'.
Thus the generalized definition of our ‘minimal width’ representation convention
is: for an integer x to be rendered in base base, produce n digits,
where n is the smallest integer such that x is in
range(-base ** n / 2, base ** n / 2).
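As a sketch, this definition can be computed directly; minimal_width is a hypothetical helper written here for illustration only:

```python
# Smallest n such that x is in range(-base**n / 2, base**n / 2),
# per the generalized 'minimal width' convention above.

def minimal_width(x: int, base: int) -> int:
    n = 1
    while not -(base ** n) // 2 <= x < base ** n // 2:
        n += 1
    return n

print(minimal_width(45, 2))     # 7 digits: 0101101
print(minimal_width(383, 16))   # 3 digits, rendered '0x17f'
print(minimal_width(-129, 16))  # 3 digits, rendered '0xf7f' (-129 % 16**3 == 0xf7f)
```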
This leads onto the rejected alternatives.
Rejected Alternatives
Behavior of z
The desired implementation of z, the two’s complement style formatting flag,
has split into two main camps of opinions, disagreeing over lossless vs lossy
presentation. The lossless camp believes that the formatted strings corresponding
to integers should all be distinct from each other, uniqueness preserved by the
minimal width representation convention; precision with z enabled should still
be only a minimum number of digits requested, as it is without z. The lossy
camp believes that precision with z enabled should first reduce the integer
using modular arithmetic, which then produces exactly the number of digits
requested, equivalent to left-truncating the minimal width representation string.
We endeavor to conclude in the following section that the former camp, lossless formatting, has no use cases, and is thus a rejected idea, whence this PEP proposes the latter, lossy, behavior.
Minimal Width Representation Convention
This idea was fiercely entertained only due to its lossless behavior; however, it is an obstacle to ergonomics in every candidate use case. These arguments about the aesthetics of string rendering are not irrational or about personal taste; rather, they are crucial to how information is communicated to the end user.
In a program in which signed-ness of integers is critical to communicate, any
implementation of z should not be used, as the average user will be expecting
to see a negative sign -. The alternative of using minimal width representation
convention requires one to be uncomfortably vigilant looking for leading digits
of numbers belonging to the upper half of the base’s range whenever a negative
number is present (1 for binary, 4-7 for octal, and 8-f for hex).
Any end user that is not aware of this de facto convention, and even those who
are but are not expecting it to be present in a program, would have a hard time:
The formatting of 128 and -128 using f"{x:z#.2x}" would produce '0x080'
and '0x80' respectively. It is the PEP author’s opinion that there is a 0%
chance that '0x80' is being read as negative 128 under normal conditions.
Furthermore the hideous rendering of positive 128 as '0x080' is useless for
a program that should produce a uniformly spaced hexdump of bytes, agnostic of
whether they are signed or unsigned; all bytes should be rendered in the form
'0xNN'. See the examples section on how modulo-precision
handles bytes in the correct sign-agnostic way.
Contrapositively therefore z’s purpose is to be used in environments where
signed-ness is not critical, and more likely than not where it is even
encouraged to treat the integers with respect to the modular arithmetic that
arises in two’s complement hardware of fixed register sizes. In the example above
128 and -128 are the same modulo 256, and the respectable rendering is '0x80'.
In general the purpose of z is to treat integers modulo base ** precision
as the same. So too 255 and -1 should both be rendered as '0xff', not
'0x0ff' and '0xff' respectively; the truncation is not a hindrance, but
the desired behavior. Formally we may say that the formatting should be a well
defined bijection between the equivalence classes of Z/(base ** precision)Z
and strings with precision digits.
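This bijection is easy to spot-check on current Python by performing the modulo reduction by hand for base 16, precision 2:

```python
# The 384 integers spanning the signed and unsigned byte ranges collapse,
# modulo 256, onto exactly 256 distinct two-digit strings, one per
# residue class.
strings = {f"{x % 256:02x}" for x in range(-128, 256)}
print(len(strings))                       # 256
print(all(len(s) == 2 for s in strings))  # True
```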
The remaining question, arising as a concern over the 'loss of information' in the effectively left-truncated strings, is: "is there no chance to communicate this truncation to the user?" We reject this question's premise that there is ever such a case of unintentional loss of information, by considering the two cases of hardware-aware integers and otherwise:
With respect to hardware-aware integers we have so far played around with examples
of integers in range(-128, 256), the union of the signed and unsigned ranges
for bytes. The virtues of formatting x and x - 256 as the same are clearly
established. In the contexts in which one expects to find z, any erroneous
integers corresponding to bytes lying outside that range are likely a programming error.
For example if a library sets a pixel brightness integer to be 257, and prints out
'0x01' instead of '0x101' via f"{x:z#.2x}", that’s not our problem or
doing; string formatting shouldn’t raise an exception, or even a SyntaxWarning
as an invalid escape sequence "\y" would, because ValueError: bytes must be in range(0, 256)
will be raised by bytes when trying to serialize that integer via bytes([257]);
let the appropriate ‘layer’ of code raise the exception, as that is more indicative
of a defect in the library, not our string formatting.
In the case of non-hardware aware integers, one would have to intentionally opt to
use z, in which modular arithmetic is the chosen desired effect. It is for
this reason also that we shall not raise a SyntaxWarning or ValueError
for integers lying outside of range(-base ** precision / 2, base ** precision).
Thus we have defended the lossy behavior of z implemented as modulo-precision,
and we have exhausted all reasonable use cases of lossless behavior.
A final compromise to consider and reject is implementing z not as a flag
contingent on ., but as a flag that can be combined with ..
Specifically: z without . would turn on two’s complement mode to render
the minimal width representation of the formatted integer, . without z
would implement precision as already explained, a minimum number of digits in the
magnitude and a sign if necessary, and z combined with . would turn on the
left-truncating modulo-precision. This labyrinth of combinations does not seem
useful to anyone, as we have already discredited the ergonomics of minimal width
representation convention, whence z would rarely be used on its own, and this
behavior of two options that individually render a minimum number of digits
combining together to render an exact number of digits seems counterintuitive.
Infinite Length Indication
Another, less popular, rejected alternative was for z to directly acknowledge
the infinite prefix of 0s or 1s that precede a non-negative or negative
number respectively. For example:
>>> f"{-1:z#.8b}"
'0b[...1]11111111'
>>> f"{300:z#.8b}"
'0b[...0]100101100'
This is effectively the minimal width representation convention with an ‘infinite’ prefix attached to it.
In the C programming language the machine-width dependent two’s complement
formatting of int data with precision exhibits excessive lengths of prefixes
that arise from negative numbers, even those with small magnitude:
printf("%#.2x\n", -19); // 0xffffffed
printf("%#.2llx\n", (long long unsigned int)-19); // 0xffffffffffffffed
This prefix could continue on indefinitely if it were not limited by a maximum machine-width!
Python’s int type is indeed not limited by a maximum machine-width. Thus to
avoid printing infinitely long two’s complement strings we could use a similar
approach to that of the builtin list’s string formatting for printing a list
that contains itself:
>>> l = []
>>> l.append(l)
>>> l
[[...]]
>>> y = -1
>>> f"{y:z#.8b}"
'0b[...1]11111111'
This may have been useful to educate beginners on how bitwise binary operations
work, for example showing how -1 & x is always trivially equal to x, or
how the binary representation of the negation of a number can be obtained by
adding one to its bitwise complement:
>>> x = 42
>>> f"{x:z#.8b}"
'0b[...0]00101010'
>>> f"{~x:z#.8b}"
'0b[...1]11010101'
>>> f"{x|~x:z#.8b}"
'0b[...1]11111111'
# x | ~x == -1
# x | ~x == x + ~x because of their disjoint bitwise representations
# thus x + ~x == -1
# thus -x == ~x + 1
>>> y = ~x + 1
>>> f"{y:z#.8b}"
'0b[...1]11010110'
>>> y == -x
True
Its use case is just too narrow, and modulo-precision outshines it.
General
- What about ones’s complement, or other binary representations?
Two’s complement is so dominant that no one really considers other representations. GCC only supports two’s complement.
- Could we do nothing?
Programmers continue to hobble on using the width format specifier with ad-hoc corrections to mimic precision. This is intolerable, and the rationale of this PEP makes conclusive arguments for the addition and implementation choices of precision.
Refusing to implement precision for integer fields using . reserves . for possible future uses. However, in the ~20 years since PEP 3101, no alternatives have been accepted, and any alternate use of . would take it further out of sync with both old-style % formatting and the C programming language.
Syntax
! instead of z. for precision with modulo-precision, mutually exclusive with ..

Pros:
- ! is graphically related to ., an extension if you will. Precision with the modulo-precision flag set is indeed an extension of precision.
- ! in the English language is often used for imperative, commanding sentences. So too modulo-precision commands the exact number of digits to which its input shall be formatted, whereas precision is the minimum number of digits. This is idiomatic.
- ! is only one symbol, as opposed to z.. This, coupled with ! being mutually exclusive with ., leaves the overall length of one's written code unaffected when switching on modulo-precision.
- Using a new ! symbol reserves z for other future uses, whatever those may be.

Cons:
- z. also conveys a sense of extension from ., a flag attached to ., and lexicographically flows left to right as 'modulo' (z) 'precision' (.).
- . and ! being mutually exclusive may give a beginner programmer analysis-paralysis over which to choose when looking at the format specification documentation.
- ! would be another addition to the format specification for a single purpose. It would not have any implementation for str, float, or any other type.
- There also already exists a ["!" conversion] "explicit conversion flag" in the format string syntax as laid out in PEP 3101. For example, in f"{s!r}" the !r calls repr on s. This would not syntactically clash with a ! format specifier, the format specifiers [":" format_spec] being separated by a well-defined preceding colon; however, users unfamiliar with the new modulo-precision mode may glance over format strings containing ! and expect different behavior.
Verdict:
- Whilst graphically attractive, ! would clutter the format specification for a single purpose that can be achieved by overloading the preexisting z flag.
Backwards Compatibility
To quote PEP 682:
The new formatting behavior is opt-in, so numerical formatting of existing programs will not be affected.
unless someone out there is specifically relying upon . raising a ValueError
for integers as it currently does, but to quote PEP 475:
The authors of this PEP don’t think that such applications exist
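The current behavior is easy to verify: combining precision with an integer presentation type raises ValueError today, and that rejection is the only behavior this PEP changes.

```python
# On current CPython, precision with an int field raises ValueError.
try:
    f"{255:.2x}"
    print("formatted")
except ValueError:
    print("ValueError raised")
```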
Examples And Teaching
Precision
Documentation and tutorials in the Python sphere of influence should encourage
the adoption of ., precision, as the default format specifier for formatting
int fields as opposed to width, when it is clear a minimum number of digits
is required, not a minimum length of the whole replacement field.
Since the concept of precision is common in other languages such as C, and was
already present in Python’s old-style % formatting, we don’t need to go too
overboard, but a decent few examples as below may demonstrate its uses.
>>> def hexdump(b: bytes) -> str:
... return " ".join(f"{c:#.2x}" for c in b)
>>> hexdump(b"GET /\r\n\r\n")
'0x47 0x45 0x54 0x20 0x2f 0x0d 0x0a 0x0d 0x0a'
# observe the CR and LF bytes padded to precision 2
# in this basic HTTP/0.9 request
>>> def unicode_dump(s: str) -> str:
... return " ".join(f"U+{ord(c):.4X}" for c in s)
>>> unicode_dump("USA 🦅")
'U+0055 U+0053 U+0041 U+0020 U+1F985'
# observe the last character's Unicode codepoint has 5 digits;
# precision is only the minimum number of digits
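Until precision lands, the same dumps can be written on current Python with a manual prefix and zero-padded width; the results match the outputs shown above:

```python
# Current-Python equivalents of the two dumps, using width instead of
# the proposed precision specifier.

def hexdump(b: bytes) -> str:
    return " ".join(f"0x{c:02x}" for c in b)

def unicode_dump(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(hexdump(b"GET /\r\n\r\n"))
# 0x47 0x45 0x54 0x20 0x2f 0x0d 0x0a 0x0d 0x0a
print(unicode_dump("USA 🦅"))
# U+0055 U+0053 U+0041 U+0020 U+1F985
```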
Modulo-Precision
The clear area for encouraging the use of modulo-precision is when dealing with
machine-width oriented integers such as those packed and unpacked by struct.
We give an example of the consistent predictable two’s complement formatting of
signed and unsigned integers.
>>> import struct
>>> my_struct = b"\xff"
>>> (t,) = struct.unpack('b', my_struct) # signed char
>>> print(t, f"{t:#.2x}", f"{t:z#.2x}")
-1 -0x01 0xff
>>> (t,) = struct.unpack('B', my_struct) # unsigned char
>>> print(t, f"{t:#.2x}", f"{t:z#.2x}")
255 0xff 0xff
# observe in both the signed and unsigned unpacking the modulo-precision flag 'z'
# produces a predictable two's complement formatting
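On current Python the same sign-agnostic rendering can be obtained by applying the modulo reduction by hand:

```python
# Emulating the proposed f"{t:z#.2x}" on current Python via explicit % 256.
import struct

my_struct = b"\xff"
(t,) = struct.unpack('b', my_struct)  # signed char
print(t, f"0x{t % 256:02x}")          # -1 0xff
(u,) = struct.unpack('B', my_struct)  # unsigned char
print(u, f"0x{u % 256:02x}")          # 255 0xff
```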
Reference Implementation
A pull request implementing this PEP is available on GitHub: python/cpython#146437.
Thanks
Thank you to:
- Raymond Hettinger, for the initial suggestion of the two’s complement behavior.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0786.rst
Last modified: 2026-04-09 09:40:11 GMT