PEP: 378
Title: Format Specifier for Thousands Separator
Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Mar-2009
Python-Version: 2.7, 3.1
Post-History: 12-Mar-2009

Motivation

Provide a simple, non-locale aware way to format a number with a
thousands separator.

Adding thousands separators is one of the simplest ways to humanize a
program's output, improving its professional appearance and readability.

In the finance world, output with thousands separators is the norm.
Finance users and non-professional programmers find the locale approach
to be frustrating, arcane and non-obvious.

The locale module presents two other challenges. First, it is a global
setting and not suitable for multi-threaded apps that need to serve
requests in multiple locales. Second, the name of a relevant locale
(such as "de_DE") can vary from platform to platform or may not be
defined at all. The docs for the locale module describe these and many
other challenges in detail.

It is not the goal to replace the locale module, to perform
internationalization tasks, or accommodate every possible convention.
Such tasks are better suited to robust tools like Babel. Instead, the
goal is to make a common, everyday task easier for many users.

Main Proposal (from Alyssa Coghlan, originally called Proposal I)

A comma will be added to the format() specifier mini-language:

    [[fill]align][sign][#][0][width][,][.precision][type]

The ',' option indicates that commas should be included in the output as
a thousands separator. As with locales which do not use a period as the
decimal point, locales which use a different convention for digit
separation will need to use the locale module to obtain appropriate
formatting.
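
As a minimal sketch of the option in use (assuming a Python version that implements this PEP, and with sample values chosen only for illustration), the same ',' flag can be applied to ints, floats, and decimals:

```python
from decimal import Decimal

# The ',' option behaves the same way for ints, floats, and Decimals.
as_int = format(1234567, ",d")
as_float = format(1234567.891, ",.2f")
as_decimal = format(Decimal("1234567.891"), ",.2f")

# The same specifier can be used inside str.format() replacement fields.
sentence = "Total: {:,}".format(1234567)

print(as_int)      # 1,234,567
print(as_float)    # 1,234,567.89
print(as_decimal)  # 1,234,567.89
print(sentence)    # Total: 1,234,567
```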

The proposal works well with floats, ints, and decimals. It also allows
easy substitution for other separators. For example:

    format(n, "6,d").replace(",", "_")

This technique is completely general but it is awkward in the one case
where the commas and periods need to be swapped:

    format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
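
One way to soften that awkward case is to exchange both characters in a single pass with str.translate, so that no placeholder character is needed. The sketch below assumes Python 3 (for str.maketrans); the helper name swap_separators is hypothetical and not part of this proposal:

```python
# Translation table that exchanges ',' and '.' in one pass, so no
# temporary placeholder character is needed.
_SWAP = str.maketrans(",.", ".,")


def swap_separators(value, spec=",.2f"):
    """Format value with the ',' option, then swap commas and periods.

    Hypothetical helper for illustration; not part of the PEP.
    """
    return format(value, spec).translate(_SWAP)


print(swap_separators(1234567.891))  # 1.234.567,89
print(swap_separators(1234, ",d"))   # 1.234
```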

The width argument means the total length including the commas and
decimal point:

    format(1234, "08,d")     -->    '0001,234'
    format(1234.5, "08,.1f") -->    '01,234.5'

The ',' option is defined as shown above for types 'd', 'e', 'f', 'g',
'E', 'G', '%', 'F' and ''. To allow future extensions, it is undefined
for other types: binary, octal, hex, character, etc.
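
The following sketch illustrates the width rule and the restriction on types. The rejection of ',' with the hex type is an observation about CPython 3, not something this PEP specifies (the PEP only leaves the behavior undefined):

```python
# Width counts every character, including separators and the decimal point.
padded_int = format(1234, "8,d")
padded_float = format(1234.5, "08,.1f")
print(repr(padded_int))    # '   1,234'
print(repr(padded_float))  # '01,234.5'

# For types outside the list above the PEP leaves the result undefined.
# Assumption: CPython 3 rejects the combination with a ValueError.
try:
    format(255, ",x")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```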

This proposal has the virtue of being simpler than the alternative
proposal but is much less flexible and meets the needs of fewer users
right out of the box. It is expected that some other solution will arise
for specifying alternative separators.

Current Version of the Mini-Language

-   Python 2.6 docs
-   PEP 3101 Advanced String Formatting

Research into what Other Languages Do

Scanning the web, I've found that thousands separators are usually one
of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

C-Sharp provides both styles (picture formatting and type specifiers).
The type specifier approach is locale aware. The picture formatting only
offers a COMMA as a thousands separator:

    String.Format("{0:n}", 12400)     ==>    "12,400"
    String.Format("{0:0,0}", 12400)   ==>    "12,400"

Common Lisp uses a COLON before the ~D decimal type specifier to emit a
COMMA as a thousands separator. The general form of ~D is
~mincol,padchar,commachar,commaintervalD. The padchar defaults to SPACE.
The commachar defaults to COMMA. The commainterval defaults to three.

    (format nil "~:D" 229345007)   =>   "229,345,007"
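
For comparison, the commachar and commainterval parameters can be approximated in Python. This is only a rough sketch for integers: the helper name group_digits and its defaults are hypothetical, and it does not cover mincol or padchar:

```python
def group_digits(n, commachar=",", commainterval=3):
    """Group the digits of an integer from the right.

    Hypothetical helper loosely mirroring the commachar and commainterval
    parameters of Common Lisp's ~D; it is not part of this PEP.
    """
    if commainterval < 1:
        raise ValueError("commainterval must be a positive integer")
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    # Peel off commainterval digits at a time, starting from the right.
    while digits:
        groups.append(digits[-commainterval:])
        digits = digits[:-commainterval]
    return sign + commachar.join(reversed(groups))


print(group_digits(229345007))          # 229,345,007
print(group_digits(229345007, "_"))     # 229_345_007
print(group_digits(229345007, " ", 4))  # 2 2934 5007
```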

The Ada language allows UNDERSCORES in its numeric literals.

Visual Basic and its brethren (like MS Excel) use a completely different
style and have ultra-flexible custom format specifiers like:

    "_($* #,##0_)".

COBOL uses picture clauses like:

    PICTURE $***,**9.99CR

Java offers a DecimalFormat class that uses picture patterns (one for
positive numbers and an optional one for negatives) such as:
"#,##0.00;(#,##0.00)". It allows arbitrary groupings including hundreds
and ten-thousands and uneven groupings. The special pattern characters
are non-localized (using a DOT for a decimal separator and a COMMA for a
grouping separator). The user can supply an alternate set of symbols
using the formatter's DecimalFormatSymbols object.

Alternative Proposal (from Eric Smith, originally called Proposal II)

Make both the thousands separator and decimal separator user specifiable
but not locale aware. For simplicity, limit the choices to a COMMA, DOT,
SPACE, APOSTROPHE or UNDERSCORE. The SPACE can be either U+0020 or
U+00A0.

Whenever a separator is followed by a precision, it is a decimal
separator and an optional separator preceding it is a thousands
separator. When the precision is absent, a lone specifier means a
thousands separator:

    [[fill]align][sign][#][0][width][tsep][dsep precision][type]

Examples:

    format(1234, "8.1f")     -->    '  1234.0'
    format(1234, "8,1f")     -->    '  1234,0'
    format(1234, "8.,1f")    -->    ' 1.234,0'
    format(1234, "8 ,1f")    -->    ' 1 234,0'
    format(1234, "8d")       -->    '    1234'
    format(1234, "8,d")      -->    '   1,234'
    format(1234, "8_d")      -->    '   1_234'

This proposal meets most needs, but it comes at the expense of taking a
bit more effort to parse. Not every possible convention is covered, but
at least one of the options (spaces or underscores) should be readable,
understandable, and useful to folks from many diverse backgrounds.

As shown in the examples, the width argument means the total length
including the thousands separators and decimal separators.
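
The intended results of this alternative can be approximated on top of the ',' option from the main proposal. The sketch below is an emulation only: the function name format_with_separators and its keyword arguments are hypothetical, and the specifier syntax shown above is not accepted by format():

```python
def format_with_separators(value, width=0, tsep="", dsep=".", precision=None):
    """Emulate the [tsep][dsep precision] idea using the ',' option.

    Hypothetical helper for illustration. With precision=None the value
    is formatted as an integer; otherwise as fixed-point.
    """
    if precision is None:
        text = format(value, ",d")
    else:
        text = format(value, ",.{}f".format(precision))
    # Map the built-in ',' and '.' onto the requested separators in one
    # pass; a table value of None deletes the character (no tsep given).
    table = {ord(","): tsep or None, ord("."): dsep}
    return text.translate(table).rjust(width)


print(repr(format_with_separators(1234, 8, dsep=",", precision=1)))
# '  1234,0'
print(repr(format_with_separators(1234, 8, tsep=".", dsep=",", precision=1)))
# ' 1.234,0'
print(repr(format_with_separators(1234, 8, tsep="_")))
# '   1_234'
```

As in the examples above, the width here counts the separators as well as the digits.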

No change is proposed for the locale module.

The thousands separator is defined as shown above for types 'd', 'e',
'f', 'g', '%', 'E', 'G' and 'F'. To allow future extensions, it is
undefined for other types: binary, octal, hex, character, etc.

The drawback to this alternative proposal is the difficulty of mentally
parsing whether a single separator is a thousands separator or decimal
separator. Perhaps it is too arcane to link the decimal separator with
the precision specifier.

Commentary

-   Some commenters do not like the idea of format strings at all and
    find them to be unreadable. Suggested alternatives include the COBOL
    style PICTURE approach or a convenience function with keyword
    arguments for every possible combination.
-   Some newsgroup respondents think there is no place for any scripts
    that are not internationalized and that it is a step backwards to
    provide a simple way to hardwire a particular choice (thus reducing
    incentive to use a locale sensitive approach).
-   Another thought is that embedding some particular convention in
    individual format strings makes it hard to change that convention
    later. No workable alternative was suggested but the general idea is
    to set the convention once and have it apply everywhere (others
    commented that locale already provides a way to do this).
-   There are some precedents for grouping digits in the fractional part
    of a floating point number, but this PEP does not venture into that
    territory. Only digits to the left of the decimal point are grouped.
    This does not preclude future extensions; it just focuses on a
    single, generally useful extension to the formatting language.
-   James Knight observed that Indian/Pakistani numbering systems group
    by hundreds. Ben Finney noted that Chinese group by ten-thousands.
    Eric Smith pointed out that these are already handled by the "n"
    specifier in the locale module (albeit only for integers). This PEP
    does not attempt to support all of those possibilities. It focuses
    on a single, relatively common grouping convention that offers a
    quick way to improve readability in many (though not all) contexts.

Copyright

This document has been placed in the public domain.


