PEP: 331 Title: Locale-Independent Float/String Conversions Version:
$Revision$ Last-Modified: $Date$ Author: Christian R. Reis <kiko at
async.com.br> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 19-Jul-2003 Python-Version: 2.4 Post-History:
21-Jul-2003, 13-Aug-2003, 18-Jun-2004

Abstract

Support for the LC_NUMERIC locale category in Python 2.3 is implemented
only in Python-space. This causes inconsistent behavior and
thread-safety issues for applications that use extension modules and
libraries implemented in C that parse and generate floats from strings.
This document proposes a plan for removing this inconsistency by
providing and using substitute locale-agnostic functions as necessary.

Introduction

Python provides generic localization services through the locale module,
which among other things allows localizing the display and conversion
process of numeric types. Locale categories, such as LC_TIME and
LC_COLLATE, allow configuring precisely what aspects of the application
are to be localized.

The LC_NUMERIC category specifies formatting for non-monetary numeric
information, such as the decimal separator in float and fixed-precision
numbers. Localization of the LC_NUMERIC category is currently
implemented only in Python-space; C libraries invoked from the Python
runtime are unaware of Python's LC_NUMERIC setting. This is done to
avoid changing the behavior of certain low-level functions that are used
by the Python parser and related code[1].

However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.

James Henstridge, the author of PyGTK[2], has additionally pointed out
that the setlocale() function also presents thread-safety issues, since
a thread may call the C library setlocale() outside of the GIL, and
cause Python to parse and generate floats incorrectly.

Rationale

The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending on the
application, but it will most likely occur when parsing or formatting a
floating-point value.

Example Problem

The initial problem that motivated this PEP is related to the
GtkSpinButton[3] widget in the GTK+ UI toolkit, wrapped by the PyGTK
module. The widget can be set to numeric mode, and when this occurs,
characters typed into it are evaluated as a number.

Problems occur when LC_NUMERIC is set to a locale with a float separator
that differs from the C locale's standard (for instance, ',' instead of
'.' for the Brazilian locale pt_BR). Because LC_NUMERIC is not set at
the libc level, float values are displayed incorrectly (using '.' as a
separator) in the spinbutton's text entry, and it is impossible to enter
fractional values using the ',' separator.

This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.

Proposal

Martin v. Löwis commented on the initial constraints for an acceptable
solution to the problem on python-dev:

-   LC_NUMERIC can be set at the C library level without breaking the
    parser.
-   float() and str() stay locale-unaware.
-   locale-aware str() and atof() stay in the locale module.

An analysis of the Python source suggests that the following functions
currently depend on LC_NUMERIC being set to the C locale:

-   Python/compile.c:parsenumber()
-   Python/marshal.c:r_object()
-   Objects/complexobject.c:complex_to_buf()
-   Objects/complexobject.c:complex_subtype_from_string()
-   Objects/floatobject.c:PyFloat_FromString()
-   Objects/floatobject.c:format_float()
-   Objects/stringobject.c:formatfloat()
-   Modules/stropmodule.c:strop_atof()
-   Modules/cPickle.c:load_float()

The proposed approach is to implement LC_NUMERIC-agnostic functions for
converting from (strtod()/atof()) and to (snprintf()) float formats,
using these functions where the formatting should not vary according to
the user-specified locale.

The locale module should also be changed to remove the special-casing
for LC_NUMERIC.

This change should also solve the aforementioned thread-safety problems.

Potential Code Contributions

This problem was initially reported as a problem in the GTK+
libraries[4]; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not to be
confused with the GNU C library) implements a number of
LC_NUMERIC-agnostic functions (for an example, see[5]) for reasons
similar to those presented in this paper.

In the same GTK+ problem report, Havoc Pennington suggested that the
glib authors would be willing to contribute this code to the PSF, which
would simplify implementation of this PEP considerably. Alex Larsson,
the original author of the glib code, submitted a PSF Contributor
Agreement[6] on 2003-08-20[7] to ensure the code could be safely
integrated; this agreement has been received and accepted.

Risks

There may be cross-platform issues with the provided locale-agnostic
functions, though this risk is low given that the code supplied simply
reverses any locale-dependent changes made to floating-point numbers.

Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area as
members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been mailed in
to ensure this safety.

Tim Peters has pointed out[8] that there are situations involving
threading in which the proposed change is insufficient to solve the
problem completely. A complete solution, however, does not currently
exist.

Implementation

An implementation was developed by Gustavo Carneiro <gjc at
inescporto.pt>, and attached to Sourceforge.net bug 774665[9]

The final patch[10] was integrated into Python CVS by Martin v. Löwis on
2004-06-08, as stated in the bug report.

References

Copyright

This document has been placed in the public domain.

[1] Python locale documentation for embedding,
http://docs.python.org/library/locale.html

[2] PyGTK homepage, http://www.daa.com.au/~james/pygtk/

[3] GtkSpinButton screenshot (demonstrating problem),
http://www.async.com.br/~kiko/spin.png

[4] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132

[5] Code submission of g_ascii_strtod and g_ascii_dtostr (later renamed
g_ascii_formatd) by Alex Larsson,
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html

[6] PSF Contributor Agreement,
https://www.python.org/psf/contrib/contrib-form/

[7] Alex Larsson's email confirming his agreement was mailed in,
https://mail.python.org/pipermail/python-dev/2003-August/037755.html

[8] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
https://mail.python.org/pipermail/python-dev/2003-September/037898.html

[9] Python bug report, https://bugs.python.org/issue774665

[10] Integrated LC_NUMERIC-agnostic patch,
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665