PEP: 237
Title: Unifying Long Integers and Integers
Version: $Revision$
Last-Modified: $Date$
Author: Moshe Zadka, Guido van Rossum
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2001
Python-Version: 2.2
Post-History: 16-Mar-2001, 14-Aug-2001, 23-Aug-2001

Abstract

Python currently distinguishes between two kinds of integers (ints):
regular or short ints, limited by the size of a C long (typically 32 or
64 bits), and long ints, which are limited only by available memory.
When operations on short ints yield results that don't fit in a C long,
they raise an error. There are some other distinctions too. This PEP
proposes to do away with most of the differences in semantics, unifying
the two types from the perspective of the Python user.

Rationale

Many programs find a need to deal with larger numbers after the fact,
and changing the algorithms later is bothersome. Using long ints from
the start hinders performance in the normal case, because all arithmetic
is then performed using long ints whether or not they are needed.

Having the machine word size exposed to the language hinders
portability. For example, Python source files and .pyc files are not
portable between 32-bit and 64-bit machines because of this.

There is also the general desire to hide unnecessary details from the
Python user when they are irrelevant for most applications. An example
is memory allocation, which is explicit in C but automatic in Python,
giving us the convenience of unlimited sizes on strings, lists, etc. It
makes sense to extend this convenience to numbers.

It will give new Python programmers (whether they are new to programming
in general or not) one less thing to learn before they can start using
the language.

Implementation

Initially, two alternative implementations were proposed (one by each
author):

1.  The PyInt type's slot for a C long will be turned into a union:

        union {
            long i;
            struct {
                unsigned long length;
                digit digits[1];
            } bignum;
        };

    Only the n-1 lower bits of the long have any meaning; the top bit is
    always set. This distinguishes the union. All PyInt functions will
    check this bit before deciding which types of operations to use.

2.  The existing short and long int types remain, but operations return
    a long int instead of raising OverflowError when a result cannot be
    represented as a short int. A new type, integer, may be introduced
    that is an abstract base type of which both the int and long
    implementation types are subclassed. This is useful so that programs
    can check integer-ness with a single test:

        if isinstance(i, integer): ...

After some consideration, the second implementation plan was selected,
since it is far easier to implement, is backwards compatible at the C
API level, and in addition can be implemented partially as a
transitional measure.
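The selected plan can be modeled in a few lines. The following is a
minimal sketch, not the CPython implementation: the class names Integer,
ShortInt and LongInt, the helpers make_integer() and add(), and the
32-bit MAXINT are all assumptions made for illustration, and the code
uses current Python syntax to simulate the two historical types.

```python
MAXINT = 2**31 - 1  # assumption: the C long is 32 bits wide


class Integer:
    """Hypothetical abstract base shared by both implementation types."""

    def __init__(self, value):
        self.value = value


class ShortInt(Integer):
    """Stand-in for the fixed-width int type."""


class LongInt(Integer):
    """Stand-in for the arbitrary-precision long type."""


def make_integer(value):
    """Pick the implementation type from the value alone."""
    if -MAXINT - 1 <= value <= MAXINT:
        return ShortInt(value)
    return LongInt(value)


def add(a, b):
    """Add two Integers; overflow promotes to LongInt instead of raising."""
    total = a.value + b.value
    # An operation with a long operand never returns a short int.
    if isinstance(a, LongInt) or isinstance(b, LongInt):
        return LongInt(total)
    return make_integer(total)


big = add(ShortInt(MAXINT), ShortInt(1))
print(type(big).__name__, isinstance(big, Integer))
```

A single isinstance() test against the base class covers both
implementation types, which is the purpose of the proposed integer type.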

Incompatibilities

The following operations have (usually subtly) different semantics for
short and for long integers, and one or the other will have to be
changed somehow. This is intended to be an exhaustive list. If you know
of any other operations that differ in outcome depending on whether a
short or a long int with the same value is passed, please write to the
second author.

-   Currently, all arithmetic operators on short ints except << raise
    OverflowError if the result cannot be represented as a short int.
    This will be changed to return a long int instead. The following
    operators can currently raise OverflowError: x+y, x-y, x*y, x**y,
    divmod(x, y), x/y, x%y, and -x. (The last four can only overflow
    when the value -sys.maxint-1 is involved.)
-   Currently, x<<n can lose bits for short ints. This will be changed
    to return a long int containing all the shifted-out bits, if
    returning a short int would lose bits (where changing sign is
    considered a special case of losing bits).
-   Currently, hex and oct literals for short ints may specify negative
    values; for example 0xffffffff == -1 on a 32-bit machine. This will
    be changed to equal 0xffffffffL (2**32-1).
-   Currently, the %u, %x, %X and %o string formatting operators and the
    hex() and oct() built-in functions behave differently for negative
    numbers: negative short ints are formatted as unsigned C long, while
    negative long ints are formatted with a minus sign. This will be
    changed to use the long int semantics in all cases (but without the
    trailing L that currently distinguishes the output of hex() and
    oct() for long ints). Note that this means that %u becomes an alias
    for %d. It will eventually be removed.
-   Currently, repr() of a long int returns a string ending in L while
    repr() of a short int doesn't. The L will be dropped; but not before
    Python 3.0.
-   Currently, an operation with long operands will never return a short
    int. This may change, since it allows some optimization. (No changes
    have been made in this area yet, and none are planned.)
-   The expression type(x).__name__ depends on whether x is a short or a
    long int. Since implementation alternative 2 is chosen, this
    difference will remain. (In Python 3.0, we may be able to deploy a
    trick to hide the difference, because it is annoying to reveal the
    difference to user code, and more so as the difference between the
    two types is less visible.)
-   Long and short ints are handled differently by the marshal module, and
    by the pickle and cPickle modules. This difference will remain (at
    least until Python 3.0).
-   Short ints with small values (typically between -1 and 99 inclusive)
    are interned -- whenever a result has such a value, an existing
    short int with the same value is returned. This is not done for long
    ints with the same values. This difference will remain. (Since there
    is no guarantee of this interning, it is debatable whether this is a
    semantic difference -- but code may exist that uses the is operator
    to compare short ints and happens to work because of this
    interning. Such code may fail if used with long ints.)
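Two of the differences listed above, left shifts and hex() formatting,
can be simulated on a current interpreter. The sketch below assumes a
32-bit C long; wrap_signed(), old_lshift() and old_hex() are
hypothetical helpers that mimic the short int behavior, while the
built-in operators already follow the long int semantics.

```python
BITS = 32  # assumption: width of a C long on the simulated platform
MASK = (1 << BITS) - 1


def wrap_signed(value):
    """Reduce value to a signed two's-complement integer of BITS bits."""
    value &= MASK
    if value >= 1 << (BITS - 1):
        value -= 1 << BITS
    return value


def old_lshift(x, n):
    """Short int left shift: bits shifted past the top are lost."""
    return wrap_signed(x << n)


def old_hex(x):
    """Short int hex(): a negative value is formatted as an unsigned C long."""
    return hex(x & MASK)


# Short int semantics first, long int semantics second.
print(old_lshift(1, 31), 1 << 31)
print(old_hex(-1), hex(-1))
```

In both pairs the second value is the one that prevails under this
proposal.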

Literals

A trailing L at the end of an integer literal will stop having any
meaning, and will eventually become illegal. The compiler will choose
the appropriate type solely based on the value. (Until Python 3.0, it
will force the literal to be a long; but literals without a trailing L
may also be long, if they are not representable as short ints.)
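The rule for literals can be sketched as a small classifier. This is an
illustration only, not the compiler's code: literal_type() is a
hypothetical helper, MAXINT assumes a 32-bit C long, and hex literals
are treated as unsigned values, which is the behavior once the
remaining semantic differences have been removed.

```python
MAXINT = 2**31 - 1  # assumption: the C long is 32 bits wide


def literal_type(text, maxint=MAXINT):
    """Return "int" or "long" for a non-negative integer literal.

    A trailing L still forces a long. Without it, the type depends
    solely on whether the value fits in a short int.
    Old-style octal literals such as 0777 are not handled here.
    """
    forced_long = text[-1:] in ("L", "l")
    digits = text[:-1] if forced_long else text
    value = int(digits, 0)  # base 0 accepts decimal and 0x prefixes
    if forced_long or value > maxint:
        return "long"
    return "int"


for literal in ("2147483647", "2147483648", "1L", "0xffffffff"):
    print(literal, literal_type(literal))
```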

Built-in Functions

The function int() will return a short or a long int depending on the
argument value. In Python 3.0, the function long() will call the
function int(); before then, it will continue to force the result to be
a long int, but otherwise work the same way as int(). The built-in name
long will remain in the language to represent the long implementation
type (unless it is completely eradicated in Python 3.0), but using the
int() function is still recommended, since it will automatically return
a long when needed.

C API

The C API remains unchanged; C code will still need to be aware of the
difference between short and long ints. (The Python 3.0 C API will
probably be completely incompatible.)

The PyArg_Parse*() APIs already accept long ints, as long as they are
within the range representable by C ints or longs, so that functions
taking a C int or long argument won't have to worry about dealing with
Python longs.

Transition

There are three major phases to the transition:

1.  Short int operations that currently raise OverflowError return a
    long int value instead. This is the only change in this phase.
    Literals will still distinguish between short and long ints. The
    other semantic differences listed above (including the behavior of
    <<) will remain. Because this phase only changes situations that
    currently raise OverflowError, it is assumed that this won't break
    existing code. (Code that depends on this exception would have to be
    too convoluted to be concerned about it.) For those concerned about
    extreme backwards compatibility, a command line option (or a call to
    the warnings module) will allow a warning or an error to be issued
    at this point, but this is off by default.
2.  The remaining semantic differences are addressed. In all cases the
    long int semantics will prevail. Since this will introduce backwards
    incompatibilities which will break some old code, this phase may
    require a future statement and/or warnings, and a prolonged
    transition phase. The trailing L will continue to be used for longs
    as input and by repr().
    A.  Warnings are enabled about operations that will change their
        numeric outcome in stage 2B, in particular hex() and oct(), %u,
        %x, %X and %o, hex and oct literals in the (inclusive) range
        [sys.maxint+1, sys.maxint*2+1], and left shifts losing bits.
    B.  The new semantics for these operations are implemented.
        Operations that give different results than before will not
        issue a warning.
3.  The trailing L is dropped from repr(), and made illegal on input.
    (If possible, the long type completely disappears.) The trailing L
    is also dropped from hex() and oct().

Phase 1 will be implemented in Python 2.2.

Phase 2 will be implemented gradually, with 2A in Python 2.3 and 2B in
Python 2.4.

Phase 3 will be implemented in Python 3.0 (at least two years after
Python 2.4 is released).
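The range of hex and oct literals named in phase 2A is exactly the set
of values whose top bit is set in a C long. The sketch below makes this
concrete under the assumption of a 32-bit platform; MAXINT stands in for
sys.maxint, and both function names are hypothetical.

```python
MAXINT = 2**31 - 1  # assumption: stands in for sys.maxint on 32 bits


def literal_changes_value(value, maxint=MAXINT):
    """True if a hex or oct literal with this unsigned value is affected."""
    return maxint + 1 <= value <= maxint * 2 + 1


def old_literal_value(value, maxint=MAXINT):
    """Value of the literal under short int semantics (wraps to negative)."""
    if literal_changes_value(value, maxint):
        return value - (maxint * 2 + 2)
    return value


for literal in (0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(hex(literal), old_literal_value(literal), literal)
```

Literals below the range already have the same value under both
semantics, so only values inside it need a warning.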

OverflowWarning

Here are the rules that guide warnings generated in situations that
currently raise OverflowError. This applies to transition phase 1.
Historical note: although phase 1 was completed in Python 2.2, and
phase 2A in Python 2.3, nobody noticed that OverflowWarning was still
generated in Python 2.3. It was finally disabled in Python 2.4. The
Python builtin OverflowWarning, and the corresponding C API
PyExc_OverflowWarning, are no longer generated or used in Python 2.4,
but will remain for the (unlikely) case of user code until Python 2.5.

-   A new warning category is introduced, OverflowWarning. This is a
    built-in name.

-   If an int result overflows, an OverflowWarning warning is issued,
    with a message argument indicating the operation, e.g. "integer
    addition". This may or may not cause a warning message to be
    displayed on sys.stderr, or may cause an exception to be raised, all
    under control of the -W command line and the warnings module.

-   The OverflowWarning warning is ignored by default.

-   The OverflowWarning warning can be controlled like all warnings, via
    the -W command line option or via the warnings.filterwarnings()
    call. For example:

        python -Wdefault::OverflowWarning

    causes the OverflowWarning to be displayed the first time it occurs
    at a particular source line, and:

        python -Werror::OverflowWarning

    causes the OverflowWarning to be turned into an exception whenever it
    happens. The following code enables the warning from inside the
    program:

        import warnings
        warnings.filterwarnings("default", "", OverflowWarning)

    See the python man page for the -W option and the warnings module
    documentation for filterwarnings().

-   If the OverflowWarning warning is turned into an error,
    OverflowError is substituted. This is needed for backwards
    compatibility.

-   Unless the warning is turned into an exception, the result of the
    operation (e.g., x+y) is recomputed after converting the arguments
    to long ints.
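The rules above can be exercised with the standard warnings module. The
following is a sketch under stated assumptions, not the interpreter's
code: OverflowWarning is redefined locally because the built-in no
longer exists in current Python, checked_add() is a hypothetical helper,
and MAXINT assumes a 32-bit C long.

```python
import warnings

MAXINT = 2**31 - 1  # assumption: the C long is 32 bits wide


class OverflowWarning(Warning):
    """Local stand-in for the historical built-in warning category."""


def checked_add(x, y):
    """Add two ints, applying the OverflowWarning rules on overflow."""
    result = x + y  # already computed with unlimited precision
    if -MAXINT - 1 <= result <= MAXINT:
        return result
    try:
        warnings.warn("integer addition", OverflowWarning, stacklevel=2)
    except OverflowWarning:
        # The warning was turned into an error: substitute OverflowError.
        raise OverflowError("integer addition") from None
    return result


with warnings.catch_warnings():
    warnings.simplefilter("ignore", OverflowWarning)  # the default rule
    print(checked_add(MAXINT, 1))

with warnings.catch_warnings():
    warnings.simplefilter("error", OverflowWarning)
    try:
        checked_add(MAXINT, 1)
    except OverflowError as exc:
        print("OverflowError:", exc)
```

The two filter settings correspond to the default behavior and to
running with -Werror::OverflowWarning.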

Example

If you pass a long int to a C function or built-in operation that takes
an integer, it will be treated the same as a short int as long as the
value fits (by virtue of how PyArg_ParseTuple() is implemented). If the
long value doesn't fit, it will still raise an OverflowError. For
example:

    def fact(n):
        if n <= 1:
            return 1
        return n*fact(n-1)

    A = "ABCDEFGHIJKLMNOPQ"
    n = input("Gimme an int: ")
    print A[fact(n)%17]

For n >= 13, this currently raises OverflowError (unless the user enters
a trailing L as part of their input), even though the calculated index
would always be in range(17). With the new approach this code will do
the right thing: the index will be calculated as a long int, but its
value will be in range.
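The same example can be tried on a current interpreter, where the
unification is complete. The version below is a sketch in Python 3
syntax; the fixed argument 13 replaces the interactive prompt and is an
assumption made so that the run is reproducible.

```python
def fact(n):
    """Return n! computed recursively."""
    if n <= 1:
        return 1
    return n * fact(n - 1)


A = "ABCDEFGHIJKLMNOPQ"
n = 13  # assumption: fixed value instead of interactive input
index = fact(n) % 17  # fact(13) exceeds a 32-bit C long, the index does not
print(A[index])
```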

Resolved Issues

These issues, previously open, have been resolved.

-   hex() and oct() applied to longs will continue to produce a trailing
    L until Python 3000. The original text above wasn't clear about
    this, but since it didn't happen in Python 2.4 it was thought better
    to leave it alone. BDFL pronouncement here:

    https://mail.python.org/pipermail/python-dev/2006-June/065918.html

-   What to do about sys.maxint? Leave it in, since it is still relevant
    whenever the distinction between short and long ints is still
    relevant (e.g. when inspecting the type of a value).

-   Should we remove %u completely? Remove it.

-   Should we warn about << not truncating integers? Yes.

-   Should the overflow warning be on a portable maximum size? No.

Implementation

The implementation work for the Python 2.x line is completed; phase 1
was released with Python 2.2, phase 2A with Python 2.3, and phase 2B
will be released with Python 2.4 (and is already in CVS).

Copyright

This document has been placed in the public domain.