PEP: 204 Title: Range Literals Author: Thomas Wouters
<thomas@python.org> Status: Rejected Type: Standards Track Created:
14-Jul-2000 Python-Version: 2.0 Post-History:

After careful consideration, and a period of meditation, this proposal
has been rejected. The open issues, as well as some confusion between
ranges and slice syntax, raised enough questions for Guido not to accept
it for Python 2.0, and later to reject the proposal altogether. The new
syntax and its intentions were deemed not obvious enough.

[ TBD: Guido, amend/confirm this, please. Preferably both; this is a
PEP, it should contain all the reasons for rejection and/or
reconsideration, for future reference. ]

Introduction

This PEP describes the "range literal" proposal for Python 2.0. This PEP
tracks the status and ownership of this feature, slated for introduction
in Python 2.0. It contains a description of the feature and outlines
changes necessary to support the feature. This PEP summarizes
discussions held in mailing list forums, and provides URLs for further
information, where appropriate. The CVS revision history of this file
contains the definitive historical record.

List ranges

Ranges are sequences of numbers of a fixed stepping, often used in
for-loops. The Python for-loop is designed to iterate over a sequence
directly:

    >>> l = ['a', 'b', 'c', 'd']
    >>> for item in l:
    ...     print item
    a
    b
    c
    d

However, this solution is not always prudent. Firstly, problems arise
when altering the sequence in the body of the for-loop, resulting in the
for-loop skipping items. Secondly, it is not possible to iterate over,
say, every second element of the sequence. And thirdly, it is sometimes
necessary to process an element based on its index, which is not readily
available in the above construct.

For these instances, and others where a range of numbers is desired,
Python provides the range builtin function, which creates a list of
numbers. The range function takes three arguments, start, end and step.
start and step are optional, and default to 0 and 1, respectively.

The range function creates a list of numbers, starting at start, with a
step of step, up to, but not including end, so that range(10) produces a
list that has exactly 10 items, the numbers 0 through 9.

Using the range function, the above example would look like this:

    >>> for i in range(len(l)):
    ...     print l[i]
    a
    b
    c
    d

Or, to start at the second element of l and processing only every second
element from then on:

    >>> for i in range(1, len(l), 2):
    ...     print l[i]
    b
    d

There are several disadvantages with this approach:

-   Clarity of purpose: Adding another function call, possibly with
    extra arithmetic to determine the desired length and step of the
    list, does not improve readability of the code. Also, it is possible
    to "shadow" the builtin range function by supplying a local or
    global variable with the same name, effectively replacing it. This
    may or may not be a desired effect.
-   Efficiency: because the range function can be overridden, the Python
    compiler cannot make assumptions about the for-loop, and has to
    maintain a separate loop counter.
-   Consistency: There already is a syntax that is used to denote
    ranges, as shown below. This syntax uses the exact same arguments,
    though all optional, in the exact same way. It seems logical to
    extend this syntax to ranges, to form "range literals".

Slice Indices

In Python, a sequence can be indexed in one of two ways: retrieving a
single item, or retrieving a range of items. Retrieving a range of items
results in a new object of the same type as the original sequence,
containing zero or more items from the original sequence. This is done
using a "range notation":

    >>> l[2:4]
    ['c', 'd']

This range notation consists of zero, one or two indices separated by a
colon. The first index is the start index, the second the end. When
either is left out, they default to respectively the start and the end
of the sequence.

There is also an extended range notation, which incorporates step as
well. Though this notation is not currently supported by most builtin
types, if it were, it would work as follows:

    >>> l[1:4:2]
    ['b', 'd']

The third "argument" to the slice syntax is exactly the same as the step
argument to range(). The underlying mechanisms of the standard, and
these extended slices, are sufficiently different and inconsistent that
many classes and extensions outside of mathematical packages do not
implement support for the extended variant. While this should be
resolved, it is beyond the scope of this PEP.

Extended slices do show, however, that there is already a perfectly
valid and applicable syntax to denote ranges in a way that solve all of
the earlier stated disadvantages of the use of the range() function:

-   It is clearer, more concise syntax, which has already proven to be
    both intuitive and easy to learn.
-   It is consistent with the other use of ranges in Python (e.g.
    slices).
-   Because it is built-in syntax, instead of a builtin function, it
    cannot be overridden. This means both that a viewer can be certain
    about what the code does, and that an optimizer will not have to
    worry about range() being "shadowed".

The Proposed Solution

The proposed implementation of range-literals combines the syntax for
list literals with the syntax for (extended) slices, to form range
literals:

    >>> [1:10]
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> [:5]
    [0, 1, 2, 3, 4]
    >>> [5:1:-1]
    [5, 4, 3, 2]

There is one minor difference between range literals and the slice
syntax: though it is possible to omit all of start, end and step in
slices, it does not make sense to omit end in range literals. In slices,
end would default to the end of the list, but this has no meaning in
range literals.

Reference Implementation

The proposed implementation can be found on SourceForge[1]. It adds a
new bytecode, BUILD_RANGE, that takes three arguments from the stack and
builds a list on the bases of those. The list is pushed back on the
stack.

The use of a new bytecode is necessary to be able to build ranges based
on other calculations, whose outcome is not known at compile time.

The code introduces two new functions to listobject.c, which are
currently hovering between private functions and full-fledged API calls.

PyList_FromRange() builds a list from start, end and step, returning
NULL if an error occurs. Its prototype is:

    PyObject * PyList_FromRange(long start, long end, long step)

PyList_GetLenOfRange() is a helper function used to determine the length
of a range. Previously, it was a static function in bltinmodule.c, but
is now necessary in both listobject.c and bltinmodule.c (for xrange). It
is made non-static solely to avoid code duplication. Its prototype is:

    long PyList_GetLenOfRange(long start, long end, long step)

Open issues

-   One possible solution to the discrepancy of requiring the end
    argument in range literals is to allow the range syntax to create a
    "generator", rather than a list, such as the xrange builtin function
    does. However, a generator would not be a list, and it would be
    impossible, for instance, to assign to items in the generator, or
    append to it.

    The range syntax could conceivably be extended to include tuples
    (i.e. immutable lists), which could then be safely implemented as
    generators. This may be a desirable solution, especially for large
    number arrays: generators require very little in the way of storage
    and initialization, and there is only a small performance impact in
    calculating and creating the appropriate number on request. (TBD: is
    there any at all? Cursory testing suggests equal performance even in
    the case of ranges of length 1)

    However, even if idea was adopted, would it be wise to "special
    case" the second argument, making it optional in one instance of the
    syntax, and non-optional in other cases ?

-   Should it be possible to mix range syntax with normal list literals,
    creating a single list? E.g.:

        >>> [5, 6, 1:6, 7, 9]

    to create:

        [5, 6, 1, 2, 3, 4, 5, 7, 9]

-   How should range literals interact with another proposed new
    feature, "list comprehensions" <202>? Specifically, should it be
    possible to create lists in list comprehensions? E.g.:

        >>> [x:y for x in (1, 2) y in (3, 4)]

    Should this example return a single list with multiple ranges:

        [1, 2, 1, 2, 3, 2, 2, 3]

    Or a list of lists, like so:

        [[1, 2], [1, 2, 3], [2], [2, 3]]

    However, as the syntax and semantics of list comprehensions are
    still subject of hot debate, these issues are probably best
    addressed by the "list comprehensions" PEP.

-   Range literals accept objects other than integers: it performs
    PyInt_AsLong() on the objects passed in, so as long as the objects
    can be coerced into integers, they will be accepted. The resulting
    list, however, is always composed of standard integers.

    Should range literals create a list of the passed-in type? It might
    be desirable in the cases of other builtin types, such as longs and
    strings:

        >>> [ 1L : 2L<<64 : 2<<32L ]
        >>> ["a":"z":"b"]
        >>> ["a":"z":2]

    However, this might be too much "magic" to be obvious. It might also
    present problems with user-defined classes: even if the base class
    can be found and a new instance created, the instance may require
    additional arguments to __init__, causing the creation to fail.

-   The PyList_FromRange() and PyList_GetLenOfRange() functions need to
    be classified: are they part of the API, or should they be made
    private functions?

Copyright

This document has been placed in the Public Domain.

References

[1] http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470