PEP: 357 Title: Allowing Any Object to be Used for Slicing Author:
Travis Oliphant <oliphant@ee.byu.edu> Status: Final Type: Standards
Track Content-Type: text/x-rst Created: 09-Feb-2006 Python-Version: 2.5
Post-History:

Abstract

This PEP proposes adding an nb_index slot in PyNumberMethods and an
__index__ special method so that arbitrary objects can be used whenever
integers are explicitly needed in Python, such as in slice syntax (from
which the slot gets its name).

Rationale

Currently integers and long integers play a special role in slicing in
that they are the only objects allowed in slice syntax. In other words,
if X is an object implementing the sequence protocol, then X[obj1:obj2]
is only valid if obj1 and obj2 are both integers or long integers. There
is no way for obj1 and obj2 to tell Python that they could be reasonably
used as indexes into a sequence. This is an unnecessary limitation.

In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64 bits.
These type-objects could reasonably be used as integers in many places
where Python expects true integers but cannot inherit from the Python
integer type because of incompatible memory layouts. There should be
some way to be able to tell Python that an object can behave like an
integer.

It is not possible to use the nb_int (and __int__ special method) for
this purpose because that method is used to coerce objects to integers.
It would be inappropriate to allow every object that can be coerced to
an integer to be used as an integer everywhere Python expects a true
integer. For example, if __int__ were used to convert an object to an
integer in slicing, then float objects would be allowed in slicing and
x[3.2:5.8] would not raise an error as it should.

Proposal

Add an nb_index slot to PyNumberMethods, and a corresponding __index__
special method. Objects could define a function to place in the nb_index
slot that returns a Python integer (either an int or a long). This
integer can then be appropriately converted to a Py_ssize_t value
whenever Python needs one such as in PySequence_GetSlice,
PySequence_SetSlice, and PySequence_DelSlice.

Specification

1)  The nb_index slot will have the following signature:

        PyObject *index_func (PyObject *self)

    The returned object must be a Python IntType or Python LongType.
    NULL should be returned on error with an appropriate error set.

2)  The __index__ special method will have the signature:

        def __index__(self):
            return obj

    where obj must be either an int or a long.

3)  3 new abstract C-API functions will be added

    a)  The first checks to see if the object supports the index slot
        and if it is filled in.

            int PyIndex_Check(obj)

        This will return true if the object defines the nb_index slot.

    b)  The second is a simple wrapper around the nb_index call that
        raises PyExc_TypeError if the call is not available or if it
        doesn't return an int or long. Because the PyIndex_Check is
        performed inside the PyNumber_Index call you can call it
        directly and manage any error rather than check for
        compatibility first.

            PyObject *PyNumber_Index (PyObject *obj)

    c)  The third call helps deal with the common situation of actually
        needing a Py_ssize_t value from the object to use for indexing
        or other needs.

            Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)

        The function calls the nb_index slot of obj if it is available
        and then converts the returned Python integer into a Py_ssize_t
        value. If this goes well, then the value is returned. The second
        argument allows control over what happens if the integer
        returned from nb_index cannot fit into a Py_ssize_t value.

        If exc is NULL, then the returned value will be clipped to
        PY_SSIZE_T_MAX or PY_SSIZE_T_MIN depending on whether the
        nb_index slot of obj returned a positive or negative integer. If
        exc is non-NULL, then it is the error object that will be set to
        replace the PyExc_OverflowError that was raised when the Python
        integer or long was converted to Py_ssize_t.

4)  A new operator.index(obj) function will be added that calls
    equivalent of obj.__index__() and raises an error if obj does not
    implement the special method.

Implementation Plan

1)  Add the nb_index slot in object.h and modify typeobject.c to create
    the __index__ method
2)  Change the ISINT macro in ceval.c to ISINDEX and alter it to
    accommodate objects with the index slot defined.
3)  Change the _PyEval_SliceIndex function to accommodate objects with
    the index slot defined.
4)  Change all builtin objects (e.g. lists) that use the as_mapping
    slots for subscript access and use a special-check for integers to
    check for the slot as well.
5)  Add the nb_index slot to integers and long_integers (which just
    return themselves)
6)  Add PyNumber_Index C-API to return an integer from any Python Object
    that has the nb_index slot.
7)  Add the operator.index(x) function.
8)  Alter arrayobject.c and mmapmodule.c to use the new C-API for their
    sub-scripting and other needs.
9)  Add unit-tests

Discussion Questions

Speed

Implementation should not slow down Python because integers and long
integers used as indexes will complete in the same number of
instructions. The only change will be that what used to generate an
error will now be acceptable.

Why not use nb_int which is already there?

The nb_int method is used for coercion and so means something
fundamentally different than what is requested here. This PEP proposes a
method for something that can already be thought of as an integer
communicate that information to Python when it needs an integer. The
biggest example of why using nb_int would be a bad thing is that float
objects already define the nb_int method, but float objects should not
be used as indexes in a sequence.

Why the name __index__?

Some questions were raised regarding the name __index__ when other
interpretations of the slot are possible. For example, the slot can be
used any time Python requires an integer internally (such as in
"mystring" * 3). The name was suggested by Guido because slicing syntax
is the biggest reason for having such a slot and in the end no better
name emerged. See the discussion thread[1] for examples of names that
were suggested such as "__discrete__" and "__ordinal__".

Why return PyObject * from nb_index?

Initially Py_ssize_t was selected as the return type for the nb_index
slot. However, this led to an inability to track and distinguish
overflow and underflow errors without ugly and brittle hacks. As the
nb_index slot is used in at least 3 different ways in the Python core
(to get an integer, to get a slice end-point, and to get a sequence
index), there is quite a bit of flexibility needed to handle all these
cases. The importance of having the necessary flexibility to handle all
the use cases is critical. For example, the initial implementation that
returned Py_ssize_t for nb_index led to the discovery that on a 32-bit
machine with >=2GB of RAM s = 'x' * (2**100) works but len(s) was
clipped at 2147483647. Several fixes were suggested but eventually it
was decided that nb_index needed to return a Python Object similar to
the nb_int and nb_long slots in order to handle overflow correctly.

Why can't __index__ return any object with the nb_index method?

This would allow infinite recursion in many different ways that are not
easy to check for. This restriction is similar to the requirement that
__nonzero__ return an int or a bool.

Reference Implementation

Submitted as patch 1436368 to SourceForge.

References

Copyright

This document is placed in the public domain.

[1] Travis Oliphant, PEP for adding an sq_index slot so that any object,
a or b, can be used in X[a:b] notation,

https://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594