PEP: 637
Title: Support for indexing with keyword arguments
Version: $Revision$
Last-Modified: $Date$
Author: Stefano Borini
Sponsor: Steven D'Aprano
Discussions-To: python-ideas@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 24-Aug-2020
Python-Version: 3.10
Post-History: 23-Sep-2020
Resolution: https://mail.python.org/archives/list/python-dev@python.org/thread/6TAQ2BEVSJNV4JM2RJYSSYFJUT3INGZD/

Note

This PEP has been rejected. In general, the cost of introducing new
syntax was not outweighed by the perceived benefits. See the link in the
Resolution header field for details.

Abstract

At present keyword arguments are allowed in function calls, but not in
item access. This PEP proposes that Python be extended to allow keyword
arguments in item access.

The following example shows keyword arguments for ordinary function
calls:

    >>> val = f(1, 2, a=3, b=4)

The proposal would extend the syntax to allow a similar construct to
indexing operations:

    >>> val = x[1, 2, a=3, b=4]  # getitem
    >>> x[1, 2, a=3, b=4] = val  # setitem
    >>> del x[1, 2, a=3, b=4]    # delitem

and would also provide appropriate semantics. Single- and double-star
unpacking of arguments is also provided:

    >>> val = x[*(1, 2), **{'a': 3, 'b': 4}]  # Equivalent to above.

This PEP is a successor to PEP 472, which was rejected in 2019 due to
lack of interest. Since then there has been renewed interest in the
feature.

Overview

Background

PEP 472 was opened in 2014. The PEP detailed various use cases and was
created by extracting implementation strategies from a broad discussion
on the python-ideas mailing list, although no clear consensus was
reached on which strategy should be used. Many corner cases have been
examined more closely and felt awkward, backward incompatible or both.

The PEP was eventually rejected in 2019[1] mostly due to lack of
interest for the feature despite its 5 years of existence.

However, with the introduction of type hints in PEP 484 the square
bracket notation has been used consistently to enrich the typing
annotations, e.g. to specify a list of integers as Sequence[int].
Additionally, there has been substantial growth in packages for data
analysis such as pandas and xarray, which use names to describe columns
in a table (pandas) or axes in an nd-array (xarray). These packages
allow users to access specific data by names, but cannot currently use
index notation ([]) for this functionality.

As a result, a renewed interest in a more flexible syntax that would
allow for named information has been expressed occasionally in many
different threads on python-ideas, recently by Caleb Donovick[2] in 2019
and Andras Tantos[3] in 2020. These requests prompted considerable activity
on the python-ideas mailing list, where the various options have been
re-discussed and a general consensus on an implementation strategy has
now been reached.

Use cases

The following practical use cases show where a keyword specification
would improve notation and provide additional value:

1.  To provide a more communicative meaning to the index, preventing
    e.g. accidental inversion of indexes:

        >>> grid_position[x=3, y=5, z=8]
        >>> rain_amount[time=0:12, location=location]
        >>> matrix[row=20, col=40]

2.  To enrich the typing notation with keywords, especially during the
    use of generics:

        def function(value: MyType[T=int]):

3.  In some domains, such as computational physics and chemistry, the use
    of a notation such as Basis[Z=5] is a Domain Specific Language
    notation to represent a level of accuracy:

        >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])

4.  Pandas currently uses a notation such as:

        >>> df[df['x'] == 1]

    which could be replaced with df[x=1].

5.  xarray has named dimensions. Currently these are handled with
    methods such as .isel:

        >>> data.isel(row=10)  # Returns the row at index 10

    which could also be replaced with data[row=10]. A more complex
    example:

        >>> # old syntax
        >>> da.isel(space=0, time=slice(None, 2))[...] = spam
        >>> # new syntax
        >>> da[space=0, time=:2] = spam

    Another example:

        >>> # old syntax
        >>> ds["empty"].loc[dict(lon=5, lat=6)] = 10
        >>> # new syntax
        >>> ds["empty"][lon=5, lat=6] = 10

        >>> # old syntax
        >>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
        >>> # new syntax
        >>> ds["empty"][lon=1:5, lat=3:] = 10

6.  Functions/methods whose argument is another function (plus its
    arguments) need some way to determine which arguments are destined
    for the target function, and which are used to configure how they
    run the target. This is simple (if non-extensible) for positional
    parameters, but we need some way to distinguish these for
    keywords.[4]

    An indexed notation would afford a Pythonic way to pass keyword
    arguments to these functions without cluttering the caller's code.

        >>> # Let's start this example with basic syntax without keywords.
        >>> # the positional values are arguments to `func` while
        >>> # `name=` is processed by `trio.run`.
        >>> trio.run(func, value1, value2, name="func")
        >>> # `trio.run` ends up calling `func(value1, value2)`.

        >>> # If we want/need to pass value2 by keyword (keyword-only argument,
        >>> # additional arguments that won't break backwards compatibility ...),
        >>> # currently we need to resort to functools.partial:
        >>> trio.run(functools.partial(func, param2=value2), value1, name="func")
        >>> trio.run(functools.partial(func, value1, param2=value2), name="func")

        >>> # One possible workaround is to convert `trio.run` to an object
        >>> # with a `__call__` method, and use an "option" helper,
        >>> trio.run.option(name="func")(func, value1, param2=value2)
        >>> # However, foo(bar)(baz) is uncommon and thus disruptive to the reader.
        >>> # Also, you need to remember the name of the `option` method.

        >>> # This PEP allows us to replace `option` with `__getitem__`.
        >>> # The call is now shorter, more mnemonic, and looks+works like typing
        >>> trio.run[name="func"](func, value1, param2=value2)

7.  Availability of star arguments would benefit PEP 646 Variadic
    Generics, especially in the forms a[*x] and a[*x, *y, p, q, *z]. The
    PEP details exactly this notation in its "Unpacking: Star Operator"
    section.

It is important to note that how the notation is interpreted is up to
the implementation. This PEP only defines and dictates the behavior of
Python regarding passed keyword arguments, not how these arguments
should be interpreted and used by the implementing class.

Current status of indexing operation

Before detailing the new syntax and semantics to the indexing notation,
it is relevant to analyse how the indexing notation works today, in
which contexts, and how it is different from a function call.

Subscripting obj[x] is, effectively, an alternate and specialised form
of function call syntax with a number of differences and restrictions
compared to obj(x). The current Python syntax focuses exclusively on
position to express the index, and also contains syntactic sugar to
refer to non-punctiform selection (slices). Some common examples:

    >>> a[3]       # returns the fourth element of 'a'
    >>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
    >>> a[3, 2]    # multiple indexes (for multidimensional arrays)

This translates into a __(get|set|del)item__ dunder call which is passed
a single parameter containing the index (for __getitem__ and
__delitem__) or two parameters containing index and value (for
__setitem__).
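
As a minimal sketch of this translation, the hypothetical class Recorder
below logs which dunder each subscript statement reaches and with which
arguments. The class name and its calls attribute are illustrative only
and are not part of this proposal.

```python
class Recorder:
    """Hypothetical class that logs every subscript-related dunder call."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, index):
        self.calls.append(("get", index))
        return None

    def __setitem__(self, index, value):
        self.calls.append(("set", index, value))

    def __delitem__(self, index):
        self.calls.append(("del", index))


r = Recorder()
r[3]             # one parameter: the index
r[1:10:2] = "v"  # two parameters: the index (a slice) and the value
del r[3, 2]      # one parameter: the index (a tuple)

for call in r.calls:
    print(call)
```

Each statement reaches exactly one dunder, and whatever is written between
the square brackets arrives as a single index object.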

The behavior of the indexing call is fundamentally different from a
function call in various aspects:

The first difference is in meaning to the reader. A function call says
"arbitrary function call potentially with side-effects". An indexing
operation says "lookup", typically to point at a subset or specific
sub-aspect of an entity (as in the case of typing notation). This
fundamental difference means that, while we cannot prevent abuse,
implementors should be aware that the introduction of keyword arguments
to alter the behavior of the lookup may violate this intrinsic meaning.

The second difference of the indexing notation compared to a function is
that indexing can be used for both getting and setting operations. In
Python, a function cannot be on the left hand side of an assignment. In
other words, both of these are valid:

    >>> x = a[1, 2]
    >>> a[1, 2] = 5

but only the first one of these is valid:

    >>> x = f(1, 2)
    >>> f(1, 2) = 5  # invalid

This asymmetry is important, and makes one understand that there is a
natural imbalance between the two forms. It is therefore not a given
that the two should behave transparently and symmetrically.

The third difference is that functions have names assigned to their
arguments, unless the passed parameters are captured with *args, in
which case they end up as entries in the args tuple. In other words,
functions already have anonymous argument semantics, exactly like the
indexing operation. However, __(get|set|del)item__ is not always
receiving a tuple as the index argument (to be uniform in behavior with
*args). In fact, given a trivial class:

    class X:
        def __getitem__(self, index):
            print(index)

The index operation basically forwards the content of the square
brackets "as is" in the index argument:

    >>> x=X()
    >>> x[0]
    0
    >>> x[0, 1]
    (0, 1)
    >>> x[(0, 1)]
    (0, 1)
    >>>
    >>> x[()]
    ()
    >>> x[{1, 2, 3}]
    {1, 2, 3}
    >>> x["hello"]
    hello
    >>> x["hello", "hi"]
    ('hello', 'hi')

The fourth difference is that the indexing operation knows how to
convert colon notations to slices, thanks to support from the parser.
This is valid:

    a[1:3]

this one isn't:

    f(1:3)

The fifth difference is that there's no zero-argument form. This is
valid:

    f()

this one isn't:

    a[]
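
The syntactic differences listed above can be checked without running any
of the snippets, by asking the compiler whether each one parses. The
helper is_valid below is a hypothetical name introduced for this sketch; it
relies only on the built-in compile().

```python
def is_valid(source):
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


snippets = [
    "a[1, 2] = 5",  # a subscript may be an assignment target
    "f(1, 2) = 5",  # a call may not
    "a[1:3]",       # colon notation is parsed inside subscripts
    "f(1:3)",       # but not inside calls
    "f()",          # zero-argument call
    "a[]",          # there is no zero-argument subscript
]
for source in snippets:
    print(f"{source!r:>16} valid: {is_valid(source)}")
```

Because compile() only parses and compiles, the names a and f do not need
to exist for the check to work.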

Specification

Before describing the specification, it is important to stress the
difference in nomenclature between positional index, final index and
keyword argument, as it is important to understand the fundamental
asymmetries at play. The __(get|set|del)item__ is fundamentally an
indexing operation, and the way the element is retrieved, set, or
deleted is through an index, the final index.

The current status quo is to directly build the final index from what is
passed between square brackets, the positional index. In other words,
what is passed in the square brackets is trivially used to generate what
the code in __getitem__ then uses for the indexing operation. For
example, with a dict, d[1] has a positional index of 1 and also a
final index of 1 (because it's the element that is then added to the
dictionary) and d[1, 2] has positional index of (1, 2) and final index
also of (1, 2) (because yet again it's the element that is added to the
dictionary). However, the positional index d[1,2:3] is not accepted by
the dictionary, because there's no way to transform the positional index
into a final index, as the slice object is unhashable. The positional
index is what is currently known as the index parameter in __getitem__.
Nevertheless, nothing prevents constructing a dictionary-like class that
creates the final index by e.g. converting the positional index to a
string.
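
A minimal sketch of such a dictionary-like class follows. The name
StringKeyed and the choice of repr() as the conversion are assumptions made
for illustration; any deterministic conversion to a hashable value would
serve the same purpose.

```python
class StringKeyed:
    """Hypothetical dict-like class: the final index is repr(positional index)."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, index, value):
        self._data[repr(index)] = value

    def __getitem__(self, index):
        return self._data[repr(index)]


d = StringKeyed()
d[1] = "a"       # positional index 1, final index "1"
d[1, 2] = "b"    # positional index (1, 2), final index "(1, 2)"
d[1, 2:3] = "c"  # positional index (1, slice(2, 3, None))

print(d[1], d[1, 2], d[1, 2:3])
```

Here the positional index and the final index are clearly distinct objects:
the class decides how one is derived from the other.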

This PEP extends the current status quo, and grants more flexibility to
create the final index via an enhanced syntax that combines the
positional index and keyword arguments, if passed.

The above brings an important point across. Keyword arguments, in the
context of the index operation, may be used to take indexing decisions
to obtain the final index, and therefore will have to accept values that
are unconventional for functions. See for example use case 1, where a
slice is accepted.

The successful implementation of this PEP will result in the following
behavior:

1.  An empty subscript is still illegal, regardless of context (see
    Rejected Ideas):

        obj[]  # SyntaxError

2.  A single index value remains a single index value when passed:

        obj[index]
        # calls type(obj).__getitem__(obj, index)

        obj[index] = value
        # calls type(obj).__setitem__(obj, index, value)

        del obj[index]
        # calls type(obj).__delitem__(obj, index)

    This remains the case even if the index is followed by keywords; see
    point 5 below.

3.  Comma-separated arguments are still parsed as a tuple and passed as
    a single positional argument:

        obj[spam, eggs]
        # calls type(obj).__getitem__(obj, (spam, eggs))

        obj[spam, eggs] = value
        # calls type(obj).__setitem__(obj, (spam, eggs), value)

        del obj[spam, eggs]
        # calls type(obj).__delitem__(obj, (spam, eggs))

    The points above mean that classes which do not want to support
    keyword arguments in subscripts need do nothing at all, and the
    feature is therefore completely backwards compatible.

4.  Keyword arguments, if any, must follow positional arguments:

        obj[1, 2, spam=None, 3]  # SyntaxError

    This is like function calls, where intermixing positional and
    keyword arguments gives a SyntaxError.

5.  Keyword subscripts, if any, will be handled like they are in
    function calls. Examples:

        # Single index with keywords:

        obj[index, spam=1, eggs=2]
        # calls type(obj).__getitem__(obj, index, spam=1, eggs=2)

        obj[index, spam=1, eggs=2] = value
        # calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2)

        del obj[index, spam=1, eggs=2]
        # calls type(obj).__delitem__(obj, index, spam=1, eggs=2)

        # Comma-separated indices with keywords:

        obj[foo, bar, spam=1, eggs=2]
        # calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2)

        obj[foo, bar, spam=1, eggs=2] = value
        # calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2)

        del obj[foo, bar, spam=1, eggs=2]
        # calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2)

    Note that:

    -   a single positional index will not turn into a tuple just
        because one adds a keyword value.
    -   for __setitem__, the same order is retained for index and value.
        The keyword arguments go at the end, as is normal for a function
        definition.

6.  The same rules apply with respect to keyword subscripts as for
    keywords in function calls:

    -   the interpreter matches up each keyword subscript to a named
        parameter in the appropriate method;
    -   if a named parameter is used twice, that is an error;
    -   if there are any named parameters left over (without a value)
        when the keywords are all used, they are assigned their default
        value (if any);
    -   if any such parameter doesn't have a default, that is an error;
    -   if there are any keyword subscripts remaining after all the
        named parameters are filled, and the method has a **kwargs
        parameter, they are bound to the **kwargs parameter as a dict;
    -   but if no **kwargs parameter is defined, it is an error.

7.  Sequence unpacking is allowed inside subscripts:

        obj[*items]

    This allows notations such as [:, *args, :], which could be treated
    as [(slice(None), *args, slice(None))]. Multiple star unpackings are
    allowed:

        obj[1, *(2, 3), *(4, 5), 6, foo=5]
        # Equivalent to obj[(1, 2, 3, 4, 5, 6), foo=5]

    The following notation equivalence must be honored:

        obj[*()]        
        # Equivalent to obj[()]

        obj[*(), foo=3] 
        # Equivalent to obj[(), foo=3]

        obj[*(x,)]      
        # Equivalent to obj[(x,)]

        obj[*(x,),]     
        # Equivalent to obj[(x,)]

    Note in particular case 3: sequence unpacking of a single element
    will not behave as if only one single argument was passed. A related
    case is the following example:

        obj[1, *(), foo=5]
        # Equivalent to obj[(1,), foo=5]
        # calls type(obj).__getitem__(obj, (1,), foo=5)

    However, as we saw earlier, for backward compatibility a single
    index will be passed as is:

        obj[1, foo=5]
        # calls type(obj).__getitem__(obj, 1, foo=5)

    In other words, a single positional index will be passed "as is"
    only if no sequence unpacking is present. If a sequence unpacking is
    present, then the index will become a tuple, regardless of the
    resulting number of elements in the index after the unpacking has
    taken place.

8.  Dict unpacking is permitted:

        items = {'spam': 1, 'eggs': 2}
        obj[index, **items]
        # equivalent to obj[index, spam=1, eggs=2]

    The following notation equivalences should be honored:

        obj[**{}]    
        # Equivalent to obj[()]

        obj[3, **{}] 
        # Equivalent to obj[3]

9.  Keyword-only subscripts are permitted. The positional index will be
    the empty tuple:

        obj[spam=1, eggs=2]
        # calls type(obj).__getitem__(obj, (), spam=1, eggs=2)

        obj[spam=1, eggs=2] = 5
        # calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2)

        del obj[spam=1, eggs=2]
        # calls type(obj).__delitem__(obj, (), spam=1, eggs=2)

    The choice of the empty tuple as a sentinel has been debated.
    Details are provided in the Rejected Ideas section.

10. Keyword arguments must allow slice syntax:

        obj[3:4, spam=1:4, eggs=2]
        # calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2)

    This may open up the possibility to accept the same syntax for
    general function calls, but this is not part of this recommendation.

11. Keyword arguments allow for default values:

        # Given type(obj).__getitem__(obj, index, spam=True, eggs=2)
        obj[3]               # Valid. index = 3, spam = True, eggs = 2
        obj[3, spam=False]   # Valid. index = 3, spam = False, eggs = 2
        obj[spam=False]      # Valid. index = (), spam = False, eggs = 2
        obj[]                # Invalid.

12. The same semantics given above must be extended to
    __class_getitem__: Since PEP 560, type hints are dispatched so that
    for x[y], if no __getitem__ method is found, and x is a type (class)
    object, and x has a class method __class_getitem__, that method is
    called. The same changes should be applied to this method as well,
    so that a writing like list[T=int] can be accepted.
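
Since the proposed syntax is not accepted by any released Python, the
translation rules above can only be sketched by spelling out the dunder
call by hand. The helper emulate_getitem and the class Echo below are
hypothetical names introduced for this sketch; the star_unpacked flag is an
assumption standing in for a star unpacking written inside the brackets.
Syntax errors such as obj[] are not modelled.

```python
class Echo:
    """Hypothetical class whose getter accepts arbitrary keywords."""

    def __getitem__(self, index, **kwargs):
        return index, kwargs


def emulate_getitem(obj, *positional, star_unpacked=False, **keywords):
    """Emulate obj[<positional>, <keywords>] under the rules listed above.

    star_unpacked=True stands in for a *-unpacking inside the brackets.
    """
    if len(positional) == 1 and not star_unpacked:
        index = positional[0]      # a single index is passed as is
    else:
        index = tuple(positional)  # none, several, or any unpacking: a tuple
    return type(obj).__getitem__(obj, index, **keywords)


e = Echo()
print(emulate_getitem(e, 1))                            # obj[1]
print(emulate_getitem(e, 1, 2, spam=1))                 # obj[1, 2, spam=1]
print(emulate_getitem(e, 1, foo=5))                     # obj[1, foo=5]
print(emulate_getitem(e, 1, star_unpacked=True, foo=5)) # obj[1, *(), foo=5]
print(emulate_getitem(e, spam=1, eggs=2))               # obj[spam=1, eggs=2]
print(emulate_getitem(e, slice(3, 4), spam=slice(1, 4)))  # obj[3:4, spam=1:4]
```

Because the helper reserves the names obj and star_unpacked for itself, it
cannot emulate keyword subscripts with those names; a real implementation
would have no such restriction.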

Indexing behavior in standard classes (dict, list, etc.)

None of what is proposed in this PEP will change the behavior of the
current core classes that use indexing. Adding keywords to the index
operation for custom classes is not the same as modifying e.g. the
standard dict type to handle keyword arguments. In fact, dict (as well
as list and other stdlib classes with indexing semantics) will remain
the same and will continue not to accept keyword arguments. In other
words, if d is a dict, the statement d[1, a=2] will raise TypeError, as
its implementation will not support the use of keyword arguments. The
same holds for all other classes (list, dict, etc.)
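
This can be observed today by calling the dunder explicitly with a keyword,
which is the call that d[1, a=2] would translate into. The helper name
rejects_keywords is hypothetical and exists only for this sketch.

```python
def rejects_keywords(cls, instance, index):
    """Return True if cls.__getitem__ raises TypeError when given a keyword."""
    try:
        cls.__getitem__(instance, index, a=2)
    except TypeError:
        return True
    return False


print(rejects_keywords(dict, {1: "x"}, 1))
print(rejects_keywords(list, ["x", "y"], 1))
```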

Corner cases and Gotchas

With the introduction of the new notation, a few corner cases need to be
analysed.

1.  Technically, if a class defines its getter like this:

        def __getitem__(self, index):

    then the caller could call that using keyword syntax, like these two
    cases:

        obj[3, index=4]
        obj[index=1]

    The resulting behavior would be an error automatically, since it
    would be like attempting to call the method with two values for the
    index argument, and a TypeError will be raised. In the first case,
    the index would be 3, in the second case, it would be the empty
    tuple ().

    Note that this behavior applies for all currently existing classes
    that rely on indexing, meaning that there is no way for the new
    behavior to introduce backward compatibility issues in this respect.

    Classes that wish to stress this behavior explicitly can define
    their parameters as positional-only:

        def __getitem__(self, index, /):

2.  A similar case occurs with setter notation:

        # Given type(obj).__setitem__(obj, index, value):
        obj[1, value=3] = 5

    This poses no issue because the value is passed automatically, and
    the Python interpreter will raise
    TypeError: got multiple values for keyword argument 'value'

3.  If the subscript dunders are declared to use positional-or-keyword
    parameters, there may be some surprising cases when arguments are
    passed to the method. Given the signature:

        def __getitem__(self, index, direction='north'):

    if the caller uses this:

        obj[0, 'south']

    they will probably be surprised by the method call:

        # expected type(obj).__getitem__(obj, 0, direction='south')
        # but actually get:
        type(obj).__getitem__(obj, (0, 'south'), direction='north')

    Solution: best practice suggests that keyword subscripts should be
    flagged as keyword-only when possible:

        def __getitem__(self, index, *, direction='north'):

    The interpreter need not enforce this rule, as there could be
    scenarios where this is the desired behaviour. But linters may
    choose to warn about subscript methods which don't use the
    keyword-only flag.

4.  As we saw, a single value followed by a keyword argument will not be
    changed into a tuple, i.e.: d[1, a=3] is treated as
    __getitem__(d, 1, a=3), NOT __getitem__(d, (1,), a=3). It would be
    extremely confusing if adding keyword arguments were to change the
    type of the passed index. In other words, adding a keyword to a
    single-valued subscript will not change it into a tuple. For those
    cases where an actual tuple needs to be passed, a proper syntax will
    have to be used:

        obj[(1,), a=3]  
        # calls type(obj).__getitem__(obj, (1,), a=3)

    In this case, the call is passing a single element (which is passed
    as is, as from rule above), only that the single element happens to
    be a tuple.

    Note that this behavior just reveals the truth that the obj[1,]
    notation is shorthand for obj[(1,)] (and also obj[1] is shorthand
    for obj[(1)], with the expected behavior). When keywords are
    present, the rule that you can omit this outermost pair of
    parentheses is no longer true:

        obj[1]          
        # calls type(obj).__getitem__(obj, 1)

        obj[1, a=3]     
        # calls type(obj).__getitem__(obj, 1, a=3)

        obj[1,]         
        # calls type(obj).__getitem__(obj, (1,))

        obj[(1,), a=3]  
        # calls type(obj).__getitem__(obj, (1,), a=3)

    This is particularly relevant in the case where two entries are
    passed:

        obj[1, 2]
        # calls type(obj).__getitem__(obj, (1, 2))

        obj[(1, 2)]       
        # same as above

        obj[1, 2, a=3]    
        # calls type(obj).__getitem__(obj, (1, 2), a=3)

        obj[(1, 2), a=3]  
        # calls type(obj).__getitem__(obj, (1, 2), a=3)

    And particularly when the tuple is extracted as a variable:

        t = (1, 2)
        obj[t]       
        # calls type(obj).__getitem__(obj, (1, 2))

        obj[t, a=3]  
        # calls type(obj).__getitem__(obj, (1, 2), a=3)

    Why? because in the case obj[1, 2, a=3] we are passing two elements
    (which are then packed as a tuple and passed as the index). In the
    case obj[(1, 2), a=3] we are passing a single element (which is
    passed as is) which happens to be a tuple. The final result is that
    they are the same.
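
The corner cases above can be reproduced with explicit dunder calls, which
is what the new notation would translate into. Compass and StrictCompass
are hypothetical classes written for this sketch; only the first subscript
uses real syntax, since it is already valid today.

```python
class Compass:
    """Hypothetical class with a positional-or-keyword `direction`."""

    def __getitem__(self, index, direction="north"):
        return index, direction


class StrictCompass:
    """Same, but `index` is positional-only and `direction` keyword-only."""

    def __getitem__(self, index, /, *, direction="north"):
        return index, direction


c = Compass()

# Corner case 3: both values are packed into the index tuple.
print(c[0, "south"])  # ((0, 'south'), 'north')

# Corner case 1: obj[3, index=4] would translate into this call.
try:
    Compass.__getitem__(c, 3, index=4)
except TypeError as exc:
    print("TypeError:", exc)

# Corner case 4: obj[1, a=3] passes 1, while obj[(1,), a=3] passes (1,).
s = StrictCompass()
print(StrictCompass.__getitem__(s, 1, direction="south"))
print(StrictCompass.__getitem__(s, (1,), direction="south"))
```

Declaring the index positional-only and the extra parameters keyword-only,
as StrictCompass does, makes the intended calling convention explicit in
the signature.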

C Interface

Resolution of the indexing operation is performed through a call to the
following functions

-   PyObject_GetItem(PyObject *o, PyObject *key) for the get operation
-   PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value) for
    the set operation
-   PyObject_DelItem(PyObject *o, PyObject *key) for the del operation

These functions are used extensively within the Python executable, and
are also part of the public C API, as exported by Include/abstract.h. It
is clear that the signatures of these functions cannot be changed, and
different C level functions need to be implemented to support the
extended call. We propose

-   PyObject_GetItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)
-   PyObject_SetItemWithKeywords(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)
-   PyObject_DelItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)

New opcodes will be needed for the enhanced call. Currently, the
implementation uses BINARY_SUBSCR, STORE_SUBSCR and DELETE_SUBSCR to
invoke the old functions. We propose BINARY_SUBSCR_KW, STORE_SUBSCR_KW
and DELETE_SUBSCR_KW for the new operations. The compiler will have to
generate these new opcodes. The old C implementations will call the
extended methods passing NULL as kwargs.

Finally, the following new slots must be added to the PyMappingMethods
struct:

-   mp_subscript_kw
-   mp_ass_subscript_kw

These slots will have the appropriate signature to handle the dictionary
object containing the keywords.

"How to teach" recommendations

One request that occurred during feedback sessions was to detail a
possible narrative for teaching the feature, e.g. to students, data
scientists, and similar audience. This section addresses that need.

We will only describe the indexing from the perspective of use, not of
implementation, because it is the aspect that the above mentioned
audience will likely encounter. Only a subset of users will have to
implement their own dunder methods, which can be considered advanced
usage. A proper explanation could be:

  The indexing operation is generally used to refer to a subset of a
  larger dataset by means of an index. In the commonly seen cases, the
  index is made by one or more numbers, strings, slices, etc.

  Some types may allow indexing to occur not only with the index, but
  also with named values. These named values are given between square
  brackets using the same syntax used for function call keyword
  arguments. The meaning of the names and their use is found in the
  documentation of the type, as it varies from one type to another.

The teacher will now show some practical real world examples, explaining
the semantics of the feature in the shown library. At the time of
writing these examples do not exist, obviously, but the libraries most
likely to implement the feature are pandas and numpy, possibly as a
method to refer to columns by name.

Reference Implementation

A reference implementation is currently being developed here[5].

Workarounds

Every PEP that changes the Python language should "clearly explain why
the existing language specification is inadequate to address the
problem that the PEP solves" (PEP 1, "What belongs in a successful PEP?").

Some rough equivalents to the proposed extension, which we call
work-arounds, are already possible. The work-arounds provide an
alternative to enabling the new syntax, while leaving the semantics to
be defined elsewhere.

These work-arounds follow. In them the helpers H and P are not intended
to be universal. For example, a module or package might require the use
of its own helpers.

1.  User defined classes can be given getitem and delitem methods, that
    respectively get and delete values stored in a container:

        >>> val = x.getitem(1, 2, a=3, b=4)
        >>> x.delitem(1, 2, a=3, b=4)

    The same can't be done for setitem. It's not valid syntax:

        >>> x.setitem(1, 2, a=3, b=4) = val
        SyntaxError: can't assign to function call

2.  A helper class, here called H, can be used to swap the container and
    parameter roles. In other words, we use:

        H(1, 2, a=3, b=4)[x]

    as a substitute for:

        x[1, 2, a=3, b=4]

    This method will work for getitem, delitem and also for setitem.
    This is because:

        >>> H(1, 2, a=3, b=4)[x] = val

    is valid syntax, which can be given the appropriate semantics.

3.  A helper function, here called P, can be used to store the arguments
    in a single object. For example:

        >>> x[P(1, 2, a=3, b=4)] = val

    is valid syntax, and can be given the appropriate semantics.

4.  The lo:hi:step syntax for slices is sometimes very useful. This
    syntax is not directly available in the work-arounds. However:

        s[lo:hi:step]

    provides a work-around that is available everywhere, where:

        class S:
            def __getitem__(self, key): return key

        s = S()

    defines the helper object s.
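
One way the helpers H and P might be given "appropriate semantics" is
sketched below. Everything here is an assumption made for illustration:
the Grid container, the final_index method, and the decision to ignore
keyword order are not prescribed by this PEP.

```python
class P:
    """Hypothetical helper: bundles positional and keyword arguments."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def final_index(self):
        # A hashable key built from both parts; keyword order is ignored.
        return self.args, tuple(sorted(self.kwargs.items()))


class Grid:
    """Hypothetical container that understands P objects as indexes."""

    def __init__(self):
        self.data = {}

    def __getitem__(self, key):
        return self.data[key.final_index()]

    def __setitem__(self, key, value):
        self.data[key.final_index()] = value

    def __delitem__(self, key):
        del self.data[key.final_index()]


class H:
    """Hypothetical helper: swaps the container and parameter roles."""

    def __init__(self, *args, **kwargs):
        self.key = P(*args, **kwargs)

    def __getitem__(self, container):
        return container[self.key]

    def __setitem__(self, container, value):
        container[self.key] = value

    def __delitem__(self, container):
        del container[self.key]


x = Grid()
x[P(1, 2, a=3, b=4)] = "val"    # work-around 3
print(H(1, 2, a=3, b=4)[x])     # work-around 2, getitem
H(1, 2, a=3, b=4)[x] = "other"  # work-around 2, setitem
print(x[P(1, 2, b=4, a=3)])
del H(1, 2, a=3, b=4)[x]        # work-around 2, delitem
print(len(x.data))
```

Both helpers work with current syntax, at the cost of an extra object and
of notation that reads less directly than x[1, 2, a=3, b=4].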

Rejected Ideas

Previous PEP 472 solutions

PEP 472 presents a good number of ideas that are now all to be
considered Rejected. A personal email from D'Aprano to the author
specifically said:

  I have now carefully read through PEP 472 in full, and I am afraid I
  cannot support any of the strategies currently in the PEP.

We agree that those options are inferior to the one currently presented,
for one reason or another.

To keep this document compact, we will not present here the objections
for all options presented in PEP 472. Suffice to say that they were
discussed, and each proposed alternative had one or more dealbreakers.

Adding new dunders

It was proposed to introduce new dunders __(get|set|del)item_ex__ that
are invoked over the __(get|set|del)item__ triad, if they are present.

The rationale around this choice is to make the intuition around how to
add kwd arg support to square brackets more obvious and in line with the
function behavior. Given:

    def __getitem_ex__(self, x, y): ...

These all just work and produce the same result effortlessly:

    obj[1, 2]
    obj[1, y=2]
    obj[y=2, x=1]

In other words, this solution would unify the behavior of __getitem__ to
the traditional function signature, but since we can't change
__getitem__ and break backward compatibility, we would have an extended
version that is used preferentially.

The problems with this approach were found to be:

-   It will slow down subscripting. For every subscript access, this new
    dunder attribute gets investigated on the class, and if it is not
    present then the default key translation function is executed.
    Different ideas were proposed to handle this, from wrapping the
    method only at class instantiation time, to adding a bit flag to
    signal the availability of these methods. Regardless of the solution,
    the new dunder would be effective only if added at class creation
    time, not if it is added later. This would be unusual and would
    prevent monkeypatching of the methods (or make it behave
    unexpectedly) for whatever reason it might be needed.

-   It adds complexity to the mechanism.

-   It will require a long and painful transition period during which time
    libraries will have to somehow support both calling conventions,
    because most likely, the extended methods will delegate to the
    traditional ones when the right conditions are matched in the
    arguments, or some classes will support the traditional dunder and
    others the extended dunder. While this will not affect calling code,
    it will affect development.

-   It would potentially lead to mixed situations where the extended
    version is defined for the getter, but not for the setter.

-   In the __setitem_ex__ signature, value would have to be made the
    first element, because the index is of arbitrary length depending on
    the specified indexes. This would look awkward because the visual
    notation does not match the signature:

        obj[1, 2] = 3  
        # calls type(obj).__setitem_ex__(obj, 3, 1, 2)

-   The solution relies on the assumption that all keyword indices
    necessarily map into positional indices, or that they must have a
    name. This assumption may be false: xarray, which is the primary
    Python package for numpy arrays with labelled dimensions, supports
    indexing by additional dimensions (so called "non-dimension
    coordinates") that don't correspond directly to the dimensions of
    the underlying numpy array, and those have no position to match up
    to. In other words, anonymous indexes are a plausible use case that
    this solution would remove, although it could be argued that using
    *args would solve that issue.
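The lookup cost described in the first point can be illustrated with a small emulation. This is only a sketch under stated assumptions: subscript() and translate_key() are hypothetical helpers standing in for what the interpreter would do, and __getitem_ex__ is used as the name of the extended dunder by analogy with __setitem_ex__ above.

```python
# Hypothetical emulation of the rejected extended-dunder lookup.
# None of these helpers exist in CPython; they only model the idea.

def translate_key(args):
    """Default key translation: (), a single value, or a tuple."""
    if len(args) == 1:
        return args[0]
    return tuple(args)


def subscript(obj, *args, **kwargs):
    """Emulate obj[*args, **kwargs] under the extended-dunder scheme."""
    cls = type(obj)
    # This extra attribute lookup would happen on every subscript access.
    extended = getattr(cls, "__getitem_ex__", None)
    if extended is not None:
        return extended(obj, *args, **kwargs)
    # Fall back to the traditional dunder with a translated key.
    return cls.__getitem__(obj, translate_key(args), **kwargs)


class Traditional:
    def __getitem__(self, index):
        return ("traditional", index)


class Extended:
    def __getitem_ex__(self, *args, **kwargs):
        return ("extended", args, kwargs)
```

In this emulation, subscript(Traditional(), 1, 2) reaches __getitem__ with the tuple (1, 2), while subscript(Extended(), 1, 2, z=3) reaches __getitem_ex__ with separate arguments, which is the dual calling convention libraries would have had to support.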

Adding an adapter function

Similar to the above, in the sense that a pre-function would be called
to convert the "new style" indexing into "old style" indexing, which is
then passed on to the traditional dunder. It has problems similar to the
above.

Create a new "kwslice" object

This proposal has already been explored in "New arguments contents" P4
in PEP 472:

    obj[a, b:c, x=1]  
    # calls type(obj).__getitem__(obj, a, slice(b, c), key(x=1))

This solution requires everyone who needs keyword arguments to parse the
tuple and/or key object by hand to extract them. This is painful and
forces the get/set/del functions to always accept arbitrary keyword
arguments, whether they make sense or not. We want the developer to be
able to specify which arguments make sense and which ones do not.
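The manual parsing this would impose can be sketched as follows. The key class and the Grid container are hypothetical illustrations (neither is defined by PEP 472 in this form), and the key object is passed explicitly because the keyword syntax itself is not valid Python.

```python
class key:
    """Hypothetical stand-in for the keyword-carrying object."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Grid:
    def __getitem__(self, index):
        # Normalise to a tuple, then separate key objects from plain indexes.
        items = index if isinstance(index, tuple) else (index,)
        positional = []
        keywords = {}
        for item in items:
            if isinstance(item, key):
                keywords.update(item.kwargs)
            else:
                positional.append(item)
        # Validation must be done by hand: the signature cannot express
        # which keyword names are acceptable.
        unknown = set(keywords) - {"x"}
        if unknown:
            raise TypeError(f"unexpected keyword indexes: {sorted(unknown)}")
        return tuple(positional), keywords


grid = Grid()
result = grid[1, 2:3, key(x=1)]  # emulates obj[a, b:c, x=1]
```

Every class wanting keyword indexes would need to repeat some variant of this loop, which is the burden the paragraph above objects to.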

Using a single bit to change the behavior

A special class dunder flag:

    __keyfn__ = True

would change the signature of the __get|set|delitem__ to a "function
like" dispatch, meaning that this:

    >>> obj[1, 2, z=3]

would result in a call to:

    >>> type(obj).__getitem__(obj, 1, 2, z=3)  
    # instead of type(obj).__getitem__(obj, (1, 2), z=3)

This option has been rejected because it feels odd that the signature of
a method depends on a specific value of another dunder. It would be
confusing for both static type checkers and for humans: a static type
checker would have to hard-code a special case for this, because there
really is nothing else in Python where the signature of a dunder depends
on the value of another dunder. A human who has to implement a
__getitem__ dunder would have to check whether the class (or any of its
subclasses) defines __keyfn__ before the dunder can be written.
Moreover, adding a base class that has the __keyfn__ flag set would
break the signature of the current methods. This would be even more
problematic if the flag is changed at runtime, or if the flag is
generated by calling a function that randomly returns True or something
else.
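The base-class problem can be made concrete with an emulation of the dispatch. This is a sketch only: getitem() is a hypothetical helper modelling what the interpreter would have done, and Table, Mixin and PatchedTable are invented for the illustration.

```python
def getitem(obj, *args, **kwargs):
    """Emulate obj[*args, **kwargs] under the rejected __keyfn__ scheme."""
    cls = type(obj)
    if getattr(cls, "__keyfn__", False) is True:
        # "Function like" dispatch: indexes are passed as separate arguments.
        return cls.__getitem__(obj, *args, **kwargs)
    # Traditional dispatch: indexes are packed into a single key.
    index = args[0] if len(args) == 1 else tuple(args)
    return cls.__getitem__(obj, index, **kwargs)


class Table:
    # Written for the traditional convention: one index, one keyword.
    def __getitem__(self, index, z=None):
        return (index, z)


class Mixin:
    __keyfn__ = True


class PatchedTable(Mixin, Table):
    pass  # __getitem__ is inherited unchanged, but is now called differently


ok = getitem(Table(), 1, 2, z=3)

try:
    getitem(PatchedTable(), 1, 2, z=3)
    broken = False
except TypeError:
    # The second index lands in the z parameter, clashing with z=3.
    broken = True
```

Nothing in Table changed, yet mixing in a class that sets the flag silently alters how its __getitem__ is invoked.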

Allowing for empty index notation obj[]

The current proposal prevents obj[] from being valid notation. However,
a commenter stated:

  We have Tuple[int, int] as a tuple of two integers. And we have
  Tuple[int] as a tuple of one integer. And occasionally we need to
  spell a tuple of no values, since that's the type of (). But we
  currently are forced to write that as Tuple[()]. If we allowed Tuple[]
  that odd edge case would be removed.

  So I probably would be okay with allowing obj[] syntactically, as long
  as the dict type could be made to reject it.

This proposal already established that, in case no positional index is
given, the passed value must be the empty tuple. Allowing for the empty
index notation would make the dictionary type accept it automatically,
to insert or refer to the value with the empty tuple as key. Moreover, a
typing notation such as Tuple[] can easily be written as Tuple without
the indexing notation.
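As a reminder of why the dictionary type is affected, the empty tuple is already a perfectly valid key today. The snippet below uses only current Python; the variable names are illustrative.

```python
d = {}
d[()] = "stored under the empty tuple"  # () is hashable, so this is valid
same = d[()]
```

If obj[] were translated to obj[()], a bare d[] would read or write this same entry unless dict were changed to reject it.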

However, subsequent discussion with Brandt Bucher during implementation
has revealed that the case obj[] would fit a natural evolution for
variadic generics, giving more strength to the above comment. In the
end, after a discussion between D'Aprano, Bucher and the author, we
decided to leave the obj[] notation as a syntax error for now, and
possibly extend the notation with an additional PEP that makes obj[]
equivalent to obj[()].

Sentinel value for no given positional index

The topic of which value to pass as the index in the case of:

    obj[k=3]

has been considerably debated.

One apparently rational choice would be to pass no value at all, by
making use of the keyword-only argument feature. Unfortunately, this
will not work well with the __setitem__ dunder, as a positional element
for the value is always passed, and we can't "skip over" the index
unless we introduce a very weird behavior where the first argument
refers to the index when specified, and to the value when the index is
not specified. This is extremely deceptive and error prone.

The above consideration makes it impossible to have a keyword-only
dunder, and opens up the question of what entity to pass for the index
position when no index is passed:

    obj[k=3] = 5  
    # would call type(obj).__setitem__(obj, ???, 5, k=3)

A proposed hack would be to let the user specify which entity to use
when an index is not specified, by specifying a default for the index.
This, however, also forces the user to specify a default for the value
(which is never going to be used, as a value is always passed by
design), as we can't have non-default arguments after a defaulted one:

    def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k):

which seems ugly, redundant and confusing. We must therefore accept that
some form of sentinel index must be passed by the Python implementation
when the obj[k=3] notation is used. This also means that default
arguments to those parameters are simply never going to be used (but
it's already the case with the current implementation, so no change
there).

Additionally, some classes may want to use **kwargs, instead of a
keyword-only argument, meaning that having a definition like:

    def __setitem__(self, index, value, **kwargs):

and a user that wants to pass a keyword value:

    obj[value=1] = 0

expecting a call like:

    type(obj).__setitem__(obj, SENTINEL, 0, **{"value": 1})

will instead find the keyword accidentally captured by the parameter
named value, producing a "multiple values for argument" error. The user
should not have to worry about the actual local names of those two
parameters if they are, for all practical purposes, positional only.
Declaring them as positional-only parameters ensures this does not
happen, but unfortunately it still does not remove the need to pass both
index and value even when the index is not provided. The point is that
the user should not be prevented from using keyword arguments to refer
to a column index, value (or self) just because the class implementor
happens to use those names in the parameter list.
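The name clash can be reproduced in current Python by making the call explicitly. In this sketch SENTINEL is only a placeholder (the choice of the actual object is discussed below), and the Named and PositionalOnly classes are hypothetical; the positional-only marker requires Python 3.8 or later.

```python
SENTINEL = ()  # placeholder for whatever the interpreter would pass


class Named:
    def __setitem__(self, index, value, **kwargs):
        self.last = (index, value, kwargs)


class PositionalOnly:
    def __setitem__(self, index, value, /, **kwargs):
        self.last = (index, value, kwargs)


# Emulates obj[value=1] = 0 on a class whose parameter is named "value".
named = Named()
try:
    Named.__setitem__(named, SENTINEL, 0, **{"value": 1})
    clash = None
except TypeError as exc:
    clash = str(exc)  # the keyword collided with the positional parameter

# With positional-only parameters the keyword lands in **kwargs instead.
safe = PositionalOnly()
PositionalOnly.__setitem__(safe, SENTINEL, 0, **{"value": 1})
```

The second class accepts the keyword, but note that the caller still had to supply something in the index position, which is the remaining problem the paragraph above points out.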

Moreover, we also require the three dunders to behave in the same way:
it would be extremely inconvenient if only __setitem__ were to receive
this sentinel, and __get|delitem__ would not because they can get away
with a signature that allows for no index specification, thus allowing
for a user-specified default index.

Whatever the choice of the sentinel, it will make the following cases
degenerate and thus impossible to differentiate in the dunder:

    obj[k=3]
    obj[SENTINEL, k=3]

The question now shifts to which entity should represent the sentinel.
The options were:

1.  Empty tuple
2.  None
3.  NotImplemented
4.  a new sentinel object (e.g. NoIndex)

For option 1, the call will become:

    type(obj).__getitem__(obj, (), k=3)

therefore making obj[k=3] and obj[(), k=3] degenerate and
indistinguishable.

This option sounds appealing because:

1.  The numpy community was consulted[6], and the general consensus of
    the responses was that the empty tuple felt appropriate.

2.  It shows a parallel with the behavior of *args in a function, when
    no positional arguments are given:

        >>> def foo(*args, **kwargs):
        ...     print(args, kwargs)
        ...
        >>> foo(k=3)
        () {'k': 3}

    We do, however, accept the following asymmetry in behavior compared
    to functions when a single value is passed; that ship has
    sailed:

        >>> foo(5, k=3)
        (5,) {'k': 3}   # for indexing, a plain 5, not a 1-tuple is passed

For option 2, using None, it was objected that NumPy uses it to indicate
inserting a new axis/dimension (there's a np.newaxis alias as well):

    import numpy as np

    arr = np.array(5)
    arr.ndim == 0  # a zero-dimensional array
    arr[None].ndim == arr[None,].ndim == 1  # None inserts a new axis

While this is not an insurmountable issue, it certainly will ripple onto
numpy.

The only issue with both of the above is that both the empty tuple and
None are potentially legitimate indexes, and there might be value in
being able to differentiate the two degenerate cases.

So, an alternative strategy (option 3) would be to use an existing
entity that is unlikely to be used as a valid index. One option could be
the current built-in constant NotImplemented, which is currently
returned by operator methods to report that they do not implement a
particular operation, and that a different strategy should be attempted
(e.g. to ask the other object). Unfortunately, its name and traditional
use evoke a feature that is not available, rather than the fact that
something was not passed by the user.

This leaves us with option 4: a new built-in constant. This constant
must be unhashable (so it's never going to be a valid key) and have a
clear name that makes its context obvious: NoIndex. This would solve
all the above issues, but the question is: is it worth it?
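What such a constant might look like can be sketched in pure Python. This is an assumption-laden illustration, not the proposed implementation: the _NoIndexType class name is invented here, and a real built-in would be defined in C.

```python
class _NoIndexType:
    """Hypothetical sentinel type sketching option 4."""

    __slots__ = ()
    __hash__ = None  # setting __hash__ to None makes instances unhashable

    def __repr__(self):
        return "NoIndex"


NoIndex = _NoIndexType()

store = {}
try:
    store[NoIndex] = "value"
    rejected = False
except TypeError:
    rejected = True  # dict refuses unhashable keys
```

Being unhashable, such an object could never collide with a legitimate dictionary key, unlike the empty tuple or None.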

From a quick inquiry, most people on python-ideas seem to believe it's
not crucial, and that the empty tuple is an acceptable option. Hence the
resulting series will be:

    obj[k=3]         
    # type(obj).__getitem__(obj, (), k=3). Empty tuple

    obj[1, k=3]      
    # type(obj).__getitem__(obj, 1, k=3). Integer

    obj[1, 2, k=3]   
    # type(obj).__getitem__(obj, (1, 2), k=3). Tuple

and the following two notations will be degenerate:

    obj[(), k=3]     
    # type(obj).__getitem__(obj, (), k=3)

    obj[k=3]         
    # type(obj).__getitem__(obj, (), k=3)
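The resulting translation rule can be summarised with a small emulation. The emulate_getitem() helper and the Recorder class are hypothetical and exist only to show which index value the dunder would receive in each case.

```python
def emulate_getitem(obj, *args, **kwargs):
    """Emulate obj[*args, **kwargs] with the empty tuple as the sentinel."""
    if len(args) == 0:
        index = ()        # no positional index given
    elif len(args) == 1:
        index = args[0]   # a single index is passed as-is
    else:
        index = args      # several indexes are packed into a tuple
    return type(obj).__getitem__(obj, index, **kwargs)


class Recorder:
    def __getitem__(self, index, **kwargs):
        return (index, kwargs)


obj = Recorder()
no_index = emulate_getitem(obj, k=3)          # models obj[k=3]
one_index = emulate_getitem(obj, 1, k=3)      # models obj[1, k=3]
two_indexes = emulate_getitem(obj, 1, 2, k=3) # models obj[1, 2, k=3]
explicit_empty = emulate_getitem(obj, (), k=3)  # models obj[(), k=3]
```

Here no_index and explicit_empty are equal, which is the degeneracy the text accepts as the price of using the empty tuple.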

Common objections

1.  Just use a method call.

    One of the use cases is typing, where the indexing is used
    exclusively, and function calls are out of the question. Moreover,
    function calls do not handle slice notation, which is commonly used
    in some cases for arrays.

    One problem is that type hint creation has been extended to built-ins
    in Python 3.9, so that you do not have to import Dict, List, et al.
    anymore.

    Without keyword arguments inside [], you would not be able to do this:

        Vector = dict[i=float, j=float]

    but for obvious reasons, call syntax using builtins to create custom
    type hints isn't an option:

        dict(i=float, j=float)  
        # would create a dictionary, not a type

    Finally, function calls do not allow for a setitem-like notation, as
    shown in the Overview: operations such as f(1, x=3) = 5 are not
    allowed, and are instead allowed for indexing operations.
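The syntactic differences between calls and indexing mentioned in this answer can be checked with the built-in compile(). The is_valid() helper is a hypothetical convenience for this sketch; only syntax accepted by current Python is tested.

```python
def is_valid(source):
    """Return True if the source compiles under the running interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


slice_in_index = is_valid("x[1:2]")         # slice notation inside []
slice_in_call = is_valid("f(1:2)")          # slice notation inside ()
assign_to_index = is_valid("x[1, 2] = 5")   # setitem notation
assign_to_call = is_valid("f(1, x=3) = 5")  # no equivalent for calls
```

Only the two indexing forms compile, so a method call cannot stand in for either slicing or item assignment.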

References

Copyright

This document has been placed in the public domain.




[1] "Rejection of PEP 472"
(https://mail.python.org/pipermail/python-dev/2019-March/156693.html)

[2] "Allow kwargs in __{getdel}item__"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)

[3] "PEP 472 -- Support for indexing with keyword arguments"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)

[4] "trio.run() should take **kwargs in addition to *args"
(https://github.com/python-trio/trio/issues/470)

[5] "Reference implementation"
(https://github.com/python/cpython/compare/master...stefanoborini:PEP-637-implementation-attempt-2)

[6] "[Numpy-discussion] Request for comments on PEP 637 - Support for
indexing with keyword arguments"
(http://numpy-discussion.10968.n7.nabble.com/Request-for-comments-on-PEP-637-Support-for-indexing-with-keyword-arguments-td48489.html)