PEP: 560 Title: Core support for typing module and generic types Author:
Ivan Levkivskyi <levkivskyi@gmail.com> Status: Final Type: Standards
Track Topic: Typing Created: 03-Sep-2017 Python-Version: 3.7
Post-History: 09-Sep-2017, 14-Nov-2017 Resolution:
https://mail.python.org/pipermail/python-dev/2017-December/151038.html

the documentation for ~object.__class_getitem__ and
~object.__mro_entries__

Abstract

Initially PEP 484 was designed in such way that it would not introduce
any changes to the core CPython interpreter. Now type hints and the
typing module are extensively used by the community, e.g. PEP 526 and
PEP 557 extend the usage of type hints, and the backport of typing on
PyPI has 1M downloads/month. Therefore, this restriction can be removed.
It is proposed to add two special methods __class_getitem__ and
__mro_entries__ to the core CPython for better support of generic types.

Rationale

The restriction to not modify the core CPython interpreter led to some
design decisions that became questionable when the typing module started
to be widely used. There are three main points of concern: performance
of the typing module, metaclass conflicts, and the large number of hacks
currently used in typing.

Performance

The typing module is one of the heaviest and slowest modules in the
standard library even with all the optimizations made. Mainly this is
because subscripted generic types (see PEP 484 for definition of terms
used in this PEP) are class objects (see also[1]). There are three main
ways how the performance can be improved with the help of the proposed
special methods:

-   Creation of generic classes is slow since the GenericMeta.__new__ is
    very slow; we will not need it anymore.
-   Very long method resolution orders (MROs) for generic classes will
    be half as long; they are present because we duplicate the
    collections.abc inheritance chain in typing.
-   Instantiation of generic classes will be faster (this is minor
    however).

Metaclass conflicts

All generic types are instances of GenericMeta, so if a user uses a
custom metaclass, then it is hard to make a corresponding class generic.
This is particularly hard for library classes that a user doesn't
control. A workaround is to always mix-in GenericMeta:

    class AdHocMeta(GenericMeta, LibraryMeta):
        pass

    class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
        ...

but this is not always practical or even possible. With the help of the
proposed special attributes the GenericMeta metaclass will not be
needed.

Hacks and bugs that will be removed by this proposal

-   _generic_new hack that exists because __init__ is not called on
    instances with a type differing from the type whose __new__ was
    called, C[int]().__class__ is C.
-   _next_in_mro speed hack will be not necessary since subscription
    will not create new classes.
-   Ugly sys._getframe hack. This one is particularly nasty since it
    looks like we can't remove it without changes outside typing.
-   Currently generics do dangerous things with private ABC caches to
    fix large memory consumption that grows at least as O(N²), see[2].
    This point is also important because it was recently proposed to
    re-implement ABCMeta in C.
-   Problems with sharing attributes between subscripted generics,
    see[3]. The current solution already uses __getattr__ and
    __setattr__, but it is still incomplete, and solving this without
    the current proposal will be hard and will need __getattribute__.
-   _no_slots_copy hack, where we clean up the class dictionary on every
    subscription thus allowing generics with __slots__.
-   General complexity of the typing module. The new proposal will not
    only allow to remove the above-mentioned hacks/bugs, but also
    simplify the implementation, so that it will be easier to maintain.

Specification

__class_getitem__

The idea of __class_getitem__ is simple: it is an exact analog of
__getitem__ with an exception that it is called on a class that defines
it, not on its instances. This allows us to avoid
GenericMeta.__getitem__ for things like Iterable[int]. The
__class_getitem__ is automatically a class method and does not require
@classmethod decorator (similar to __init_subclass__) and is inherited
like normal attributes. For example:

    class MyList:
        def __getitem__(self, index):
            return index + 1
        def __class_getitem__(cls, item):
            return f"{cls.__name__}[{item.__name__}]"

    class MyOtherList(MyList):
        pass

    assert MyList()[0] == 1
    assert MyList[int] == "MyList[int]"

    assert MyOtherList()[0] == 1
    assert MyOtherList[int] == "MyOtherList[int]"

Note that this method is used as a fallback, so if a metaclass defines
__getitem__, then that will have the priority.

__mro_entries__

If an object that is not a class object appears in the tuple of bases of
a class definition, then method __mro_entries__ is searched on it. If
found, it is called with the original tuple of bases as an argument. The
result of the call must be a tuple, that is unpacked in the base classes
in place of this object. (If the tuple is empty, this means that the
original bases is simply discarded.) If there are more than one object
with __mro_entries__, then all of them are called with the same original
tuple of bases. This step happens first in the process of creation of a
class, all other steps, including checks for duplicate bases and MRO
calculation, happen normally with the updated bases.

Using the method API instead of just an attribute is necessary to avoid
inconsistent MRO errors, and perform other manipulations that are
currently done by GenericMeta.__new__. The original bases are stored as
__orig_bases__ in the class namespace (currently this is also done by
the metaclass). For example:

    class GenericAlias:
        def __init__(self, origin, item):
            self.origin = origin
            self.item = item
        def __mro_entries__(self, bases):
            return (self.origin,)

    class NewList:
        def __class_getitem__(cls, item):
            return GenericAlias(cls, item)

    class Tokens(NewList[int]):
        ...

    assert Tokens.__bases__ == (NewList,)
    assert Tokens.__orig_bases__ == (NewList[int],)
    assert Tokens.__mro__ == (Tokens, NewList, object)

Resolution using __mro_entries__ happens only in bases of a class
definition statement. In all other situations where a class object is
expected, no such resolution will happen, this includes isinstance and
issubclass built-in functions.

NOTE: These two method names are reserved for use by the typing module
and the generic types machinery, and any other use is discouraged. The
reference implementation (with tests) can be found in[4], and the
proposal was originally posted and discussed on the typing tracker,
see[5].

Dynamic class creation and types.resolve_bases

type.__new__ will not perform any MRO entry resolution. So that a direct
call type('Tokens', (List[int],), {}) will fail. This is done for
performance reasons and to minimize the number of implicit
transformations. Instead, a helper function resolve_bases will be added
to the types module to allow an explicit __mro_entries__ resolution in
the context of dynamic class creation. Correspondingly, types.new_class
will be updated to reflect the new class creation steps while
maintaining the backwards compatibility:

    def new_class(name, bases=(), kwds=None, exec_body=None):
        resolved_bases = resolve_bases(bases)  # This step is added
        meta, ns, kwds = prepare_class(name, resolved_bases, kwds)
        if exec_body is not None:
            exec_body(ns)
        ns['__orig_bases__'] = bases  # This step is added
        return meta(name, resolved_bases, ns, **kwds)

Using __class_getitem__ in C extensions

As mentioned above, __class_getitem__ is automatically a class method if
defined in Python code. To define this method in a C extension, one
should use flags METH_O|METH_CLASS. For example, a simple way to make an
extension class generic is to use a method that simply returns the
original class objects, thus fully erasing the type information at
runtime, and deferring all check to static type checkers only:

    typedef struct {
        PyObject_HEAD
        /* ... your code ... */
    } SimpleGeneric;

    static PyObject *
    simple_class_getitem(PyObject *type, PyObject *item)
    {
        Py_INCREF(type);
        return type;
    }

    static PyMethodDef simple_generic_methods[] = {
        {"__class_getitem__", simple_class_getitem, METH_O|METH_CLASS, NULL},
        /* ... other methods ... */
    };

    PyTypeObject SimpleGeneric_Type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        "SimpleGeneric",
        sizeof(SimpleGeneric),
        0,
        .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
        .tp_methods = simple_generic_methods,
    };

Such class can be used as a normal generic in Python type annotations (a
corresponding stub file should be provided for static type checkers, see
PEP 484 for details):

    from simple_extension import SimpleGeneric
    from typing import TypeVar

    T = TypeVar('T')

    Alias = SimpleGeneric[str, T]
    class SubClass(SimpleGeneric[T, int]):
        ...

    data: Alias[int]  # Works at runtime
    more_data: SubClass[str]  # Also works at runtime

Backwards compatibility and impact on users who don't use typing

This proposal may break code that currently uses the names
__class_getitem__ and __mro_entries__. (But the language reference
explicitly reserves all undocumented dunder names, and allows "breakage
without warning"; see[6].)

This proposal will support almost complete backwards compatibility with
the current public generic types API; moreover the typing module is
still provisional. The only two exceptions are that currently
issubclass(List[int], List) returns True, while with this proposal it
will raise TypeError, and repr() of unsubscripted user-defined generics
cannot be tweaked and will coincide with repr() of normal (non-generic)
classes.

With the reference implementation I measured negligible performance
effects (under 1% on a micro-benchmark) for regular (non-generic)
classes. At the same time performance of generics is significantly
improved:

-   importlib.reload(typing) is up to 7x faster
-   Creation of user defined generic classes is up to 4x faster (on a
    micro-benchmark with an empty body)
-   Instantiation of generic classes is up to 5x faster (on a
    micro-benchmark with an empty __init__)
-   Other operations with generic types and instances (like method
    lookup and isinstance() checks) are improved by around 10-20%
-   The only aspect that gets slower with the current proof of concept
    implementation is the subscripted generics cache look-up. However it
    was already very efficient, so this aspect gives negligible overall
    impact.

References

Copyright

This document has been placed in the public domain.

[1] Discussion following Mark Shannon's presentation at Language Summit
(https://github.com/python/typing/issues/432)

[2] Pull Request to implement shared generic ABC caches (merged)
(https://github.com/python/typing/pull/383)

[3] An old bug with setting/accessing attributes on generic types
(https://github.com/python/typing/issues/392)

[4] The reference implementation
(https://github.com/ilevkivskyi/cpython/pull/2/files,
https://github.com/ilevkivskyi/cpython/tree/new-typing)

[5] Original proposal (https://github.com/python/typing/issues/468)

[6] Reserved classes of identifiers
(https://docs.python.org/3/reference/lexical_analysis.html#reserved-classes-of-identifiers)