PEP: 252
Title: Making Types Look More Like Classes
Author: Guido van Rossum <guido@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Apr-2001
Python-Version: 2.2
Post-History:

Abstract

This PEP proposes changes to the introspection API for types that make
them look more like classes, and their instances more like class
instances. For example, type(x) will be equivalent to x.__class__ for
most built-in types. When C is x.__class__, x.meth(a) will generally be
equivalent to C.meth(x, a), and C.__dict__ contains x's methods and
other attributes.

This PEP also introduces a new approach to specifying attributes, using
attribute descriptors, or descriptors for short. Descriptors unify and
generalize several different common mechanisms used for describing
attributes: a descriptor can describe a method, a typed field in the
object structure, or a generalized attribute represented by getter and
setter functions.

Based on the generalized descriptor API, this PEP also introduces a way
to declare class methods and static methods.

[Editor's note: the ideas described in this PEP have been incorporated
into Python. The PEP no longer accurately describes the implementation.]

Introduction

One of Python's oldest language warts is the difference between classes
and types. For example, you can't directly subclass the dictionary type,
and the introspection interface for finding out what methods and
instance variables an object has is different for types and for classes.

Healing the class/type split is a big effort, because it affects many
aspects of how Python is implemented. This PEP concerns itself with
making the introspection API for types look the same as that for
classes. Other PEPs will propose making classes look more like types,
and subclassing from built-in types; these topics are not on the table
for this PEP.

Introspection APIs

Introspection concerns itself with finding out what attributes an object
has. Python's very general getattr/setattr API makes it impossible to
guarantee that there always is a way to get a list of all attributes
supported by a specific object, but in practice two conventions have
appeared that together work for almost all objects. I'll call them the
class-based introspection API and the type-based introspection API;
class API and type API for short.

The class-based introspection API is used primarily for class instances;
it is also used by Jim Fulton's ExtensionClasses. It assumes that all
data attributes of an object x are stored in the dictionary x.__dict__,
and that all methods and class variables can be found by inspection of
x's class, written as x.__class__. Classes have a __dict__ attribute,
which yields a dictionary containing methods and class variables defined
by the class itself, and a __bases__ attribute, which is a tuple of base
classes that must be inspected recursively. Some assumptions here are:

-   attributes defined in the instance dict override attributes defined
    by the object's class;
-   attributes defined in a derived class override attributes defined in
    a base class;
-   attributes in an earlier base class (meaning occurring earlier in
    __bases__) override attributes in a later base class.

(The last two rules together are often summarized as the left-to-right,
depth-first rule for attribute search. This is the classic Python
attribute lookup rule. Note that PEP 253 will propose to change the
attribute lookup order, and if accepted, this PEP will follow suit.)

The type-based introspection API is supported in one form or another by
most built-in objects. It uses two special attributes, __members__ and
__methods__. The __methods__ attribute, if present, is a list of method
names supported by the object. The __members__ attribute, if present, is
a list of data attribute names supported by the object.

The type API is sometimes combined with a __dict__ that works the same
as for instances (for example for function objects in Python 2.1,
f.__dict__ contains f's dynamic attributes, while f.__members__ lists
the names of f's statically defined attributes).

Some caution must be exercised: some objects don't list their
"intrinsic" attributes (like __dict__ and __doc__) in __members__, while
others do; sometimes attribute names occur both in __members__ or
__methods__ and as keys in __dict__, in which case it's anybody's guess
whether the value found in __dict__ is used or not.

The type API has never been carefully specified. It is part of Python
folklore, and most third party extensions support it because they follow
examples that support it. Also, any type that uses Py_FindMethod()
and/or PyMember_Get() in its tp_getattr handler supports it, because
these two functions special-case the attribute names __methods__ and
__members__, respectively.

Jim Fulton's ExtensionClasses ignore the type API, and instead emulate
the class API, which is more powerful. In this PEP, I propose to phase
out the type API in favor of supporting the class API for all types.

One argument in favor of the class API is that it doesn't require you to
create an instance in order to find out which attributes a type
supports; this in turn is useful for documentation processors. For
example, the socket module exports the SocketType object, but this
currently doesn't tell us what methods are defined on socket objects.
Using the class API, SocketType would show exactly what the methods for
socket objects are, and we can even extract their docstrings, without
creating a socket. (Since this is a C extension module, the
source-scanning approach to docstring extraction isn't feasible in this
case.)
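As a rough sketch of what such a documentation processor could do, the snippet below collects method names and docstrings from a type without instantiating it. The helper `describe_type` is hypothetical, and the code assumes present-day Python 3, where the socket type is reachable as `socket.socket` and types expose an `__mro__` attribute (a result of the later PEP 253 work, not of this PEP).

```python
import socket


def describe_type(tp):
    """Map public attribute names defined by tp or its bases to docstrings."""
    docs = {}
    # Walk from the most basic class towards tp so derived definitions win.
    for klass in reversed(tp.__mro__):
        for name, descr in vars(klass).items():
            if not name.startswith("_"):
                docs[name] = getattr(descr, "__doc__", None)
    return docs


# No socket object is created; only the type is inspected.
socket_docs = describe_type(socket.socket)
list_docs = describe_type(list)
```

The same helper works for any type, for example `describe_type(list)["append"]` yields the docstring of `list.append`.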

Specification of the class-based introspection API

Objects may have two kinds of attributes: static and dynamic. The names
and sometimes other properties of static attributes are knowable by
inspection of the object's type or class, which is accessible through
obj.__class__ or type(obj). (I'm using type and class interchangeably; a
clumsy but descriptive term that fits both is "meta-object".)

(XXX static and dynamic are not great terms to use here, because
"static" attributes may actually behave quite dynamically, and because
they have nothing to do with static class members in C++ or Java. Barry
suggests using immutable and mutable instead, but those words already
have precise and different meanings in slightly different contexts, so I
think that would still be confusing.)

Examples of dynamic attributes are instance variables of class
instances, module attributes, etc. Examples of static attributes are the
methods of built-in objects like lists and dictionaries, and the
attributes of frame and code objects (f.f_code, c.co_filename, etc.).
When an object with dynamic attributes exposes these through its
__dict__ attribute, __dict__ is a static attribute.

The names and values of dynamic attributes are typically stored in a
dictionary, and this dictionary is typically accessible as obj.__dict__.
The rest of this specification is more concerned with discovering the
names and properties of static attributes than with dynamic attributes;
the latter are easily discovered by inspection of obj.__dict__.
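The distinction can be seen in a short sketch. It assumes present-day Python 3, and the `Point` class is a hypothetical example; the exact set of names a type's dictionary contains is an implementation detail.

```python
class Point:
    def manhattan(self):
        return abs(self.x) + abs(self.y)


p = Point()
p.x, p.y = 3, -4

# Dynamic attributes: stored per object, discoverable through __dict__.
dynamic_names = sorted(vars(p))

# Static attributes: knowable from the class alone, without an instance.
static_names = [n for n in vars(type(p)) if not n.startswith("__")]

# A code object has only static attributes: it carries no __dict__, and
# names such as co_filename are described by its type.
code = Point.manhattan.__code__
code_has_dict = hasattr(code, "__dict__")
filename_is_static = "co_filename" in vars(type(code))

# __dict__ itself is a static attribute, defined by the class.
dict_is_static = "__dict__" in vars(Point)
```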

In the discussion below, I distinguish two kinds of objects: regular
objects (like lists, ints, functions) and meta-objects. Types and
classes are meta-objects. Meta-objects are also regular objects, but
we're mostly interested in them because they are referenced by the
__class__ attribute of regular objects (or by the __bases__ attribute of
other meta-objects).

The class introspection API consists of the following elements:

-   the __class__ and __dict__ attributes on regular objects;
-   the __bases__ and __dict__ attributes on meta-objects;
-   precedence rules;
-   attribute descriptors.

Together, these not only tell us about all attributes defined by a
meta-object, but they also help us calculate the value of a specific
attribute of a given object.

1.  The __dict__ attribute on regular objects

    A regular object may have a __dict__ attribute. If it does, this
    should be a mapping (not necessarily a dictionary) supporting at
    least __getitem__(), keys(), and has_key(). This gives the dynamic
    attributes of the object. The keys in the mapping give attribute
    names, and the corresponding values give their values.

    Typically, the value of an attribute with a given name is the same
    object as the value corresponding to that name as a key in the
    __dict__. In other words, obj.__dict__['spam'] is obj.spam. (But see
    the precedence rules below; a static attribute with the same name
    may override the dictionary item.)

2.  The __class__ attribute on regular objects

    A regular object usually has a __class__ attribute. If it does, this
    references a meta-object. A meta-object can define static attributes
    for the regular object whose __class__ it is. This is normally done
    through the following mechanism:

3.  The __dict__ attribute on meta-objects

    A meta-object may have a __dict__ attribute, of the same form as the
    __dict__ attribute for regular objects (a mapping but not
    necessarily a dictionary). If it does, the keys of the meta-object's
    __dict__ are names of static attributes for the corresponding
    regular object. The values are attribute descriptors; we'll explain
    these later. An unbound method is a special case of an attribute
    descriptor.

    Because a meta-object is also a regular object, the items in a
    meta-object's __dict__ correspond to attributes of the meta-object;
    however, some transformation may be applied, and bases (see below)
    may define additional dynamic attributes. In other words, mobj.spam
    is not always mobj.__dict__['spam']. (This rule contains a loophole
    because for classes, if C.__dict__['spam'] is a function, C.spam is
    an unbound method object.)

4.  The __bases__ attribute on meta-objects

    A meta-object may have a __bases__ attribute. If it does, this
    should be a sequence (not necessarily a tuple) of other
    meta-objects, the bases. An absent __bases__ is equivalent to an
    empty sequence of bases. There must never be a cycle in the
    relationship between meta-objects defined by __bases__ attributes;
    in other words, the __bases__ attributes define a directed acyclic
    graph, with arcs pointing from derived meta-objects to their base
    meta-objects. (It is not necessarily a tree, since multiple classes
    can have the same base class.) The __dict__ attributes of a
    meta-object in the inheritance graph supply attribute descriptors
    for the regular object whose __class__ attribute points to the root
    of the inheritance tree (which is not the same as the root of the
    inheritance hierarchy -- rather more the opposite, at the bottom
    given how inheritance trees are typically drawn). Descriptors are
    first searched in the dictionary of the root meta-object, then in
    its bases, according to a precedence rule (see the next paragraph).

5.  Precedence rules

    When two meta-objects in the inheritance graph for a given regular
    object both define an attribute descriptor with the same name, the
    search order is up to the meta-object. This allows different
    meta-objects to define different search orders. In particular,
    classic classes use the old left-to-right depth-first rule, while
    new-style classes use a more advanced rule (see the section on
    method resolution order in PEP 253).

    When a dynamic attribute (one defined in a regular object's
    __dict__) has the same name as a static attribute (one defined by a
    meta-object in the inheritance graph rooted at the regular object's
    __class__), the static attribute has precedence if it is a
    descriptor that defines a __set__ method (see below); otherwise (if
    there is no __set__ method) the dynamic attribute has precedence. In
    other words, for data attributes (those with a __set__ method), the
    static definition overrides the dynamic definition, but for other
    attributes, dynamic overrides static.

    Rationale: we can't have a simple rule like "static overrides
    dynamic" or "dynamic overrides static", because some static
    attributes indeed override dynamic attributes; for example, a key
    '__class__' in an instance's __dict__ is ignored in favor of the
    statically defined __class__ pointer, but on the other hand most
    keys in inst.__dict__ override attributes defined in inst.__class__.
    Presence of a __set__ method on a descriptor indicates that this is
    a data descriptor. (Even read-only data descriptors have a __set__
    method: it always raises an exception.) Absence of a __set__ method
    on a descriptor indicates that the descriptor isn't interested in
    intercepting assignment, and then the classic rule applies: an
    instance variable with the same name as a method hides the method
    until it is deleted.

6.  Attribute descriptors

    This is where it gets interesting -- and messy. Attribute
    descriptors (descriptors for short) are stored in the meta-object's
    __dict__ (or in the __dict__ of one of its ancestors), and have two
    uses: a descriptor can be used to get or set the corresponding
    attribute value on the (regular, non-meta) object, and it has an
    additional interface that describes the attribute for documentation
    and introspection purposes.

    There is little prior art in Python for designing the descriptor's
    interface, neither for getting/setting the value nor for describing
    the attribute otherwise, except some trivial properties (it's
    reasonable to assume that __name__ and __doc__ should be the
    attribute's name and docstring). I will propose such an API below.

    If an object found in the meta-object's __dict__ is not an attribute
    descriptor, backward compatibility dictates certain minimal
    semantics. This basically means that if it is a Python function or
    an unbound method, the attribute is a method; otherwise, it is the
    default value for a dynamic data attribute. Backwards compatibility
    also dictates that (in the absence of a __setattr__ method) it is
    legal to assign to an attribute corresponding to a method, and that
    this creates a data attribute shadowing the method for this
    particular instance. However, these semantics are only required for
    backwards compatibility with regular classes.
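The precedence rule for static versus dynamic attributes can be illustrated as follows. This sketch assumes present-day Python 3 and uses the built-in `property()` as a convenient read-only data descriptor; `property()` is not introduced by this PEP, and the class `C` is a hypothetical example.

```python
class C:
    def meth(self):
        return "static method"

    def _get_ivar(self):
        return "static data"

    # With no setter this property is read-only, yet it still defines
    # __set__ (which raises an exception), so it counts as a data descriptor.
    ivar = property(_get_ivar)


x = C()

# Plant same-named entries directly in the instance dict.
x.__dict__["ivar"] = "dynamic"
x.__dict__["meth"] = "dynamic"

data_result = x.ivar        # data descriptor: static overrides dynamic
method_result = x.meth      # no __set__: dynamic overrides static

# Deleting the instance variable makes the method visible again.
del x.__dict__["meth"]
restored = x.meth()
```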

The introspection API is a read-only API. We don't define the effect of
assignment to any of the special attributes (__dict__, __class__ and
__bases__), nor the effect of assignment to the items of a __dict__.
Generally, such assignments should be considered off-limits. A future
PEP may define some semantics for some such assignments. (Especially
because currently instances support assignment to __class__ and
__dict__, and classes support assignment to __bases__ and __dict__.)

Specification of the attribute descriptor API

Attribute descriptors may have the following attributes. In the
examples, x is an object, C is x.__class__, x.meth() is a method, and
x.ivar is a data attribute or instance variable. All attributes are
optional -- a specific attribute may or may not be present on a given
descriptor. An absent attribute means that the corresponding information
is not available or the corresponding functionality is not implemented.

-   __name__: the attribute name. Because of aliasing and renaming, the
    attribute may (additionally or exclusively) be known under a
    different name, but this is the name under which it was born.
    Example: C.meth.__name__ == 'meth'.
-   __doc__: the attribute's documentation string. This may be None.
-   __objclass__: the class that declared this attribute. The descriptor
    only applies to objects that are instances of this class (this
    includes instances of its subclasses). Example:
    C.meth.__objclass__ is C.
-   __get__(): a function callable with one or two arguments that
    retrieves the attribute value from an object. This is also referred
    to as a "binding" operation, because it may return a "bound method"
    object in the case of method descriptors. The first argument, X, is
    the object from which the attribute must be retrieved or to which it
    must be bound. When X is None, the optional second argument, T,
    should be a meta-object, and the binding operation may return an
    unbound method restricted to instances of T. When both X and T are
    specified, X should be an instance of T. Exactly what is returned by
    the binding operation depends on the semantics of the descriptor;
    for example, static methods and class methods (see below) ignore the
    instance and bind to the type instead.
-   __set__(): a function of two arguments that sets the attribute value
    on the object. If the attribute is read-only, this method may raise
    a TypeError or AttributeError exception (both are allowed, because
    both are historically found for undefined or unsettable attributes).
    Example: C.ivar.__set__(x, y) is roughly equivalent to x.ivar = y.
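The attributes listed above can be exercised directly on the objects stored in a class's `__dict__`. The sketch below assumes present-day Python 3. It uses `__slots__` only as a convenient way to obtain a descriptor for a typed field; the class `C` is a hypothetical example, and, as the text says, not every descriptor carries every attribute.

```python
class C:
    __slots__ = ("ivar",)   # creates a descriptor named 'ivar' in C.__dict__

    def meth(self):
        """Return a marker string."""
        return "meth called"


x = C()

# A data attribute: the descriptor both sets and gets the value.
ivar_descr = C.__dict__["ivar"]
ivar_descr.__set__(x, 42)              # roughly x.ivar = 42
value = ivar_descr.__get__(x, C)       # roughly x.ivar

# A method: __get__ is the "binding" operation.
meth_descr = C.__dict__["meth"]
bound = meth_descr.__get__(x, C)       # roughly x.meth
via_class = meth_descr.__get__(None, C)  # roughly C.meth

# A built-in method descriptor exposes the descriptive attributes too.
append_descr = list.__dict__["append"]
```

In Python 3 the unbound method object no longer exists, so binding with `None` simply hands back the function; plain Python functions also carry no `__objclass__`, which is consistent with all of these attributes being optional.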

Static methods and class methods

The descriptor API makes it possible to add static methods and class
methods. Static methods are easy to describe: they behave pretty much
like static methods in C++ or Java. Here's an example:

    class C:

        def foo(x, y):
            print "staticmethod", x, y
        foo = staticmethod(foo)

    C.foo(1, 2)
    c = C()
    c.foo(1, 2)

Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with two
arguments, and print "staticmethod 1 2". No "self" is declared in the
definition of foo(), and no instance is required in the call.

The line "foo = staticmethod(foo)" in the class statement is the crucial
element: this makes foo() a static method. The built-in staticmethod()
wraps its function argument in a special kind of descriptor whose
__get__() method returns the original function unchanged. Without this,
the __get__() method of standard function objects would have created a
bound method object for 'c.foo' and an unbound method object for
'C.foo'.
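To make the role of `__get__()` concrete, here is a minimal pure-Python stand-in next to the real built-in. The class `StaticMethod` is hypothetical and only approximates what `staticmethod()` does; the code assumes present-day Python 3 and returns a tuple instead of printing.

```python
class StaticMethod:
    """Hypothetical stand-in for staticmethod(): ignore the binding request."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Return the original function unchanged, whether the lookup
        # went through an instance or through the class.
        return self.func


def foo(x, y):
    return ("staticmethod", x, y)


class C:
    builtin_foo = staticmethod(foo)
    emulated_foo = StaticMethod(foo)


c = C()
results = [
    C.builtin_foo(1, 2),
    c.builtin_foo(1, 2),
    C.emulated_foo(1, 2),
    c.emulated_foo(1, 2),
]
```

All four calls reach `foo()` with exactly two arguments; what sits in `C.__dict__` is the wrapper, while attribute access yields the plain function.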

(XXX Barry suggests using "sharedmethod" instead of "staticmethod",
because the word static is being overloaded in so many ways already. But
I'm not sure if shared conveys the right meaning.)

Class methods use a similar pattern to declare methods that receive an
implicit first argument that is the class for which they are invoked.
This has no C++ or Java equivalent, and is not quite the same as what
class methods are in Smalltalk, but may serve a similar purpose.
According to Armin Rigo, they are similar to "virtual class methods" in
Borland Pascal dialect Delphi. (Python also has real metaclasses, and
perhaps methods defined in a metaclass have more right to the name
"class method"; but I expect that most programmers won't be using
metaclasses.) Here's an example:

    class C:

        def foo(cls, y):
            print "classmethod", cls, y
        foo = classmethod(foo)

    C.foo(1)
    c = C()
    c.foo(1)

Both the call C.foo(1) and the call c.foo(1) end up calling foo() with
two arguments, and print "classmethod __main__.C 1". The first argument
of foo() is implied, and it is the class, even if the method was invoked
via an instance. Now let's continue the example:

    class D(C):
        pass

    D.foo(1)
    d = D()
    d.foo(1)

This prints "classmethod __main__.D 1" both times; in other words, the
class passed as the first argument of foo() is the class involved in the
call, not the class involved in the definition of foo().

But notice this:

    class E(C):
        def foo(cls, y): # override C.foo
            print "E.foo() called"
            C.foo(y)
        foo = classmethod(foo)

    E.foo(1)
    e = E()
    e.foo(1)

In this example, the call to C.foo() from E.foo() will see class C as
its first argument, not class E. This is to be expected, since the call
specifies the class C. But it stresses the difference between these
class methods and methods defined in metaclasses, where an upcall to a
metamethod would pass the target class as an explicit first argument.
(If you don't understand this, don't worry, you're not alone.) Note that
calling cls.foo(y) would be a mistake -- it would cause infinite
recursion. Also note that you can't specify an explicit 'cls' argument
to a class method. If you want this (e.g. the __new__ method in PEP 253
requires this), use a static method with a class as its explicit first
argument instead.
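The binding behavior of class methods can be sketched with a descriptor written in Python. `ClassMethod` below is a hypothetical stand-in that approximates the built-in `classmethod()`; the code assumes present-day Python 3 and returns tuples instead of printing, but otherwise mirrors the C, D and E examples above.

```python
import types


class ClassMethod:
    """Hypothetical stand-in for classmethod(): bind to the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Bind the function to the class involved in the call,
        # ignoring the instance (if any).
        return types.MethodType(self.func, objtype)


class C:
    def foo(cls, y):
        return (cls.__name__, y)
    foo = ClassMethod(foo)


class D(C):
    pass


class E(C):
    def foo(cls, y):  # override C.foo
        # The upcall names C explicitly, so C.foo sees C, not E.
        return ("E.foo", C.foo(y))
    foo = ClassMethod(foo)


via_class = (C.foo(1), D.foo(1), E.foo(1))
via_instance = (C().foo(1), D().foo(1), E().foo(1))
```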

C API

XXX The following is VERY rough text that I wrote with a different
audience in mind; I'll have to go through this to edit it more. XXX It
also doesn't go into enough detail for the C API.

A built-in type can declare special data attributes in two ways: using a
struct memberlist (defined in structmember.h) or a struct getsetlist
(defined in descrobject.h). The struct memberlist is an old mechanism
put to new use: each attribute has a descriptor record including its
name, an enum giving its type (various C types are supported as well as
PyObject *), an offset from the start of the instance, and a read-only
flag.

The struct getsetlist mechanism is new, and intended for cases that
don't fit in that mold, because they either require additional checking,
or are plain calculated attributes. Each attribute here has a name, a
getter C function pointer, a setter C function pointer, and a context
pointer. The function pointers are optional, so that for example setting
the setter function pointer to NULL makes a read-only attribute. The
context pointer is intended to pass auxiliary information to generic
getter/setter functions, but I haven't found a need for this yet.

Note that there is also a similar mechanism to declare built-in methods:
these are PyMethodDef structures, which contain a name and a C function
pointer (and some flags for the calling convention).
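Seen from Python, the three kinds of declaration surface as three kinds of descriptor in a type's dictionary. The sketch below assumes a current CPython 3, where they are distinct types exposed in the `types` module; which attribute of a function object is declared through which table is an implementation detail of CPython, and `sample` is a hypothetical function.

```python
import types


def sample():
    pass


# Declared through a method table (PyMethodDef).
method_descr = list.__dict__["append"]
# Declared as a typed, read-only field in the object structure.
member_descr = types.FunctionType.__dict__["__globals__"]
# Declared through getter/setter functions.
getset_descr = types.FunctionType.__dict__["__code__"]

globals_via_descr = member_descr.__get__(sample, types.FunctionType)
code_via_descr = getset_descr.__get__(sample, types.FunctionType)

# Writing through a read-only member descriptor is rejected.
try:
    member_descr.__set__(sample, {})
except (AttributeError, TypeError):
    readonly_rejected = True
else:
    readonly_rejected = False
```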

Traditionally, built-in types have had to define their own tp_getattro
and tp_setattro slot functions to make these attribute definitions work
(PyMethodDef and struct memberlist are quite old). There are convenience
functions that take an array of PyMethodDef or memberlist structures, an
object, and an attribute name, and return or set the attribute if found
in the list, or raise an exception if not found. But these convenience
functions had to be explicitly called by the tp_getattro or tp_setattro
method of the specific type, and they did a linear search of the array
using strcmp() to find the array element describing the requested
attribute.

I now have a brand spanking new generic mechanism that improves this
situation substantially.

-   Pointers to arrays of PyMethodDef, memberlist, getsetlist structures
    are part of the new type object (tp_methods, tp_members, tp_getset).

-   At type initialization time (in PyType_InitDict()), for each entry
    in those three arrays, a descriptor object is created and placed in
    a dictionary that belongs to the type (tp_dict).

-   Descriptors are very lean objects that mostly point to the
    corresponding structure. An implementation detail is that all
    descriptors share the same object type, and a discriminator field
    tells what kind of descriptor it is (method, member, or getset).

-   As explained above, descriptors have a get() method that takes an
    object argument and returns that object's attribute; descriptors
    for writable attributes also have a set() method that takes an
    object and a value and sets that object's attribute. Note that the
    get() method also serves as a bind() operation for methods, binding
    the unbound method implementation to the object.

-   Instead of providing their own tp_getattro and tp_setattro
    implementation, almost all built-in objects now place
    PyObject_GenericGetAttr and (if they have any writable attributes)
    PyObject_GenericSetAttr in their tp_getattro and tp_setattro slots.
    (Or, they can leave these NULL, and inherit them from the default
    base object, if they arrange for an explicit call to
    PyType_InitDict() for the type before the first instance is
    created.)

-   In the simplest case, PyObject_GenericGetAttr() does exactly one
    dictionary lookup: it looks up the attribute name in the type's
    dictionary (obj->ob_type->tp_dict). Upon success, there are two
    possibilities: the descriptor has a get method, or it doesn't. For
    speed, the get and set methods are type slots: tp_descr_get and
    tp_descr_set. If the tp_descr_get slot is non-NULL, it is called,
    passing the object as its only argument, and the return value from
    this call is the result of the getattr operation. If the
    tp_descr_get slot is NULL, as a fallback the descriptor itself is
    returned (compare class attributes that are not methods but simple
    values).

-   PyObject_GenericSetAttr() works very similarly but uses the
    tp_descr_set slot and calls it with the object and the new attribute
    value; if the tp_descr_set slot is NULL, an AttributeError is
    raised.

-   But now for a more complicated case. The approach described above is
    suitable for most built-in objects such as lists, strings, numbers.
    However, some object types have a dictionary in each instance that
    can store arbitrary attributes. In fact, when you use a class
    statement to subtype an existing built-in type, you automatically
    get such a dictionary (unless you explicitly turn it off, using
    another advanced feature, __slots__). Let's call this the instance
    dict, to distinguish it from the type dict.

-   In the more complicated case, there's a conflict between names
    stored in the instance dict and names stored in the type dict. If
    both dicts have an entry with the same key, which one should we
    return? Looking at classic Python for guidance, I find conflicting
    rules: for class instances, the instance dict overrides the class
    dict, except for the special attributes (like __dict__ and
    __class__), which have priority over the instance dict.

-   I resolved this with the following set of rules, implemented in
    PyObject_GenericGetAttr():

    1.  Look in the type dict. If you find a data descriptor, use its
        get() method to produce the result. This takes care of special
        attributes like __dict__ and __class__.
    2.  Look in the instance dict. If you find anything, that's it.
        (This takes care of the requirement that normally the instance
        dict overrides the class dict.)
    3.  Look in the type dict again (in reality this uses the saved
        result from step 1, of course). If you find a descriptor, use
        its get() method; if you find something else, that's it; if it's
        not there, raise AttributeError.

    This requires a classification of descriptors as data and nondata
    descriptors. The current implementation quite sensibly classifies
    member and getset descriptors as data (even if they are read-only!)
    and method descriptors as nondata. Non-descriptors (like function
    pointers or plain values) are also classified as nondata (!).

-   This scheme has one drawback: in what I assume to be the most common
    case, referencing an instance variable stored in the instance dict,
    it does two dictionary lookups, whereas the classic scheme did a
    quick test for attributes starting with two underscores plus a
    single dictionary lookup. (Although the implementation is sadly
    structured as instance_getattr() calling instance_getattr1() calling
    instance_getattr2() which finally calls PyDict_GetItem(), and the
    underscore test calls PyString_AsString() rather than inlining this.
    I wonder if optimizing the snot out of this might not be a good idea
    to speed up Python 2.2, if we weren't going to rip it all out. :-)

-   A benchmark verifies that in fact this is as fast as classic
    instance variable lookup, so I'm no longer worried.

-   Modification for dynamic types: steps 1 and 3 look in the dictionary
    of the type and all its base classes (in MRO sequence, of course).
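The three lookup rules, including the modification for base classes, can be written out as a Python sketch. This is not the C implementation: `generic_getattr` and `lookup_in_type` are hypothetical helpers, the code assumes present-day Python 3, and it follows the PEP's definition of a data descriptor (one that has a set method) rather than any later refinement.

```python
_MISSING = object()  # sentinel for "name not found in any type dict"


def lookup_in_type(tp, name):
    """Search the dictionaries of tp and its bases in MRO order."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    tp = type(obj)
    found = lookup_in_type(tp, name)
    found_type = type(found)

    # Step 1: a data descriptor in the type dict takes priority.
    if found is not _MISSING:
        if hasattr(found_type, "__get__") and hasattr(found_type, "__set__"):
            return found_type.__get__(found, obj, tp)

    # Step 2: otherwise the instance dict, if there is one, wins.
    try:
        inst_dict = vars(obj)
    except TypeError:          # objects such as lists have no instance dict
        inst_dict = {}
    if name in inst_dict:
        return inst_dict[name]

    # Step 3: reuse the saved result from step 1.
    if found is not _MISSING:
        if hasattr(found_type, "__get__"):
            return found_type.__get__(found, obj, tp)
        return found           # plain value stored in the type dict
    raise AttributeError(name)


class C:
    klass_value = 42

    def meth(self):
        return "meth"

    ro = property(lambda self: "data")   # read-only data descriptor


x = C()
x.inst = "instance"
x.__dict__["ro"] = "shadowed"            # ignored: data descriptor wins
x.__dict__["klass_value"] = "override"   # wins over the plain class value
```

For the objects above, the sketch agrees with the built-in `getattr()`; the slots `tp_descr_get` and `tp_descr_set` play the role that `__get__` and `__set__` play here.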

Discussion

XXX

Examples

Let's look at lists. In classic Python, the method names of lists were
available as the __methods__ attribute of list objects:

    >>> [].__methods__
    ['append', 'count', 'extend', 'index', 'insert', 'pop',
    'remove', 'reverse', 'sort']
    >>>

Under the new proposal, the __methods__ attribute no longer exists:

    >>> [].__methods__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'list' object has no attribute '__methods__'
    >>>

Instead, you can get the same information from the list type:

    >>> T = [].__class__
    >>> T
    <type 'list'>
    >>> dir(T)                # like T.__dict__.keys(), but sorted
    ['__add__', '__class__', '__contains__', '__eq__', '__ge__',
    '__getattr__', '__getitem__', '__getslice__', '__gt__',
    '__iadd__', '__imul__', '__init__', '__le__', '__len__',
    '__lt__', '__mul__', '__ne__', '__new__', '__radd__',
    '__repr__', '__rmul__', '__setitem__', '__setslice__', 'append',
    'count', 'extend', 'index', 'insert', 'pop', 'remove',
    'reverse', 'sort']
    >>>

The new introspection API gives more information than the old one: in
addition to the regular methods, it also shows the methods that are
normally invoked through special notations, e.g. __iadd__ (+=), __len__
(len), __ne__ (!=). You can invoke any method from this list directly:

    >>> a = ['tic', 'tac']
    >>> T.__len__(a)          # same as len(a)
    2
    >>> T.append(a, 'toe')    # same as a.append('toe')
    >>> a
    ['tic', 'tac', 'toe']
    >>>

This is just like it is for user-defined classes.
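The same checks can be run as a script. This sketch assumes present-day Python 3, where the list type's dictionary holds more names than the Python 2.2 listing above, so it tests membership rather than comparing against that listing.

```python
a = ["tic", "tac"]
T = a.__class__

same_type = T is type(a) and T is list   # type(x) matches x.__class__

length = T.__len__(a)                    # same as len(a)
T.append(a, "toe")                       # same as a.append('toe')

has_old_api = hasattr(a, "__methods__")  # the type API is gone

# Methods invoked through special notation are listed on the type too.
special_names_present = all(
    name in T.__dict__ for name in ("__iadd__", "__len__", "__ne__")
)
```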

Notice a familiar yet surprising name in the list: __init__. This is the
domain of PEP 253.

Backwards compatibility

XXX

Warnings and Errors

XXX

Implementation

A partial implementation of this PEP is available from CVS as a branch
named "descr-branch". To experiment with this implementation, proceed to
check out Python from CVS according to the instructions at
http://sourceforge.net/cvs/?group_id=5470 but add the arguments "-r
descr-branch" to the cvs checkout command. (You can also start with an
existing checkout and do "cvs update -r descr-branch".) For some
examples of the features described here, see the file
Lib/test/test_descr.py.

Note: the code in this branch goes way beyond this PEP; it is also the
experimentation area for PEP 253 (Subtyping Built-in Types).

References

XXX

Copyright

This document has been placed in the public domain.