PEP: 520 Title: Preserving Class Attribute Definition Order Version:
$Revision$ Last-Modified: $Date$ Author: Eric Snow
<ericsnowcurrently@gmail.com> Status: Final Type: Standards Track
Content-Type: text/x-rst Created: 07-Jun-2016 Python-Version: 3.6
Post-History: 07-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016
Resolution:
https://mail.python.org/pipermail/python-dev/2016-June/145442.html

Note

Since compact dict has landed in 3.6, __definition_order__ has been
removed. cls.__dict__ now mostly accomplishes the same thing instead.

Abstract

The class definition syntax is ordered by its very nature. Class
attributes defined there are thus ordered. Aside from helping with
readability, that ordering is sometimes significant. If it were
automatically available outside the class definition then the attribute
order could be used without the need for extra boilerplate (such as
metaclasses or manually enumerating the attribute order). Given that
this information already exists, access to the definition order of
attributes is a reasonable expectation. However, currently Python does
not preserve the attribute order from the class definition.

This PEP changes that by preserving the order in which attributes are
introduced in the class definition body. That order will now be
preserved in the __definition_order__ attribute of the class. This
allows introspection of the original definition order, e.g. by class
decorators.

Additionally, this PEP requires that the default class definition
namespace be ordered (e.g. OrderedDict) by default. The long-lived class
namespace (__dict__) will remain a dict.

Motivation

The attribute order from a class definition may be useful to tools that
rely on name order. However, without the automatic availability of the
definition order, those tools must impose extra requirements on users.
For example, use of such a tool may require that your class use a
particular metaclass. Such requirements are often enough to discourage
use of the tool.

Some tools that could make use of this PEP include:

-   documentation generators
-   testing frameworks
-   CLI frameworks
-   web frameworks
-   config generators
-   data serializers
-   enum factories (my original motivation)

Background

When a class is defined using a class statement, the class body is
executed within a namespace. Currently that namespace defaults to dict.
If the metaclass defines __prepare__() then the result of calling it is
used for the class definition namespace.

After the execution completes, the definition namespace is copied into a
new dict. Then the original definition namespace is discarded. The new
copy is stored away as the class's namespace and is exposed as __dict__
through a read-only proxy.

The class attribute definition order is represented by the insertion
order of names in the definition namespace. Thus, we can have access to
the definition order by switching the definition namespace to an ordered
mapping, such as collections.OrderedDict. This is feasible using a
metaclass and __prepare__, as described above. In fact, exactly this is
by far the most common use case for using __prepare__.

At that point, the only missing thing for later access to the definition
order is storing it on the class before the definition namespace is
thrown away. Again, this may be done using a metaclass. However, this
means that the definition order is preserved only for classes that use
such a metaclass. There are two practical problems with that:

First, it requires the use of a metaclass. Metaclasses introduce an
extra level of complexity to code and in some cases (e.g. conflicts) are
a problem. So reducing the need for them is worth doing when the
opportunity presents itself. PEP 422 and PEP 487 discuss this at length.
We have such an opportunity by using an ordered mapping (e.g.
OrderedDict for CPython at least) for the default class definition
namespace, virtually eliminating the need for __prepare__().

Second, only classes that opt in to using the OrderedDict-based
metaclass will have access to the definition order. This is problematic
for cases where universal access to the definition order is important.

Specification

Part 1:

-   all classes have a __definition_order__ attribute
-   __definition_order__ is a tuple of identifiers (or None)
-   __definition_order__ is always set:
    1.  during execution of the class body, the insertion order of names
        into the class definition namespace is stored in a tuple
    2.  if __definition_order__ is defined in the class body then it
        must be a tuple of identifiers or None; any other value will
        result in TypeError
    3.  classes that do not have a class definition (e.g. builtins) have
        their __definition_order__ set to None
    4.  classes for which __prepare__() returned something other than
        OrderedDict (or a subclass) have their __definition_order__ set
        to None (except where #2 applies)

Not changing:

-   dir() will not depend on __definition_order__
-   descriptors and custom __getattribute__ methods are unconstrained
    regarding __definition_order__

Part 2:

-   the default class definition namespace is now an ordered mapping
    (e.g. OrderdDict)
-   cls.__dict__ does not change, remaining a read-only proxy around
    dict

Note that Python implementations which have an ordered dict won't need
to change anything.

The following code demonstrates roughly equivalent semantics for both
parts 1 and 2:

    class Meta(type):
        @classmethod
        def __prepare__(cls, *args, **kwargs):
            return OrderedDict()

    class Spam(metaclass=Meta):
        ham = None
        eggs = 5
        __definition_order__ = tuple(locals())

Why a tuple?

Use of a tuple reflects the fact that we are exposing the order in which
attributes on the class were defined. Since the definition is already
complete by the time __definition_order__ is set, the content and order
of the value won't be changing. Thus we use a type that communicates
that state of immutability.

Why not a read-only attribute?

There are some valid arguments for making __definition_order__ a
read-only attribute (like cls.__dict__ is). Most notably, a read-only
attribute conveys the nature of the attribute as "complete", which is
exactly correct for __definition_order__. Since it represents the state
of a particular one-time event (execution of the class definition body),
allowing the value to be replaced would reduce confidence that the
attribute corresponds to the original class body. Furthermore, often an
immutable-by-default approach helps to make data easier to reason about.

However, in this case there still isn't a strong reason to counter the
well-worn precedent found in Python. Per Guido:

    I don't see why it needs to be a read-only attribute. There are
    very few of those -- in general we let users play around with
    things unless we have a hard reason to restrict assignment (e.g.
    the interpreter's internal state could be compromised). I don't
    see such a hard reason here.

Also, note that a writeable __definition_order__ allows dynamically
created classes (e.g. by Cython) to still have __definition_order__
properly set. That could certainly be handled through specific
class-creation tools, such as type() or the C-API, without the need to
lose the semantics of a read-only attribute. However, with a writeable
attribute it's a moot point.

Why not "__attribute_order__"?

__definition_order__ is centered on the class definition body. The use
cases for dealing with the class namespace (__dict__) post-definition
are a separate matter. __definition_order__ would be a significantly
misleading name for a feature focused on more than class definition.

Why not ignore "dunder" names?

Names starting and ending with "__" are reserved for use by the
interpreter. In practice they should not be relevant to the users of
__definition_order__. Instead, for nearly everyone they would only be
clutter, causing the same extra work (filtering out the dunder names)
for the majority. In cases where a dunder name is significant, the class
definition could manually set __definition_order__, making the common
case simpler.

However, leaving dunder names out of __definition_order__ means that
their place in the definition order would be unrecoverably lost.
Dropping dunder names by default may inadvertently cause problems for
classes that use dunder names unconventionally. In this case it's better
to play it safe and preserve all the names from the class definition.
This isn't a big problem since it is easy to filter out dunder names:

    (name for name in cls.__definition_order__
          if not (name.startswith('__') and name.endswith('__')))

In fact, in some application contexts there may be other criteria on
which similar filtering would be applied, such as ignoring any name
starting with "_", leaving out all methods, or including only
descriptors. Ultimately dunder names aren't a special enough case to be
treated exceptionally.

Note that a couple of dunder names (__name__ and __qualname__) are
injected by default by the compiler. So they will be included even
though they are not strictly part of the class definition body.

Why None instead of an empty tuple?

A key objective of adding __definition_order__ is to preserve
information in class definitions which was lost prior to this PEP. One
consequence is that __definition_order__ implies an original class
definition. Using None allows us to clearly distinguish classes that do
not have a definition order. An empty tuple clearly indicates a class
that came from a definition statement but did not define any attributes
there.

Why None instead of not setting the attribute?

The absence of an attribute requires more complex handling than None
does for consumers of __definition_order__.

Why constrain manually set values?

If __definition_order__ is manually set in the class body then it will
be used. We require it to be a tuple of identifiers (or None) so that
consumers of __definition_order__ may have a consistent expectation for
the value. That helps maximize the feature's usefulness.

We could also allow an arbitrary iterable for a manually set
__definition_order__ and convert it into a tuple. However, not all
iterables infer a definition order (e.g. set). So we opt in favor of
requiring a tuple.

Why not hide __definition_order__ on non-type objects?

Python doesn't make much effort to hide class-specific attributes during
lookup on instances of classes. While it may make sense to consider
__definition_order__ a class-only attribute, hidden during lookup on
objects, setting precedent in that regard is beyond the goals of this
PEP.

What about __slots__?

__slots__ will be added to __definition_order__ like any other name in
the class definition body. The actual slot names will not be added to
__definition_order__ since they aren't set as names in the definition
namespace.

Why is __definition_order__ even necessary?

Since the definition order is not preserved in __dict__, it is lost once
class definition execution completes. Classes could explicitly set the
attribute as the last thing in the body. However, then independent
decorators could only make use of classes that had done so. Instead,
__definition_order__ preserves this one bit of info from the class body
so that it is universally available.

Support for C-API Types

Arguably, most C-defined Python types (e.g. built-in, extension modules)
have a roughly equivalent concept of a definition order. So conceivably
__definition_order__ could be set for such types automatically. This PEP
does not introduce any such support. However, it does not prohibit it
either. However, since __definition_order__ can be set at any time
through normal attribute assignment, it does not need any special
treatment in the C-API.

The specific cases:

-   builtin types
-   PyType_Ready
-   PyType_FromSpec

Compatibility

This PEP does not break backward compatibility, except in the case that
someone relies strictly on dict as the class definition namespace. This
shouldn't be a problem since issubclass(OrderedDict, dict) is true.

Changes

In addition to the class syntax, the following expose the new behavior:

-   builtins.__build_class__
-   types.prepare_class
-   types.new_class

Also, the 3-argument form of builtins.type() will allow inclusion of
__definition_order__ in the namespace that gets passed in. It will be
subject to the same constraints as when __definition_order__ is
explicitly defined in the class body.

Other Python Implementations

Pending feedback, the impact on Python implementations is expected to be
minimal. All conforming implementations are expected to set
__definition_order__ as described in this PEP.

Implementation

The implementation is found in the tracker.

Alternatives

An Order-preserving cls.__dict__

Instead of storing the definition order in __definition_order__, the
now-ordered definition namespace could be copied into a new OrderedDict.
This would then be used as the mapping proxied as __dict__. Doing so
would mostly provide the same semantics.

However, using OrderedDict for __dict__ would obscure the relationship
with the definition namespace, making it less useful.

Additionally, (in the case of OrderedDict specifically) doing this would
require significant changes to the semantics of the concrete dict C-API.

There has been some discussion about moving to a compact dict
implementation which would (mostly) preserve insertion order. However
the lack of an explicit __definition_order__ would still remain as a
pain point.

A "namespace" Keyword Arg for Class Definition

PEP 422 <422#order-preserving-classes> introduced a new "namespace"
keyword arg to class definitions that effectively replaces the need to
__prepare__(). However, the proposal was withdrawn in favor of the
simpler PEP 487.

A stdlib Metaclass that Implements __prepare__() with OrderedDict

This has all the same problems as writing your own metaclass. The only
advantage is that you don't have to actually write this metaclass. So it
doesn't offer any benefit in the context of this PEP.

Set __definition_order__ at Compile-time

Each class's __qualname__ is determined at compile-time. This same
concept could be applied to __definition_order__. The result of
composing __definition_order__ at compile-time would be nearly the same
as doing so at run-time.

Comparative implementation difficulty aside, the key difference would be
that at compile-time it would not be practical to preserve definition
order for attributes that are set dynamically in the class body (e.g.
locals()[name] = value). However, they should still be reflected in the
definition order. One possible resolution would be to require class
authors to manually set __definition_order__ if they define any class
attributes dynamically.

Ultimately, the use of OrderedDict at run-time or compile-time discovery
is almost entirely an implementation detail.

References

-   Original discussion
-   Follow-up 1
-   Follow-up 2
-   Alyssa (Nick) Coghlan's concerns about mutability

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End: