PEP: 372 Title: Adding an ordered dictionary to collections Author:
Armin Ronacher <armin.ronacher@active-4.com>, Raymond Hettinger
<python@rcn.com> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 15-Jun-2008 Python-Version: 2.7, 3.1 Post-History:

Abstract

This PEP proposes an ordered dictionary as a new data structure for the
collections module, called "OrderedDict" in this PEP. The proposed API
incorporates the experiences gained from working with similar
implementations that exist in various real-world applications and other
programming languages.

Patch

A working Py3.1 patch including tests and documentation is at:

  OrderedDict patch

The check-in was in revisions: 70101 and 70102

Rationale

In current Python versions, the widely used built-in dict type does not
specify an order for the key/value pairs stored. This makes it hard to
use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
certain order on iteration. In those languages, and existing Python
ordered-dict implementations, the ordering of items is defined by the
time of insertion of the key. New keys are appended at the end, but keys
that are overwritten are not moved to the end.

The following example shows the behavior for simple assignments:

  >>> d = OrderedDict() >>> d['parrot'] = 'dead' >>> d['penguin'] =
  'exploded' >>> d.items() [('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an OrderedDict useful for a couple
of situations:

-   XML/HTML processing libraries currently drop the ordering of
    attributes, use a list instead of a dict which makes filtering
    cumbersome, or implement their own ordered dictionary. This affects
    ElementTree, html5lib, Genshi and many more libraries.

-   There are many ordered dict implementations in various libraries and
    applications, most of them subtly incompatible with each other.
    Furthermore, subclassing dict is a non-trivial task and many
    implementations don't override all the methods properly which can
    lead to unexpected results.

    Additionally, many ordered dicts are implemented in an inefficient
    way, making many operations more complex then they have to be.

-   PEP 3115 allows metaclasses to change the mapping object used for
    the class body. An ordered dict could be used to create ordered
    member declarations similar to C structs. This could be useful, for
    example, for future ctypes releases as well as ORMs that define
    database tables as classes, like the one the Django framework ships.
    Django currently uses an ugly hack to restore the ordering of
    members in database models.

-   The RawConfigParser class accepts a dict_type argument that allows
    an application to set the type of dictionary used internally. The
    motivation for this addition was expressly to allow users to provide
    an ordered dictionary.[1]

-   Code ported from other programming languages such as PHP often
    depends on an ordered dict. Having an implementation of an
    ordering-preserving dictionary in the standard library could ease
    the transition and improve the compatibility of different libraries.

Ordered Dict API

The ordered dict API would be mostly compatible with dict and existing
ordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionary API
as described in collections.Mapping abstract base class.

The constructor and update() both accept iterables of tuples as well as
mappings like a dict does. Unlike a regular dictionary, the insertion
order is preserved.

  >>> d = OrderedDict([('a', 'b'), ('c', 'd')]) >>> d.update({'foo':
  'bar'}) >>> d collections.OrderedDict([('a', 'b'), ('c', 'd'), ('foo',
  'bar')])

If ordered dicts are updated from regular dicts, the ordering of new
keys is of course undefined.

All iteration methods as well as keys(), values() and items() return the
values ordered by the time the key was first inserted:

  >>> d['spam'] = 'eggs' >>> d.keys() ['a', 'c', 'foo', 'spam'] >>>
  d.values() ['b', 'd', 'bar', 'eggs'] >>> d.items() [('a', 'b'), ('c',
  'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

OrderedDict.__reversed__()

    Supports reverse iteration by key.

Questions and Answers

What happens if an existing key is reassigned?

  The key is not moved but assigned a new value in place. This is
  consistent with existing implementations.

What happens if keys appear multiple times in the list passed to the
constructor?

  The same as for regular dicts -- the latter item overrides the former.
  This has the side-effect that the position of the first key is used
  because only the value is actually overwritten:

      >>> OrderedDict([('a', 1), ('b', 2), ('a', 3)])
      collections.OrderedDict([('a', 3), ('b', 2)])

  This behavior is consistent with existing implementations in Python,
  the PHP array and the hashmap in Ruby 1.9.

Is the ordered dict a dict subclass? Why?

  Yes. Like defaultdict, an ordered dictionary subclasses dict. Being a
  dict subclass make some of the methods faster (like __getitem__ and
  __len__). More importantly, being a dict subclass lets ordered
  dictionaries be usable with tools like json that insist on having dict
  inputs by testing isinstance(d, dict).

Do any limitations arise from subclassing dict?

  Yes. Since the API for dicts is different in Py2.x and Py3.x, the
  OrderedDict API must also be different. So, the Py2.7 version will
  need to override iterkeys, itervalues, and iteritems.

Does OrderedDict.popitem() return a particular key/value pair?

  Yes. It pops-off the most recently inserted new key and its
  corresponding value. This corresponds to the usual LIFO behavior
  exhibited by traditional push/pop pairs. It is semantically equivalent
  to k=list(od)[-1]; v=od[k]; del od[k]; return (k,v). The actual
  implementation is more efficient and pops directly from a sorted list
  of keys.

Does OrderedDict support indexing, slicing, and whatnot?

  As a matter of fact, OrderedDict does not implement the Sequence
  interface. Rather, it is a MutableMapping that remembers the order of
  key insertion. The only sequence-like addition is support for
  reversed.

  A further advantage of not allowing indexing is that it leaves open
  the possibility of a fast C implementation using linked lists.

Does OrderedDict support alternate sort orders such as alphabetical?

  No. Those wanting different sort orders really need to be using
  another technique. The OrderedDict is all about recording insertion
  order. If any other order is of interest, then another structure (like
  an in-memory dbm) is likely a better fit.

How well does OrderedDict work with the json module, PyYAML, and
ConfigParser?

  For json, the good news is that json's encoder respects OrderedDict's
  iteration order:

      >>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
      >>> json.dumps(OrderedDict(items))
      '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

  In Py2.6, the object_hook for json decoders passes-in an already built
  dictionary so order is lost before the object hook sees it. This
  problem is being fixed for Python 2.7/3.1 by adding a new hook that
  preserves order (see https://github.com/python/cpython/issues/49631 ).
  With the new hook, order can be preserved:

      >>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
      >>> json.loads(jtext, object_pairs_hook=OrderedDict)
      OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

  For PyYAML, a full round-trip is problem free:

      >>> ytext = yaml.dump(OrderedDict(items))
      >>> print ytext
      !!python/object/apply:collections.OrderedDict
      - - [one, 1]
        - [two, 2]
        - [three, 3]
        - [four, 4]
        - [five, 5]

      >>> yaml.load(ytext)
      OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

  For the ConfigParser module, round-tripping is also problem free.
  Custom dicts were added in Py2.6 specifically to support ordered
  dictionaries:

      >>> config = ConfigParser(dict_type=OrderedDict)
      >>> config.read('myconfig.ini')
      >>> config.remove_option('Log', 'error')
      >>> config.write(open('myconfig.ini', 'w'))

How does OrderedDict handle equality testing?

  Comparing two ordered dictionaries implies that the test will be
  order-sensitive so that list (od1.items())==list(od2.items()).

  When ordered dicts are compared with other Mappings, their order
  insensitive comparison is used. This allows ordered dictionaries to be
  substituted anywhere regular dictionaries are used.

How __repr__ format will maintain order during a repr/eval round-trip?

  OrderedDict([('a', 1), ('b', 2)])

What are the trade-offs of the possible underlying data structures?

  -   Keeping a sorted list of keys is fast for all operations except
      __delitem__() which becomes an O(n) exercise. This data structure
      leads to very simple code and little wasted space.
  -   Keeping a separate dictionary to record insertion sequence numbers
      makes the code a little bit more complex. All of the basic
      operations are O(1) but the constant factor is increased for
      __setitem__() and __delitem__() meaning that every use case will
      have to pay for this speedup (since all buildup go through
      __setitem__). Also, the first traversal incurs a one-time
      O(n log n) sorting cost. The storage costs are double that for the
      sorted-list-of-keys approach.
  -   A version written in C could use a linked list. The code would be
      more complex than the other two approaches but it would conserve
      space and would keep the same big-oh performance as regular
      dictionaries. It is the fastest and most space efficient.

Reference Implementation

An implementation with tests and documentation is at:

  OrderedDict patch

The proposed version has several merits:

-   Strict compliance with the MutableMapping API and no new methods so
    that the learning curve is near zero. It is simply a dictionary that
    remembers insertion order.
-   Generally good performance. The big-oh times are the same as regular
    dictionaries except that key deletion is O(n).

Other implementations of ordered dicts in various Python projects or
standalone libraries, that inspired the API proposed here, are:

-   odict in Python
-   odict in Babel
-   OrderedDict in Django
-   The odict module
-   ordereddict (a C implementation of the odict module)
-   StableDict
-   Armin Rigo's OrderedDict

Future Directions

With the availability of an ordered dict in the standard library, other
libraries may take advantage of that. For example, ElementTree could
return odicts in the future that retain the attribute ordering of the
source file.

References

Copyright

This document has been placed in the public domain.

[1] https://github.com/python/cpython/issues/42649