PEP: 469
Title: Migration of dict iteration code to Python 3
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2014
Python-Version: 3.5
Post-History: 18-Apr-2014, 21-Apr-2014

Abstract

For Python 3, PEP 3106 changed the design of the dict builtin and the
mapping API in general to replace the separate list based and iterator
based APIs in Python 2 with a merged, memory efficient set and multiset
view based API. This new style of dict iteration was also added to the
Python 2.7 dict type as a new set of iteration methods.

This means that there are now 3 different kinds of dict iteration that
may need to be migrated to Python 3 when an application makes the
transition:

-   Lists as mutable snapshots: d.items() -> list(d.items())
-   Iterator objects: d.iteritems() -> iter(d.items())
-   Set based dynamic views: d.viewitems() -> d.items()

There is currently no widely agreed best practice on how to reliably
convert all Python 2 dict iteration code to the common subset of Python
2 and 3, especially when test coverage of the ported code is limited.
This PEP reviews the various ways the Python 2 iteration APIs may be
accessed, and looks at the available options for migrating that code to
Python 3 by way of the common subset of Python 2.6+ and Python 3.0+.

The PEP also considers whether there are any additions worth making to
Python 3.5 that could ease the transition for application code that
does not need to support earlier versions when it eventually makes the
leap to Python 3.

PEP Withdrawal

In writing the second draft of this PEP, I came to the conclusion that
the readability of hybrid Python 2/3 mapping code can actually be best
enhanced by better helper functions rather than by making changes to
Python 3.5+. The main value I now see in this PEP is as a clear record
of the recommended approaches to migrating mapping iteration code from
Python 2 to Python 3, as well as suggesting ways to keep things readable
and maintainable when writing hybrid code that supports both versions.

Notably, I recommend that hybrid code avoid calling mapping iteration
methods directly, and instead rely on builtin functions where possible,
and some additional helper functions for cases that would be a simple
combination of a builtin and a mapping method in pure Python 3 code, but
need to be handled slightly differently to get the exact same semantics
in Python 2.

Static code checkers like pylint could potentially be extended with an
optional warning regarding direct use of the mapping iteration methods
in a hybrid code base.
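As a rough sketch of what such a warning could check, the following uses
the standard library ast module to report calls to any of the nine
mapping iteration method names. It is not a pylint plugin, and the names
find_direct_mapping_calls and MAPPING_ITERATION_METHODS are hypothetical.
Because the check is purely syntactic, it cannot tell whether the
receiver is really a mapping, and it ignores uncalled method references
such as d.items.

```python
import ast

# Hypothetical set of method names a hybrid-code checker might flag.
MAPPING_ITERATION_METHODS = frozenset({
    "keys", "values", "items",
    "iterkeys", "itervalues", "iteritems",
    "viewkeys", "viewvalues", "viewitems",
})


def find_direct_mapping_calls(source):
    """Return sorted (line, method) pairs for calls like obj.items().

    Purely syntactic: it cannot tell whether obj is actually a mapping,
    so it may report false positives, and it only looks at calls.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in MAPPING_ITERATION_METHODS):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


example = """\
for k, v in d.items():
    print(k, v)
total = sum(itervalues(d))
"""
print(find_direct_mapping_calls(example))  # [(1, 'items')]
```

Calls to helper functions such as itervalues(d) are plain name calls, so
they pass the check, which matches the recommendation above.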

Mapping iteration models

Python 2.7 provides three different sets of methods to extract the keys,
values and items from a dict instance, accounting for 9 out of the 18
public methods of the dict type.

In Python 3, this has been rationalised to just 3 out of 11 public
methods (as the has_key method has also been removed).
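A quick introspection sketch confirms the Python 3 side of this
comparison by listing the public (non-underscore) attributes of dict.
Future Python releases could add methods, so treat the exact count as
version dependent.

```python
# Public (non-underscore) attributes of the Python 3 dict type.
public = sorted(name for name in dir(dict) if not name.startswith("_"))
print(len(public), public)

# The Python 2-only iteration methods and has_key are absent.
print([name for name in public if name.startswith(("iter", "view"))])  # []
print("has_key" in public)  # False
```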

Lists as mutable snapshots

This is the oldest of the three styles of dict iteration, and hence the
one implemented by the d.keys(), d.values() and d.items() methods in
Python 2.

These methods all return lists that are snapshots of the state of the
mapping at the time the method was called. This has a few consequences:

-   the original object can be mutated freely without affecting
    iteration over the snapshot
-   the snapshot can be modified independently of the original object
-   the snapshot consumes memory proportional to the size of the
    original mapping

The semantic equivalents of these operations in Python 3 are
list(d.keys()), list(d.values()) and list(d.items()).
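A minimal Python 3 sketch of the snapshot semantics described above,
using a made-up example dict: the list returned by list(d.items()) is
unaffected by deletions from d, and changes to a list snapshot do not
flow back into the mapping.

```python
d = {"a": 1, "b": 2, "c": 3}

# list(d.items()) takes a snapshot, so the original dict can be
# mutated freely while iterating over the copy.
for key, value in list(d.items()):
    if value % 2:
        del d[key]

print(d)  # {'b': 2}

# The snapshot is an independent list: changing it leaves d untouched.
snapshot = list(d.keys())
snapshot.append("z")
print(snapshot, d)  # ['b', 'z'] {'b': 2}
```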

Iterator objects

In Python 2.2, dict objects gained support for the then-new iterator
protocol, allowing direct iteration over the keys stored in the
dictionary, thus avoiding the need to build a list just to iterate over
the dictionary contents one entry at a time. iter(d) provides direct
access to the iterator object for the keys.

Python 2 also provides a d.iterkeys() method that is essentially
synonymous with iter(d), along with d.itervalues() and d.iteritems()
methods.

These iterators provide live views of the underlying object, and hence
may fail if the set of keys in the underlying object is changed during
iteration:

    >>> d = dict(a=1)
    >>> for k in d:
    ...     del d[k]
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: dictionary changed size during iteration

As iterators, iteration over these objects is also a one-time operation:
once the iterator is exhausted, you have to go back to the original
mapping in order to iterate again.

In Python 3, direct iteration over mappings works the same way as it
does in Python 2. There are no method based equivalents - the semantic
equivalents of d.itervalues() and d.iteritems() in Python 3 are
iter(d.values()) and iter(d.items()).
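The following sketch shows the one-shot behaviour of the Python 3
spelling of d.iteritems(). The printed order assumes Python 3.7 or
later, where dicts preserve insertion order; on 3.5 and 3.6 the order is
not guaranteed.

```python
d = {"a": 1, "b": 2}

# Python 3 spelling of the Python 2 call d.iteritems().
it = iter(d.items())
print(next(it))  # ('a', 1)

first_pass = list(it)   # consumes the remaining entries
second_pass = list(it)  # the iterator is now exhausted
print(first_pass, second_pass)  # [('b', 2)] []
```

To iterate again, a fresh iterator has to be requested from d.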

The six and future.utils compatibility modules also both provide
iterkeys(), itervalues() and iteritems() helper functions that provide
efficient iterator semantics in both Python 2 and 3.

Set based dynamic views

The model that is provided in Python 3 as a method based API is that of
set based dynamic views (technically multisets in the case of the
values() view).

In Python 3, the objects returned by d.keys(), d.values() and d.items()
provide a live view of the current state of the underlying object,
rather than taking a full snapshot of the current state as they did in
Python 2. This change is safe in many circumstances, but does mean that,
as with the direct iteration API, it is necessary to avoid adding or
removing keys during iteration, in order to avoid encountering the
following error:

    >>> d = dict(a=1)
    >>> for k, v in d.items():
    ...     del d[k]
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: dictionary changed size during iteration

Unlike the iteration API, these objects are iterables, rather than
iterators: you can iterate over them multiple times, and each time they
will iterate over the entire underlying mapping.
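A short sketch of the view behaviour in Python 3: the same keys view can
be iterated repeatedly, it reflects later changes to the dict, and keys
views support set operations. Insertion-ordered output again assumes
Python 3.7+.

```python
d = {"a": 1, "b": 2}
keys = d.keys()

# A view is an iterable, not an iterator: each pass starts afresh.
print(list(keys), list(keys))  # ['a', 'b'] ['a', 'b']

# The view is live: later changes to d show up in the existing object.
d["c"] = 3
print(len(keys), "c" in keys)  # 3 True

# Keys views also support set operations (sorted for stable output).
print(sorted(keys & {"b", "c", "z"}))  # ['b', 'c']
```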

These semantics are also available in Python 2.7 as the d.viewkeys(),
d.viewvalues() and d.viewitems() methods.

The future.utils compatibility module also provides viewkeys(),
viewvalues() and viewitems() helper functions when running on Python 2.7
or Python 3.x.

Migrating directly to Python 3

The 2to3 migration tool handles direct migrations to Python 3 in
accordance with the semantic equivalents described above:

-   d.keys() -> list(d.keys())
-   d.values() -> list(d.values())
-   d.items() -> list(d.items())
-   d.iterkeys() -> iter(d.keys())
-   d.itervalues() -> iter(d.values())
-   d.iteritems() -> iter(d.items())
-   d.viewkeys() -> d.keys()
-   d.viewvalues() -> d.values()
-   d.viewitems() -> d.items()

Rather than 9 distinct mapping methods for iteration, there are now only
the 3 view methods, which combine in straightforward ways with the two
relevant builtin functions to cover all of the behaviours that are
available as dict methods in Python 2.7.

Note that in many cases d.keys() can be replaced by just d, but the 2to3
migration tool doesn't attempt that replacement.
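As a sanity check of the table and the note above, this Python 3 sketch
confirms the kind of object each 2to3 target produces for the items case
(the keys and values cases behave analogously), and that iterating over
d yields the same keys as d.keys().

```python
import collections.abc

d = {"a": 1, "b": 2}

# The three kinds of result produced by the 2to3 conversions above.
snapshot = list(d.items())  # from d.items()
iterator = iter(d.items())  # from d.iteritems()
view = d.items()            # from d.viewitems()

print(type(snapshot).__name__)                         # list
print(isinstance(iterator, collections.abc.Iterator))  # True
print(isinstance(view, collections.abc.ItemsView))     # True

# Iterating over d directly yields the same keys as d.keys().
print(list(d) == list(d.keys()))  # True
```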

The 2to3 migration tool also does not provide any automatic assistance
for migrating references to these objects as bound or unbound methods -
it only automates conversions where the API is called immediately.
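To illustrate the gap, consider Python 2 code that stores a bound method
for later use. The names get_items and get_items_snapshot below are
hypothetical, and wrapping the call is just one possible manual fix, not
something 2to3 produces.

```python
d = {"a": 1}

# Python 2 code such as "get_items = d.items" followed by get_items()
# relied on getting a list. Because the method is not called on the
# spot, the reference is left alone, and on Python 3 the same code
# silently produces a live view instead.
get_items = d.items
print(type(get_items()).__name__)  # dict_items


# One possible manual fix: wrap the call so it keeps returning a
# snapshot list, as the Python 2 original did.
def get_items_snapshot():
    return list(d.items())


print(get_items_snapshot())  # [('a', 1)]
```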

Migrating to the common subset of Python 2 and 3

When migrating to the common subset of Python 2 and 3, the above
transformations are not generally appropriate, as they all either result
in the creation of a redundant list in Python 2, have unexpectedly
different semantics in at least some cases, or both.

Since most code running in the common subset of Python 2 and 3 supports
at least as far back as Python 2.6, the currently recommended approach
to conversion of mapping iteration operations depends on two helper
functions for efficient iteration over mapping values and mapping item
tuples:

-   d.keys() -> list(d)
-   d.values() -> list(itervalues(d))
-   d.items() -> list(iteritems(d))
-   d.iterkeys() -> iter(d)
-   d.itervalues() -> itervalues(d)
-   d.iteritems() -> iteritems(d)

Both six and future.utils provide appropriate definitions of
itervalues() and iteritems() (along with essentially redundant
definitions of iterkeys()). Creating your own definitions of these
functions in a custom compatibility module is also relatively
straightforward:

    try:
        dict.iteritems
    except AttributeError:
        # Python 3
        def itervalues(d):
            return iter(d.values())
        def iteritems(d):
            return iter(d.items())
    else:
        # Python 2
        def itervalues(d):
            return d.itervalues()
        def iteritems(d):
            return d.iteritems()
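A variant of the same helpers can instead be selected by interpreter
version, which is the style six uses with its PY3 flag; the sketch below
takes that approach and then shows typical hybrid call sites. The prices
dict is a made-up example, and the generator passed to dict() keeps the
snippet valid on Python 2.6, which lacks dict comprehensions.

```python
import sys

# Select the helpers by major version rather than probing dict.iteritems.
if sys.version_info[0] >= 3:
    def itervalues(d):
        return iter(d.values())

    def iteritems(d):
        return iter(d.items())
else:
    def itervalues(d):
        return d.itervalues()

    def iteritems(d):
        return d.iteritems()

# Hybrid code then calls the helpers instead of the dict methods.
prices = {"apple": 3, "pear": 5}
total = sum(itervalues(prices))
cheap = dict((k, v) for k, v in iteritems(prices) if v < 4)
print(total, cheap)  # 8 {'apple': 3}
```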

The greatest loss of readability currently arises when converting code
that actually needs the list based snapshots that were the default in
Python 2. This readability loss could likely be mitigated by also
providing listvalues and listitems helper functions, allowing the
affected conversions to be simplified to:

-   d.values() -> listvalues(d)
-   d.items() -> listitems(d)

The corresponding compatibility function definitions are as
straightforward as their iterator counterparts:

    try:
        dict.iteritems
    except AttributeError:
        # Python 3
        def listvalues(d):
            return list(d.values())
        def listitems(d):
            return list(d.items())
    else:
        # Python 2
        def listvalues(d):
            return d.values()
        def listitems(d):
            return d.items()

With that expanded set of compatibility functions, Python 2 code would
then be converted to "idiomatic" hybrid 2/3 code as:

-   d.keys() -> list(d)
-   d.values() -> listvalues(d)
-   d.items() -> listitems(d)
-   d.iterkeys() -> iter(d)
-   d.itervalues() -> itervalues(d)
-   d.iteritems() -> iteritems(d)
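To make the conversion concrete, here is a sketch of porting a small
Python 2 function that deletes entries while iterating. drop_none is a
hypothetical example, and listitems is repeated from the definitions
above so the snippet runs on its own. Without the snapshot, the Python 3
loop would raise the RuntimeError shown earlier.

```python
try:
    dict.iteritems
except AttributeError:
    # Python 3: build the snapshot explicitly.
    def listitems(d):
        return list(d.items())
else:
    # Python 2: d.items() already returns a list.
    def listitems(d):
        return d.items()


# Python 2 original:
#     def drop_none(d):
#         for k, v in d.items():
#             if v is None:
#                 del d[k]
#
# Hybrid 2/3 version: listitems() supplies the mutable snapshot the loop
# relies on, so deleting keys while iterating is safe on both versions.
def drop_none(d):
    for k, v in listitems(d):
        if v is None:
            del d[k]
    return d


print(drop_none({"a": 1, "b": None, "c": 3}))  # {'a': 1, 'c': 3}
```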

This compares well for readability with the idiomatic pure Python 3 code
that uses the mapping methods and builtins directly:

-   d.keys() -> list(d)
-   d.values() -> list(d.values())
-   d.items() -> list(d.items())
-   d.iterkeys() -> iter(d)
-   d.itervalues() -> iter(d.values())
-   d.iteritems() -> iter(d.items())

It's also notable that when using this approach, hybrid code would never
invoke the mapping methods directly: it would always invoke either a
builtin or helper function instead, in order to ensure the exact same
semantics on both Python 2 and 3.

Migrating from Python 3 to the common subset with Python 2.7

While the majority of migrations are currently from Python 2 either
directly to Python 3 or to the common subset of Python 2 and Python 3,
there are also some migrations of newer projects that start in Python 3
and then later add Python 2 support, either due to user demand, or to
gain access to Python 2 libraries that are not yet available in Python 3
(and porting them to Python 3 or creating a Python 3 compatible
replacement is not a trivial exercise).

In these cases, Python 2.7 compatibility is often sufficient, and the
2.7+ only view based helper functions provided by future.utils allow the
bare accesses to the Python 3 mapping view methods to be replaced with
code that is compatible with both Python 2.7 and Python 3 (note, this is
the only migration chart in the PEP that has Python 3 code on the left
of the conversion):

-   d.keys() -> viewkeys(d)
-   d.values() -> viewvalues(d)
-   d.items() -> viewitems(d)
-   list(d.keys()) -> list(d)
-   list(d.values()) -> listvalues(d)
-   list(d.items()) -> listitems(d)
-   iter(d.keys()) -> iter(d)
-   iter(d.values()) -> itervalues(d)
-   iter(d.items()) -> iteritems(d)
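future.utils already provides viewkeys(), viewvalues() and viewitems().
For projects that would rather keep a small local compatibility module,
definitions in the same style as the earlier iterator helpers might look
like the sketch below; this is not a copy of the future.utils
implementation, and like those helpers it only targets Python 2.7 and 3.

```python
try:
    dict.viewitems
except AttributeError:
    # Python 3: the plain methods already return views.
    def viewkeys(d):
        return d.keys()

    def viewvalues(d):
        return d.values()

    def viewitems(d):
        return d.items()
else:
    # Python 2.7: use the view* methods.
    def viewkeys(d):
        return d.viewkeys()

    def viewvalues(d):
        return d.viewvalues()

    def viewitems(d):
        return d.viewitems()


d = {"a": 1, "b": 2}
common = viewkeys(d) & {"b", "z"}  # set operations work on both versions
print(common)  # {'b'}
```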

As with migrations from Python 2 to the common subset, note that the
hybrid code ends up never invoking the mapping methods directly - it
only calls builtins and helper functions, with the latter addressing the
semantic differences between Python 2 and Python 3.

Possible changes to Python 3.5+

The main proposal put forward to potentially aid migration of existing
Python 2 code to Python 3 is the restoration of some or all of the
alternate iteration APIs to the Python 3 mapping API. In particular, the
initial draft of this PEP proposed making the following conversions
possible when migrating to the common subset of Python 2 and Python
3.5+:

-   d.keys() -> list(d)
-   d.values() -> list(d.itervalues())
-   d.items() -> list(d.iteritems())
-   d.iterkeys() -> d.iterkeys()
-   d.itervalues() -> d.itervalues()
-   d.iteritems() -> d.iteritems()

Possible mitigations of the additional language complexity in Python 3
created by restoring these methods included immediately deprecating
them, as well as potentially hiding them from the dir() function (or
perhaps even defining a way to make pydoc aware of function
deprecations).

However, in the case where the list output is actually desired, the end
result of that proposal is less readable than an appropriately
defined helper function, and the function and method forms of the
iterator versions are pretty much equivalent from a readability
perspective.

So unless I've missed something critical, readily available listvalues()
and listitems() helper functions look like they will improve the
readability of hybrid code more than anything we could add back to the
Python 3.5+ mapping API, and won't have any long-term impact on the
complexity of Python 3 itself.

Discussion

The fact that 5 years into the Python 3 migration we still have users
considering the dict API changes a significant barrier to migration
suggests that there are problems with previously recommended approaches.
This PEP attempts to explore those issues and tries to isolate those
cases where previous advice (such as it was) could prove problematic.

My assessment (largely based on feedback from Twisted devs) is that
problems are most likely to arise when attempting to use d.keys(),
d.values(), and d.items() in hybrid code. While superficially it seems
as though there should be cases where it is safe to ignore the semantic
differences, in practice, the change from "mutable snapshot" to "dynamic
view" is significant enough that it is likely better to just force the
use of either list or iterator semantics for hybrid code, and leave the
use of the view semantics to pure Python 3 code.

This approach also creates rules that are simple enough and safe enough
that it should be possible to automate them in code modernisation
scripts that target the common subset of Python 2 and Python 3, just as
2to3 converts them automatically when targeting pure Python 3 code.

Acknowledgements

Thanks to the folks at the Twisted sprint table at PyCon for a very
vigorous discussion of this idea (and several other topics), and
especially to Hynek Schlawack for acting as a moderator when things got
a little too heated :)

Thanks also to JP Calderone and Itamar Turner-Trauring for their email
feedback, as well as to the participants in the python-dev review of the
initial version of the PEP.

Copyright

This document has been placed in the public domain.