PEP: 290 Title: Code Migration and Modernization Version: $Revision$
Last-Modified: $Date$ Author: Raymond Hettinger <python@rcn.com> Status:
Active Type: Informational Content-Type: text/x-rst Created: 06-Jun-2002
Post-History:

Abstract

This PEP is a collection of procedures and ideas for updating Python
applications when newer versions of Python are installed.

The migration tips highlight possible areas of incompatibility and make
suggestions on how to find and resolve those differences. The
modernization procedures show how older code can be updated to take
advantage of new language features.

Rationale

This repository of procedures serves as a catalog or checklist of known
migration issues and procedures for addressing those issues.

Migration issues can arise for several reasons. Some obsolete features
are slowly deprecated according to the guidelines in PEP 4. Also, some
code relies on undocumented behaviors which are subject to change
between versions. Some code may rely on behavior which was subsequently
shown to be a bug and that behavior changes when the bug is fixed.

Modernization options arise when new versions of Python add features
that allow improved clarity or higher performance than previously
available.

Guidelines for New Entries

Developers with commit access may update this PEP directly. Others can
send their ideas to a developer for possible inclusion.

While a consistent format makes the repository easier to use, feel free
to add or subtract sections to improve clarity.

Grep patterns may be supplied as tool to help maintainers locate code
for possible updates. However, fully automated search/replace style
regular expressions are not recommended. Instead, each code fragment
should be evaluated individually.

The contra-indications section is the most important part of a new
entry. It lists known situations where the update SHOULD NOT be applied.

Migration Issues

Comparison Operators Not a Shortcut for Producing 0 or 1

Prior to Python 2.3, comparison operations returned 0 or 1 rather than
True or False. Some code may have used this as a shortcut for producing
zero or one in places where their boolean counterparts are not
appropriate. For example:

    def identity(m=1):
        """Create and m-by-m identity matrix"""
        return [[i==j for i in range(m)] for j in range(m)]

In Python 2.2, a call to identity(2) would produce:

    [[1, 0], [0, 1]]

In Python 2.3, the same call would produce:

    [[True, False], [False, True]]

Since booleans are a subclass of integers, the matrix would continue to
calculate normally, but it will not print as expected. The list
comprehension should be changed to read:

    return [[int(i==j) for i in range(m)] for j in range(m)]

There are similar concerns when storing data to be used by other
applications which may expect a number instead of True or False.

Modernization Procedures

Procedures are grouped by the Python version required to be able to take
advantage of the modernization.

Python 2.4 or Later

Inserting and Popping at the Beginning of Lists

Python's lists are implemented to perform best with appends and pops on
the right. Use of pop(0) or insert(0, x) triggers O(n) data movement for
the entire list. To help address this need, Python 2.4 introduces a new
container, collections.deque() which has efficient append and pop
operations on the both the left and right (the trade-off is much slower
getitem/setitem access). The new container is especially helpful for
implementing data queues:

Pattern:

    c = list(data)   -->   c = collections.deque(data)
    c.pop(0)         -->   c.popleft()
    c.insert(0, x)   -->   c.appendleft()

Locating:

    grep pop(0 or
    grep insert(0

Simplifying Custom Sorts

In Python 2.4, the sort method for lists and the new sorted built-in
function both accept a key function for computing sort keys. Unlike the
cmp function which gets applied to every comparison, the key function
gets applied only once to each record. It is much faster than cmp and
typically more readable while using less code. The key function also
maintains the stability of the sort (records with the same key are left
in their original order.

Original code using a comparison function:

    names.sort(lambda x,y: cmp(x.lower(), y.lower()))

Alternative original code with explicit decoration:

    tempnames = [(n.lower(), n) for n in names]
    tempnames.sort()
    names = [original for decorated, original in tempnames]

Revised code using a key function:

    names.sort(key=str.lower)       # case-insensitive sort

Locating: grep sort *.py

Replacing Common Uses of Lambda

In Python 2.4, the operator module gained two new functions,
itemgetter() and attrgetter() that can replace common uses of the lambda
keyword. The new functions run faster and are considered by some to
improve readability.

Pattern:

    lambda r: r[2]      -->  itemgetter(2)
    lambda r: r.myattr  -->  attrgetter('myattr')

Typical contexts:

    sort(studentrecords, key=attrgetter('gpa'))   # set a sort field
    map(attrgetter('lastname'), studentrecords)   # extract a field

Locating: grep lambda *.py

Simplified Reverse Iteration

Python 2.4 introduced the reversed builtin function for reverse
iteration. The existing approaches to reverse iteration suffered from
wordiness, performance issues (speed and memory consumption), and/or
lack of clarity. A preferred style is to express the sequence in a
forwards direction, apply reversed to the result, and then loop over the
resulting fast, memory friendly iterator.

Original code expressed with half-open intervals:

    for i in range(n-1, -1, -1):
        print seqn[i]

Alternative original code reversed in multiple steps:

    rseqn = list(seqn)
    rseqn.reverse()
    for value in rseqn:
        print value

Alternative original code expressed with extending slicing:

    for value in seqn[::-1]:
        print value

Revised code using the reversed function:

    for value in reversed(seqn):
        print value

Python 2.3 or Later

Testing String Membership

In Python 2.3, for string2 in string1, the length restriction on string2
is lifted; it can now be a string of any length. When searching for a
substring, where you don't care about the position of the substring in
the original string, using the in operator makes the meaning clear.

Pattern:

    string1.find(string2) >= 0   -->  string2 in string1
    string1.find(string2) != -1  -->  string2 in string1

Replace apply() with a Direct Function Call

In Python 2.3, apply() was marked for Pending Deprecation because it was
made obsolete by Python 1.6's introduction of * and ** in function
calls. Using a direct function call was always a little faster than
apply() because it saved the lookup for the builtin. Now, apply() is
even slower due to its use of the warnings module.

Pattern:

    apply(f, args, kwds)  -->  f(*args, **kwds)

Note: The Pending Deprecation was removed from apply() in Python 2.3.3
since it creates pain for people who need to maintain code that works
with Python versions as far back as 1.5.2, where there was no
alternative to apply(). The function remains deprecated, however.

Python 2.2 or Later

Testing Dictionary Membership

For testing dictionary membership, use the 'in' keyword instead of the
'has_key()' method. The result is shorter and more readable. The style
becomes consistent with tests for membership in lists. The result is
slightly faster because has_key requires an attribute search and uses a
relatively expensive function call.

Pattern:

    if d.has_key(k):  -->  if k in d:

Contra-indications:

1.  Some dictionary-like objects may not define a __contains__() method:

        if dictlike.has_key(k)

Locating: grep has_key

Looping Over Dictionaries

Use the new iter methods for looping over dictionaries. The iter methods
are faster because they do not have to create a new list object with a
complete copy of all of the keys, values, or items. Selecting only keys,
values, or items (key/value pairs) as needed saves the time for creating
throwaway object references and, in the case of items, saves a second
hash look-up of the key.

Pattern:

    for key in d.keys():      -->  for key in d:
    for value in d.values():  -->  for value in d.itervalues():
    for key, value in d.items():
                              -->  for key, value in d.iteritems():

Contra-indications:

1.  If you need a list, do not change the return type:

        def getids():  return d.keys()

2.  Some dictionary-like objects may not define iter methods:

        for k in dictlike.keys():

3.  Iterators do not support slicing, sorting or other operations:

        k = d.keys(); j = k[:]

4.  Dictionary iterators prohibit modifying the dictionary:

        for k in d.keys(): del[k]

stat Methods

Replace stat constants or indices with new os.stat attributes and
methods. The os.stat attributes and methods are not order-dependent and
do not require an import of the stat module.

Pattern:

    os.stat("foo")[stat.ST_MTIME]  -->  os.stat("foo").st_mtime
    os.stat("foo")[stat.ST_MTIME]  -->  os.path.getmtime("foo")

Locating: grep os.stat or grep stat.S

Reduce Dependency on types Module

The types module is likely to be deprecated in the future. Use built-in
constructor functions instead. They may be slightly faster.

Pattern:

    isinstance(v, types.IntType)      -->  isinstance(v, int)
    isinstance(s, types.StringTypes)  -->  isinstance(s, basestring)

Full use of this technique requires Python 2.3 or later (basestring was
introduced in Python 2.3), but Python 2.2 is sufficient for most uses.

Locating: grep types *.py | grep import

Avoid Variable Names that Clash with the __builtins__ Module

In Python 2.2, new built-in types were added for dict and file. Scripts
should avoid assigning variable names that mask those types. The same
advice also applies to existing builtins like list.

Pattern:

    file = open('myfile.txt') --> f = open('myfile.txt')
    dict = obj.__dict__ --> d = obj.__dict__

Locating: grep 'file ' *.py

Python 2.1 or Later

whrandom Module Deprecated

All random-related methods have been collected in one place, the random
module.

Pattern:

    import whrandom --> import random

Locating: grep whrandom

Python 2.0 or Later

String Methods

The string module is likely to be deprecated in the future. Use string
methods instead. They're faster too.

Pattern:

    import string ; string.method(s, ...)  -->  s.method(...)
    c in string.whitespace                 -->  c.isspace()

Locating: grep string *.py | grep import

startswith and endswith String Methods

Use these string methods instead of slicing. No slice has to be created
and there's no risk of miscounting.

Pattern:

    "foobar"[:3] == "foo"   -->  "foobar".startswith("foo")
    "foobar"[-3:] == "bar"  -->  "foobar".endswith("bar")

The atexit Module

The atexit module supports multiple functions to be executed upon
program termination. Also, it supports parameterized functions.
Unfortunately, its implementation conflicts with the sys.exitfunc
attribute which only supports a single exit function. Code relying on
sys.exitfunc may interfere with other modules (including library
modules) that elect to use the newer and more versatile atexit module.

Pattern:

    sys.exitfunc = myfunc  -->  atexit.register(myfunc)

Python 1.5 or Later

Class-Based Exceptions

String exceptions are deprecated, so derive from the Exception base
class. Unlike the obsolete string exceptions, class exceptions all
derive from another exception or the Exception base class. This allows
meaningful groupings of exceptions. It also allows an "except Exception"
clause to catch all exceptions.

Pattern:

    NewError = 'NewError'  -->  class NewError(Exception): pass

Locating: Use PyChecker.

All Python Versions

Testing for None

Since there is only one None object, equality can be tested with
identity. Identity tests are slightly faster than equality tests. Also,
some object types may overload comparison, so equality testing may be
much slower.

Pattern:

    if v == None  -->  if v is None:
    if v != None  -->  if v is not None:

Locating: grep '== None' or grep '!= None'

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 End: