PEP: 201 Title: Lockstep Iteration Author: Barry Warsaw
<barry@python.org> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 13-Jul-2000 Python-Version: 2.0 Post-History:
27-Jul-2000

Introduction

This PEP describes the 'lockstep iteration' proposal. This PEP tracks
the status and ownership of this feature, slated for introduction in
Python 2.0. It contains a description of the feature and outlines
changes necessary to support the feature. This PEP summarizes
discussions held in mailing list forums, and provides URLs for further
information, where appropriate. The CVS revision history of this file
contains the definitive historical record.

Motivation

Standard for-loops in Python iterate over every element in a sequence
until the sequence is exhausted[1]. However, for-loops iterate over only
a single sequence, and it is often desirable to loop over more than one
sequence in a lock-step fashion. In other words, in a way such that the
i-th iteration through the loop returns an object containing the i-th
element from each sequence.

The common idioms used to accomplish this are unintuitive. This PEP
proposes a standard way of performing such iterations by introducing a
new builtin function called zip.

While the primary motivation for zip() comes from lock-step iteration,
by implementing zip() as a built-in function, it has additional utility
in contexts other than for-loops.

Lockstep For-Loops

Lockstep for-loops are non-nested iterations over two or more sequences,
such that at each pass through the loop, one element from each sequence
is taken to compose the target. This behavior can already be
accomplished in Python through the use of the map() built-in function:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> for i in map(None, a, b): print i
    ...
    (1, 4)
    (2, 5)
    (3, 6)
    >>> map(None, a, b)
    [(1, 4), (2, 5), (3, 6)]

The for-loop simply iterates over this list as normal.

While the map() idiom is a common one in Python, it has several
disadvantages:

-   It is non-obvious to programmers without a functional programming
    background.

-   The use of the magic None first argument is non-obvious.

-   It has arbitrary, often unintended, and inflexible semantics when
    the lists are not of the same length: the shorter sequences are
    padded with None:

        >>> c = (4, 5, 6, 7)
        >>> map(None, a, c)
        [(1, 4), (2, 5), (3, 6), (None, 7)]

For these reasons, several proposals were floated in the Python 2.0 beta
time frame for syntactic support of lockstep for-loops. Here are two
suggestions:

    for x in seq1, y in seq2:
      # stuff

    for x, y in seq1, seq2:
      # stuff

Neither of these forms would work, since they both already mean
something in Python and changing the meanings would break existing code.
All other suggestions for new syntax suffered the same problem, or were
in conflict with other another proposed feature called 'list
comprehensions' (see PEP 202).

The Proposed Solution

The proposed solution is to introduce a new built-in sequence generator
function, available in the __builtin__ module. This function is to be
called zip and has the following signature:

    zip(seqa, [seqb, [...]])

zip() takes one or more sequences and weaves their elements together,
just as map(None, ...) does with sequences of equal length. The weaving
stops when the shortest sequence is exhausted.

Return Value

zip() returns a real Python list, the same way map() does.

Examples

Here are some examples, based on the reference implementation below:

    >>> a = (1, 2, 3, 4)
    >>> b = (5, 6, 7, 8)
    >>> c = (9, 10, 11)
    >>> d = (12, 13)

    >>> zip(a, b)
    [(1, 5), (2, 6), (3, 7), (4, 8)]

    >>> zip(a, d)
    [(1, 12), (2, 13)]

    >>> zip(a, b, c, d)
    [(1, 5, 9, 12), (2, 6, 10, 13)]

Note that when the sequences are of the same length, zip() is
reversible:

    >>> a = (1, 2, 3)
    >>> b = (4, 5, 6)
    >>> x = zip(a, b)
    >>> y = zip(*x) # alternatively, apply(zip, x)
    >>> z = zip(*y) # alternatively, apply(zip, y)
    >>> x
    [(1, 4), (2, 5), (3, 6)]
    >>> y
    [(1, 2, 3), (4, 5, 6)]
    >>> z
    [(1, 4), (2, 5), (3, 6)]
    >>> x == z
    1

It is not possible to reverse zip this way when the sequences are not
all the same length.

Reference Implementation

Here is a reference implementation, in Python of the zip() built-in
function. This will be replaced with a C implementation after final
approval:

    def zip(*args):
        if not args:
            raise TypeError('zip() expects one or more sequence arguments')
        ret = []
        i = 0
        try:
            while 1:
                item = []
                for s in args:
                    item.append(s[i])
                ret.append(tuple(item))
                i = i + 1
        except IndexError:
            return ret

BDFL Pronouncements

Note: the BDFL refers to Guido van Rossum, Python's Benevolent Dictator
For Life.

-   The function's name. An earlier version of this PEP included an open
    issue listing 20+ proposed alternative names to zip(). In the face
    of no overwhelmingly better choice, the BDFL strongly prefers zip()
    due to its Haskell[2] heritage. See version 1.7 of this PEP for the
    list of alternatives.
-   zip() shall be a built-in function.
-   Optional padding. An earlier version of this PEP proposed an
    optional pad keyword argument, which would be used when the argument
    sequences were not the same length. This is similar behavior to the
    map(None, ...) semantics except that the user would be able to
    specify pad object. This has been rejected by the BDFL in favor of
    always truncating to the shortest sequence, because of the KISS
    principle. If there's a true need, it is easier to add later. If it
    is not needed, it would still be impossible to delete it in the
    future.
-   Lazy evaluation. An earlier version of this PEP proposed that zip()
    return a built-in object that performed lazy evaluation using
    __getitem__() protocol. This has been strongly rejected by the BDFL
    in favor of returning a real Python list. If lazy evaluation is
    desired in the future, the BDFL suggests an xzip() function be
    added.
-   zip() with no arguments. the BDFL strongly prefers this raise a
    TypeError exception.
-   zip() with one argument. the BDFL strongly prefers that this return
    a list of 1-tuples.
-   Inner and outer container control. An earlier version of this PEP
    contains a rather lengthy discussion on a feature that some people
    wanted, namely the ability to control what the inner and outer
    container types were (they are tuples and list respectively in this
    version of the PEP). Given the simplified API and implementation,
    this elaboration is rejected. For a more detailed analysis, see
    version 1.7 of this PEP.

Subsequent Change to zip()

In Python 2.4, zip() with no arguments was modified to return an empty
list rather than raising a TypeError exception. The rationale for the
original behavior was that the absence of arguments was thought to
indicate a programming error. However, that thinking did not anticipate
the use of zip() with the * operator for unpacking variable length
argument lists. For example, the inverse of zip could be defined as:
unzip = lambda s: zip(*s). That transformation also defines a matrix
transpose or an equivalent row/column swap for tables defined as lists
of tuples. The latter transformation is commonly used when reading data
files with records as rows and fields as columns. For example, the code:

    date, rain, high, low = zip(*csv.reader(file("weather.csv")))

rearranges columnar data so that each field is collected into individual
tuples for straightforward looping and summarization:

    print "Total rainfall", sum(rain)

Using zip(*args) is more easily coded if zip(*[]) is handled as an
allowable case rather than an exception. This is especially helpful when
data is either built up from or recursed down to a null case with no
records.

Seeing this possibility, the BDFL agreed (with some misgivings) to have
the behavior changed for Py2.4.

Other Changes

-   The xzip() function discussed above was implemented in Py2.3 in the
    itertools module as itertools.izip(). This function provides lazy
    behavior, consuming single elements and producing a single tuple on
    each pass. The "just-in-time" style saves memory and runs faster
    than its list based counterpart, zip().

-   The itertools module also added itertools.repeat() and
    itertools.chain(). These tools can be used together to pad sequences
    with None (to match the behavior of map(None, seqn)):

        zip(firstseq, chain(secondseq, repeat(None)))

References

Greg Wilson's questionnaire on proposed syntax to some CS grad students
http://www.python.org/pipermail/python-dev/2000-July/013139.html

Copyright

This document has been placed in the public domain.

[1] http://docs.python.org/reference/compound_stmts.html#for

[2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip