PEP: 340
Title: Anonymous Block Statements
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2005
Post-History:

Introduction

This PEP proposes a new type of compound statement which can be used for
resource management purposes. The new statement type is provisionally
called the block-statement because the keyword to be used has not yet
been chosen.

This PEP competes with several other PEPs: PEP 288 (Generators
Attributes and Exceptions; only the second part), PEP 310 (Reliable
Acquisition/Release Pairs), and PEP 325 (Resource-Release Support for
Generators).

I should clarify that using a generator to "drive" a block statement is
really a separable proposal; with just the definition of the block
statement from the PEP you could implement all the examples using a
class (similar to example 6, which is easily turned into a template).
But the key idea is using a generator to drive a block statement; the
rest is elaboration, so I'd like to keep these two parts together.

(PEP 342, Enhanced Iterators, was originally a part of this PEP; but the
two proposals are really independent and with Steven Bethard's help I
have moved it to a separate PEP.)

Rejection Notice

I am rejecting this PEP in favor of PEP 343. See the motivational
section in that PEP for the reasoning behind this rejection. GvR.

Motivation and Summary

(Thanks to Shane Hathaway -- Hi Shane!)

Good programmers move commonly used code into reusable functions.
Sometimes, however, patterns arise in the structure of the functions
rather than the actual sequence of statements. For example, many
functions acquire a lock, execute some code specific to that function,
and unconditionally release the lock. Repeating the locking code in
every function that uses it is error prone and makes refactoring
difficult.

Block statements provide a mechanism for encapsulating patterns of
structure. Code inside the block statement runs under the control of an
object called a block iterator. Simple block iterators execute code
before and after the code inside the block statement. Block iterators
also have the opportunity to execute the controlled code more than once
(or not at all), catch exceptions, or receive data from the body of the
block statement.

A convenient way to write block iterators is to write a generator (PEP
255). A generator looks a lot like a Python function, but instead of
returning a value immediately, generators pause their execution at
"yield" statements. When a generator is used as a block iterator, the
yield statement tells the Python interpreter to suspend the block
iterator, execute the block statement body, and resume the block
iterator when the body has executed.

The Python interpreter behaves as follows when it encounters a block
statement based on a generator. First, the interpreter instantiates the
generator and begins executing it. The generator does setup work
appropriate to the pattern it encapsulates, such as acquiring a lock,
opening a file, starting a database transaction, or starting a loop.
Then the generator yields execution to the body of the block statement
using a yield statement. When the block statement body completes, raises
an uncaught exception, or sends data back to the generator using a
continue statement, the generator resumes. At this point, the generator
can either clean up and stop or yield again, causing the block statement
body to execute again. When the generator finishes, the interpreter
leaves the block statement.
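
The sequence above can be approximated in present-day Python 3 by driving a generator with an ordinary for-loop. This is only a sketch under stated assumptions: the names `template`, `body` and `events` are hypothetical, the block body is passed as a plain function, and the exception forwarding that sets a block statement apart from a for-loop is not modelled.

```python
events = []  # hypothetical log, used only to show the order of execution


def template():
    """Generator standing in for a block iterator."""
    events.append("setup")        # e.g. acquire a lock or open a file
    try:
        yield None                # suspend here; the body runs now
    finally:
        events.append("cleanup")  # runs once the generator is resumed


def body(value):
    events.append("body")


# The for-loop plays the role of the interpreter in the normal case.
for value in template():
    body(value)

print(events)  # ['setup', 'body', 'cleanup']
```

The generator yields only once, so the body runs exactly once; a generator that yields again would cause the body to run again.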

Use Cases

See the Examples section near the end.

Specification: the __exit__() Method

An optional new method for iterators is proposed, called __exit__(). It
takes up to three arguments which correspond to the three "arguments" to
the raise-statement: type, value, and traceback. If all three arguments
are None, sys.exc_info() may be consulted to provide suitable default
values.
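
As a rough illustration of that signature, here is a minimal Python 3 sketch of an iterator with an `__exit__()` method that falls back on `sys.exc_info()` when called without arguments. The class `OneShot` and its attributes are hypothetical, and the re-raise at the end is just one possible policy, not something this specification mandates.

```python
import sys


class OneShot:
    """Hypothetical iterator that yields once and records how it was exited."""

    def __init__(self):
        self.started = False
        self.exit_type = None

    def __iter__(self):
        return self

    def __next__(self):
        if self.started:
            raise StopIteration
        self.started = True
        return None

    def __exit__(self, type=None, value=None, traceback=None):
        # With no arguments, fall back on the exception being handled.
        if type is None and value is None and traceback is None:
            type, value, traceback = sys.exc_info()
        self.exit_type = type
        if type is None:
            return None               # nothing to report
        if value is None:
            value = type()            # only a class was given
        raise value.with_traceback(traceback)


itr = OneShot()
try:
    try:
        raise KeyError("missing")
    except KeyError:
        itr.__exit__()                # defaults come from sys.exc_info()
except KeyError:
    pass
print(itr.exit_type)                  # <class 'KeyError'>
```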

Specification: the Anonymous Block Statement

A new statement is proposed with the syntax:

    block EXPR1 as VAR1:
        BLOCK1

Here, 'block' and 'as' are new keywords; EXPR1 is an arbitrary
expression (but not an expression-list) and VAR1 is an arbitrary
assignment target (which may be a comma-separated list).

The "as VAR1" part is optional; if omitted, the assignments to VAR1 in
the translation below are omitted (but the expressions assigned are
still evaluated!).

The choice of the 'block' keyword is contentious; many alternatives have
been proposed, including not using a keyword at all (which I actually
like). PEP 310 uses 'with' for similar semantics, but I would like to
reserve that for a with-statement similar to the one found in Pascal and
VB. (Though I just found that the C# designers don't like 'with'[1], and
I have to agree with their reasoning.) To sidestep this issue
momentarily I'm using 'block' until we can agree on the right keyword,
if any.

Note that the 'as' keyword is not contentious (it will finally be
elevated to proper keyword status).

Note that it is up to the iterator to decide whether a block-statement
represents a loop with multiple iterations; in the most common use case
BLOCK1 is executed exactly once. To the parser, however, it is always a
loop; break and continue transfer control to the block's iterator (see
below for details).

The translation is subtly different from a for-loop: iter() is not
called, so EXPR1 should already be an iterator (not just an iterable);
and the iterator is guaranteed to be notified when the block-statement
is left, regardless of whether this is due to a break, a return or an exception:

    itr = EXPR1  # The iterator
    ret = False  # True if a return statement is active
    val = None   # Return value, if ret == True
    exc = None   # sys.exc_info() tuple if an exception is active
    while True:
        try:
            if exc:
                ext = getattr(itr, "__exit__", None)
                if ext is not None:
                    VAR1 = ext(*exc)   # May re-raise *exc
                else:
                    raise exc[0], exc[1], exc[2]
            else:
                VAR1 = itr.next()  # May raise StopIteration
        except StopIteration:
            if ret:
                return val
            break
        try:
            ret = False
            val = exc = None
            BLOCK1
        except:
            exc = sys.exc_info()

(However, the variables 'itr' etc. are not user-visible and the built-in
names used cannot be overridden by the user.)
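
To make the translation easier to experiment with, here is a hedged Python 3 rendering of it as a function. Everything about the packaging is an assumption: `run_block`, `Locking` and `failing_body` are hypothetical names, the block body is passed as a callable, and only the exception path is modelled, since break and return inside the body cannot be expressed this way. `Locking` restates example 6 from the Examples section without its unused `arg` parameter.

```python
import sys
import threading


def run_block(itr, body):
    """Hypothetical function form of `block itr as VAR1: body(VAR1)`."""
    exc = None                   # sys.exc_info() tuple if an exception is active
    while True:
        try:
            if exc:
                ext = getattr(itr, "__exit__", None)
                if ext is not None:
                    var1 = ext(*exc)                      # may re-raise *exc
                else:
                    raise exc[1].with_traceback(exc[2])   # Python 3 spelling
            else:
                var1 = next(itr)                          # may raise StopIteration
        except StopIteration:
            break
        try:
            exc = None
            body(var1)
        except BaseException:
            exc = sys.exc_info()


class Locking:
    """Example 6 restated for Python 3; `lock` needs acquire()/release()."""

    def __init__(self, lock):
        self.lock = lock
        self.state = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.state:
            self.lock.release()
            self.state += 1
            raise StopIteration
        self.lock.acquire()
        self.state += 1
        return None

    def __exit__(self, type, value=None, traceback=None):
        if self.state == 1:
            self.lock.release()
        raise value.with_traceback(traceback)


def failing_body(_):
    raise ValueError("raised inside the block")


my_lock = threading.Lock()
run_block(Locking(my_lock), lambda _: None)       # normal completion
released_normally = not my_lock.locked()

try:
    run_block(Locking(my_lock), failing_body)     # exception path
    propagated = False
except ValueError:
    propagated = True
released_after_error = not my_lock.locked()
```

In both runs the lock ends up released; in the second, the ValueError still reaches the caller after `__exit__()` has had its chance to clean up.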

Inside BLOCK1, the following special translations apply:

-   "break" is always legal; it is translated into:

        exc = (StopIteration, None, None)
        continue

-   "return EXPR3" is only legal when the block-statement is contained
    in a function definition; it is translated into:

        exc = (StopIteration, None, None)
        ret = True
        val = EXPR3
        continue

The net effect is that break and return behave much the same as if the
block-statement were a for-loop, except that the iterator gets a chance
at resource cleanup before the block-statement is left, through the
optional __exit__() method. The iterator also gets a chance if the
block-statement is left through raising an exception. If the iterator
doesn't have an __exit__() method, there is no difference with a
for-loop (except that a for-loop calls iter() on EXPR1).
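
The difference from a for-loop can be observed in present-day Python 3. In the sketch below, a break leaves the generator suspended with its cleanup still pending. The explicit `close()` call is an assumption borrowed from later Python versions, not part of this PEP; it stands in for the notification a block-statement would send through `__exit__()`. The names `resource` and `log` are hypothetical.

```python
log = []  # hypothetical event log


def resource():
    log.append("acquire")
    try:
        yield 1
        yield 2
    finally:
        log.append("release")


gen = resource()         # keep a reference so nothing finalizes it early
for item in gen:
    break                # the for-loop simply stops calling next()

after_break = list(log)
gen.close()              # explicit notification; runs the finally-clause
print(after_break, log)  # ['acquire'] ['acquire', 'release']
```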

Note that a yield-statement in a block-statement is not treated
differently. It suspends the function containing the block without
notifying the block's iterator. The block's iterator is entirely unaware
of this yield, since the local control flow doesn't actually leave the
block. In other words, it is not like a break or return statement. When
the loop that was resumed by the yield calls next(), the block is
resumed right after the yield. (See example 7 below.) The generator
finalization semantics described below guarantee (within the limitations
of all finalization semantics) that the block will be resumed
eventually.

Unlike the for-loop, the block-statement does not have an else-clause. I
think it would be confusing, and emphasize the "loopiness" of the
block-statement, while I want to emphasize its difference from a
for-loop. In addition, there are several possible semantics for an
else-clause, and only a very weak use case.

Specification: Generator Exit Handling

Generators will implement the new __exit__() method API.

Generators will be allowed to have a yield statement inside a
try-finally statement.

The expression argument to the yield-statement will become optional
(defaulting to None).

When __exit__() is called, the generator is resumed but at the point of
the yield-statement the exception represented by the __exit__
argument(s) is raised. The generator may re-raise this exception, raise
another exception, or yield another value, except that if the exception
passed in to __exit__() was StopIteration, it ought to raise
StopIteration (otherwise the effect would be that a break is turned into a
continue, which would be unexpected, to say the least). When the initial call resuming
the generator is an __exit__() call instead of a next() call, the
generator's execution is aborted and the exception is re-raised without
passing control to the generator's body.
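
The closest analogue in present-day Python 3 is the generator method `throw()`, used below as a stand-in for the proposed `__exit__()`; that substitution is an assumption, since generators never acquired an `__exit__()` method in this form. The sketch shows an exception surfacing at the suspended yield, the generator choosing to yield another value, and a never-started generator being aborted without running its body. The names `template`, `log` and `fresh_log` are hypothetical.

```python
def template(log):
    try:
        yield "first"
    except ValueError as err:
        log.append("caught %s" % err)
        yield "second"            # yield another value instead of re-raising


log = []
gen = template(log)
first = next(gen)                          # runs up to the first yield
second = gen.throw(ValueError("boom"))     # raised at the suspended yield

# Throwing into a generator that was never started: the body does not run.
fresh_log = []
fresh = template(fresh_log)
try:
    fresh.throw(ValueError("early"))
    aborted = False
except ValueError:
    aborted = True

print(first, second, log, aborted, fresh_log)
# first second ['caught boom'] True []
```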

When a generator that has not yet terminated is garbage-collected
(either through reference counting or by the cyclical garbage
collector), its __exit__() method is called once with StopIteration as
its first argument. Together with the requirement that a generator ought
to raise StopIteration when __exit__() is called with StopIteration,
this guarantees the eventual activation of any finally-clauses that were
active when the generator was last suspended. Of course, under certain
circumstances the generator may never be garbage-collected. This is no
different from the guarantees that are made about finalizers (__del__()
methods) of other objects.
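
Present-day CPython behaves in a comparable way, which makes the guarantee easy to observe. The sketch below assumes CPython's reference counting for the prompt result; the `gc.collect()` call is included for implementations that collect lazily, where the timing may differ. The names `guarded` and `finalized` are hypothetical.

```python
import gc

finalized = []


def guarded():
    try:
        yield None
    finally:
        finalized.append("finally ran")


gen = guarded()
next(gen)          # suspend inside the try-block
del gen            # drop the last reference to the suspended generator
gc.collect()       # for implementations without reference counting
print(finalized)   # ['finally ran'] on CPython
```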

Alternatives Considered and Rejected

-   Many alternatives have been proposed for 'block'. I haven't seen a
    proposal for another keyword that I like better than 'block' yet.
    Alas, 'block' is also not a good choice; it is a rather popular name
    for variables, arguments and methods. Perhaps 'with' is the best
    choice after all?

-   Instead of trying to pick the ideal keyword, the block-statement
    could simply have the form:

        EXPR1 as VAR1:
            BLOCK1

    This is at first attractive because, together with a good choice of
    function names (like those in the Examples section below) used in
    EXPR1, it reads well, and feels like a "user-defined statement". And
    yet, it makes me (and many others) uncomfortable; without a keyword
    the syntax is very "bland", difficult to look up in a manual
    (remember that 'as' is optional), and it makes the meaning of break
    and continue in the block-statement even more confusing.

-   Phillip Eby has proposed to have the block-statement use an entirely
    different API than the for-loop, to differentiate between the two. A
    generator would have to be wrapped in a decorator to make it support
    the block API. IMO this adds more complexity with very little
    benefit; and we can't really deny that the block-statement is
    conceptually a loop -- it supports break and continue, after all.

-   This keeps getting proposed: "block VAR1 = EXPR1" instead of "block
    EXPR1 as VAR1". That would be very misleading, since VAR1 does not
    get assigned the value of EXPR1; EXPR1 results in a generator which
    is assigned to an internal variable, and VAR1 is the value returned
    by successive calls to the next() method of that iterator.

-   Why not change the translation to apply iter(EXPR1)? All the
    examples would continue to work. But this makes the block-statement
    more like a for-loop, while the emphasis ought to be on the
    difference between the two. Not calling iter() catches a bunch of
    misunderstandings, like using a sequence as EXPR1.
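
The kind of misunderstanding meant in the last item can be shown in a few lines of Python 3. The built-in `next()` is used here in place of the `itr.next()` call in the translation, which is an assumption about how the check would surface today; `lines` is a hypothetical name.

```python
lines = ["a", "b"]           # a sequence: iterable, but not an iterator

try:
    next(lines)              # roughly what the block translation would attempt
    rejected = False
except TypeError:
    rejected = True          # a list is not an iterator

first = next(iter(lines))    # a for-loop calls iter() first, so this works
print(rejected, first)       # True a
```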

Comparison to Thunks

Alternative semantics proposed for the block-statement turn the block
into a thunk (an anonymous function that blends into the containing
scope).

The main advantage of thunks that I can see is that you can save the
thunk for later, like a callback for a button widget (the thunk then
becomes a closure). You can't use a yield-based block for that (except
in Ruby, which uses yield syntax with a thunk-based implementation). But
I have to say that I almost see this as an advantage: I think I'd be
slightly uncomfortable seeing a block and not knowing whether it will be
executed in the normal control flow or later. Defining an explicit
nested function for that purpose doesn't have this problem for me,
because I already know that the 'def' keyword means its body is executed
later.

The other problem with thunks is that once we think of them as the
anonymous functions they are, we're pretty much forced to say that a
return statement in a thunk returns from the thunk rather than from the
containing function. Doing it any other way would cause major weirdness
if the thunk were to survive its containing function as a closure
(perhaps continuations would help, but I'm not about to go there :-).

But then an IMO important use case for the resource cleanup template
pattern is lost. I routinely write code like this:

    def findSomething(self, key, default=None):
        self.lock.acquire()
        try:
            for item in self.elements:
                if item.matches(key):
                    return item
            return default
        finally:
            self.lock.release()

and I'd be bummed if I couldn't write this as:

    def findSomething(self, key, default=None):
        block locking(self.lock):
            for item in self.elements:
                if item.matches(key):
                    return item
            return default

This particular example can be rewritten using a break:

    def findSomething(self, key, default=None):
        block locking(self.lock):
            for item in self.elements:
                if item.matches(key):
                    break
            else:
                item = default
        return item

but it looks forced and the transformation isn't always that easy; you'd
be forced to rewrite your code in a single-return style which feels too
restrictive.
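
The limitation can be reproduced with ordinary nested functions in Python 3, which behave the way a thunk's return would have to. In this sketch `call_locked` and `lookup` are hypothetical names, and the nested function stands in for a thunk only as far as the return statement is concerned.

```python
import threading


def call_locked(lock, thunk):
    """Hypothetical thunk-style template: run thunk() with lock held."""
    lock.acquire()
    try:
        thunk()                  # the thunk's return value goes nowhere
    finally:
        lock.release()


def lookup(elements, key, default=None):
    lock = threading.Lock()

    def thunk():
        for item in elements:
            if item == key:
                return item      # returns from thunk(), not from lookup()
        return default

    call_locked(lock, thunk)
    return "not returned by the thunk"


print(lookup(["a", "b"], "b"))   # not returned by the thunk
```

Getting the found item out requires either a template that passes the thunk's result along or a shared variable, which is the single-return style described above.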

Also note the semantic conundrum of a yield in a thunk -- the only
reasonable interpretation is that this turns the thunk into a generator!

Greg Ewing believes that thunks "would be a lot simpler, doing just what
is required without any jiggery pokery with exceptions and
break/continue/return statements. It would be easy to explain what it
does and why it's useful."

But in order to obtain the required local variable sharing between the
thunk and the containing function, every local variable used or set in
the thunk would have to become a 'cell' (our mechanism for sharing
variables between nested scopes). Cells slow down access compared to
regular local variables: access involves an extra C function call
(PyCell_Get() or PyCell_Set()).

Perhaps not entirely coincidentally, the version of findSomething()
above that was rewritten to avoid a return inside the block shows that,
unlike for regular nested functions, we'll want variables assigned to by
the thunk also to be shared with the containing function, even if they
are not assigned to outside the thunk.
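
The cell mechanism can be inspected from Python itself. The sketch below uses Python 3, where the `nonlocal` statement (which did not exist when this PEP was written) lets a nested function rebind a shared name; `containing`, `thunk`, `total` and `plain` are hypothetical names. Only the variable touched by the nested function becomes a cell.

```python
def containing():
    total = 0                  # used by the nested function: becomes a cell
    plain = 1                  # used only here: stays a regular local

    def thunk():
        nonlocal total         # Python 3 spelling for rebinding a shared name
        total += 10

    thunk()
    return total + plain, thunk


result, thunk = containing()
print(result)                                # 11
print(containing.__code__.co_cellvars)       # ('total',)
print(thunk.__closure__[0].cell_contents)    # 10
```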

Greg Ewing again: "generators have turned out to be more powerful,
because you can have more than one of them on the go at once. Is there a
use for that capability here?"

I believe there are definitely uses for this; several people have
already shown how to do asynchronous light-weight threads using
generators (e.g. David Mertz quoted in PEP 288, and Fredrik Lundh[2]).

And finally, Greg says: "a thunk implementation has the potential to
easily handle multiple block arguments, if a suitable syntax could ever
be devised. It's hard to see how that could be done in a general way
with the generator implementation."

However, the use cases for multiple blocks seem elusive.

(Proposals have since been made to change the implementation of thunks
to remove most of these objections, but the resulting semantics are
fairly complex to explain and to implement, so IMO that defeats the
purpose of using thunks in the first place.)

Examples

(Several of these examples contain "yield None". If PEP 342 is accepted,
these can be changed to just "yield" of course.)

1.  A template for ensuring that a lock, acquired at the start of a
    block, is released when the block is left:

        def locking(lock):
            lock.acquire()
            try:
                yield None
            finally:
                lock.release()

    Used as follows:

        block locking(myLock):
            # Code here executes with myLock held.  The lock is
            # guaranteed to be released when the block is left (even
            # if via return or by an uncaught exception).

2.  A template for opening a file that ensures the file is closed when
    the block is left:

        def opening(filename, mode="r"):
            f = open(filename, mode)
            try:
                yield f
            finally:
                f.close()

    Used as follows:

        block opening("/etc/passwd") as f:
            for line in f:
                print line.rstrip()

3.  A template for committing or rolling back a database transaction:

        def transactional(db):
            try:
                yield None
            except:
                db.rollback()
                raise
            else:
                db.commit()

4.  A template that tries something up to n times:

        def auto_retry(n=3, exc=Exception):
            for i in range(n):
                try:
                    yield None
                    return
                except exc, err:
                    # perhaps log exception here
                    continue
            raise # re-raise the exception we caught earlier

    Used as follows:

        block auto_retry(3, IOError):
            f = urllib.urlopen("https://www.example.com/")
            print f.read()

5.  It is possible to nest blocks and combine templates:

        def locking_opening(lock, filename, mode="r"):
            block locking(lock):
                block opening(filename, mode) as f:
                    yield f

    Used as follows:

        block locking_opening(myLock, "/etc/passwd") as f:
            for line in f:
                print line.rstrip()

    (If this example confuses you, consider that it is equivalent to
    using a for-loop with a yield in its body in a regular generator
    which is invoking another iterator or generator recursively; see for
    example the source code for os.walk().)

6.  It is possible to write a regular iterator with the semantics of
    example 1:

        class locking:
            def __init__(self, lock):
                self.lock = lock
                self.state = 0
            def __next__(self, arg=None):
                # ignores arg
                if self.state:
                    assert self.state == 1
                    self.lock.release()
                    self.state += 1
                    raise StopIteration
                else:
                    self.lock.acquire()
                    self.state += 1
                    return None
            def __exit__(self, type, value=None, traceback=None):
                assert self.state in (0, 1, 2)
                if self.state == 1:
                    self.lock.release()
                raise type, value, traceback

    (This example is easily modified to implement the other examples; it
    shows how much simpler generators are for the same purpose.)

7.  Redirect stdout temporarily:

        def redirecting_stdout(new_stdout):
            save_stdout = sys.stdout
            try:
                sys.stdout = new_stdout
                yield None
            finally:
                sys.stdout = save_stdout

    Used as follows:

        block opening(filename, "w") as f:
            block redirecting_stdout(f):
                print "Hello world"

8.  A variant on opening() that also returns an error condition:

        def opening_w_error(filename, mode="r"):
            try:
                f = open(filename, mode)
            except IOError, err:
                yield None, err
            else:
                try:
                    yield f, None
                finally:
                    f.close()

    Used as follows:

        block opening_w_error("/etc/passwd", "a") as f, err:
            if err:
                print "IOError:", err
            else:
                f.write("guido::0:0::/:/bin/sh\n")
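
Because the block statement was never implemented, none of the examples above can be run as written. As a rough approximation, example 4 can be exercised in Python 3 with a small driver built on the generator method `throw()`. All of the scaffolding is an assumption: `drive`, `flaky` and `attempts` are hypothetical names, the block body is passed as a callable, and `auto_retry` is adapted to save the caught exception because a bare raise outside the except-clause no longer re-raises it.

```python
def auto_retry(n=3, exc=Exception):
    """Example 4, adapted: Python 3 needs the caught exception saved."""
    last = None
    for i in range(n):
        try:
            yield None
            return
        except exc as err:
            last = err             # perhaps log exception here
    if last is not None:
        raise last                 # re-raise the exception caught last


def drive(gen, body):
    """Hypothetical stand-in for `block gen: body()`, using gen.throw()."""
    try:
        next(gen)
        while True:
            try:
                body()
            except Exception as err:
                gen.throw(err)     # generator may retry by yielding again
            else:
                next(gen)          # let the generator finish normally
    except StopIteration:
        pass                       # generator is done; leave the block


attempts = []


def flaky():
    attempts.append(len(attempts) + 1)
    if len(attempts) < 3:
        raise IOError("transient failure")


drive(auto_retry(3, IOError), flaky)
print(attempts)                    # [1, 2, 3]
```

The body fails twice and succeeds on the third attempt; had it failed three times, the last IOError would have propagated out of `drive()`.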

Acknowledgements

In no useful order: Alex Martelli, Barry Warsaw, Bob Ippolito, Brett
Cannon, Brian Sabbey, Chris Ryland, Doug Landauer, Duncan Booth, Fredrik
Lundh, Greg Ewing, Holger Krekel, Jason Diamond, Jim Jewett, Josiah
Carlson, Ka-Ping Yee, Michael Chermside, Michael Hudson, Neil
Schemenauer, Alyssa Coghlan, Paul Moore, Phillip Eby, Raymond Hettinger,
Georg Brandl, Samuele Pedroni, Shannon Behrens, Skip Montanaro, Steven
Bethard, Terry Reedy, Tim Delaney, Aahz, and others. Thanks all for the
valuable contributions!

References

[1] https://web.archive.org/web/20060719195933/http://msdn.microsoft.com/vcsharp/programming/language/ask/withstatement/

[2] https://web.archive.org/web/20050204062901/http://effbot.org/zone/asyncore-generators.htm

[3] https://mail.python.org/pipermail/python-dev/2005-April/052821.html

Copyright

This document has been placed in the public domain.