PEP: 479 Title: Change StopIteration handling inside generators Version:
$Revision$ Last-Modified: $Date$ Author: Chris Angelico
<rosuav@gmail.com>, Guido van Rossum <guido@python.org> Status: Final
Type: Standards Track Content-Type: text/x-rst Created: 15-Nov-2014
Python-Version: 3.5 Post-History: 15-Nov-2014, 19-Nov-2014, 05-Dec-2014

Abstract

This PEP proposes a change to generators: when StopIteration is raised
inside a generator, it is replaced with RuntimeError. (More precisely,
this happens when the exception is about to bubble out of the
generator's stack frame.) Because the change is backwards incompatible,
the feature is initially introduced using a __future__ statement.

Acceptance

This PEP was accepted by the BDFL on November 22. Because of the
exceptionally short period from first draft to acceptance, the main
objections brought up after acceptance were carefully considered and
have been reflected in the "Alternate proposals" section below. However,
none of the discussion changed the BDFL's mind and the PEP's acceptance
is now final. (Suggestions for clarifying edits are still welcome --
unlike IETF RFCs, the text of a PEP is not cast in stone after its
acceptance, although the core design/plan/specification should not
change after acceptance.)

Rationale

The interaction of generators and StopIteration is currently somewhat
surprising, and can conceal obscure bugs. An unexpected exception should
not result in subtly altered behaviour, but should cause a noisy and
easily-debugged traceback. Currently, StopIteration raised accidentally
inside a generator function will be interpreted as the end of the
iteration by the loop construct driving the generator.

The main goal of the proposal is to ease debugging in the situation
where an unguarded next() call (perhaps several stack frames deep)
raises StopIteration and causes the iteration controlled by the
generator to terminate silently. (Whereas, when some other exception is
raised, a traceback is printed pinpointing the cause of the problem.)

This is particularly pernicious in combination with the yield from
construct of PEP 380, as it breaks the abstraction that a subgenerator
may be factored out of a generator. That PEP notes this limitation, but
notes that "use cases for these [are] rare to non-existent".
Unfortunately while intentional use is rare, it is easy to stumble on
these cases by accident:

    import contextlib

    @contextlib.contextmanager
    def transaction():
        print('begin')
        try:
            yield from do_it()
        except:
            print('rollback')
            raise
        else:
            print('commit')

    def do_it():
        print('Refactored initial setup')
        yield # Body of with-statement is executed here
        print('Refactored finalization of successful transaction')

    def gene():
        for i in range(2):
            with transaction():
                yield i
                # return
                raise StopIteration  # This is wrong
            print('Should not be reached')

    for i in gene():
        print('main: i =', i)

Here factoring out do_it into a subgenerator has introduced a subtle
bug: if the wrapped block raises StopIteration, under the current
behavior this exception will be swallowed by the context manager; and,
worse, the finalization is silently skipped! Similarly problematic
behavior occurs when an asyncio coroutine raises StopIteration, causing
it to terminate silently, or when next is used to take the first result
from an iterator that unexpectedly turns out to be empty, for example:

    # using the same context manager as above
    import pathlib

    with transaction():
        print('commit file {}'.format(
            # I can never remember what the README extension is
            next(pathlib.Path('/some/dir').glob('README*'))))

In both cases, the refactoring abstraction of yield from breaks in the
presence of bugs in client code.

Additionally, the proposal reduces the difference between list
comprehensions and generator expressions, preventing surprises such as
the one that started this discussion[1]. Henceforth, the following
statements will produce the same result if either produces a result at
all:

    a = list(F(x) for x in xs if P(x))
    a = [F(x) for x in xs if P(x)]

With the current state of affairs, it is possible to write a function
F(x) or a predicate P(x) that causes the first form to produce a
(truncated) result, while the second form raises an exception (namely,
StopIteration). With the proposed change, both forms will raise an
exception at this point (albeit RuntimeError in the first case and
StopIteration in the second).

Finally, the proposal also clears up the confusion about how to
terminate a generator: the proper way is return, not
raise StopIteration.

As an added bonus, the above changes bring generator functions much more
in line with regular functions. If you wish to take a piece of code
presented as a generator and turn it into something else, you can
usually do this fairly simply, by replacing every yield with a call to
print() or list.append(); however, if there are any bare next() calls in
the code, you have to be aware of them. If the code was originally
written without relying on StopIteration terminating the function, the
transformation would be that much easier.

Background information

When a generator frame is (re)started as a result of a __next__() (or
send() or throw()) call, one of three outcomes can occur:

-   A yield point is reached, and the yielded value is returned.
-   The frame is returned from; StopIteration is raised.
-   An exception is raised, which bubbles out.

In the latter two cases the frame is abandoned (and the generator
object's gi_frame attribute is set to None).

Proposal

If a StopIteration is about to bubble out of a generator frame, it is
replaced with RuntimeError, which causes the next() call (which invoked
the generator) to fail, passing that exception out. From then on it's
just like any old exception.[2]

This affects the third outcome listed above, without altering any other
effects. Furthermore, it only affects this outcome when the exception
raised is StopIteration (or a subclass thereof).

Note that the proposed replacement happens at the point where the
exception is about to bubble out of the frame, i.e. after any except or
finally blocks that could affect it have been exited. The StopIteration
raised by returning from the frame is not affected (the point being that
StopIteration means that the generator terminated "normally", i.e. it
did not raise an exception).

A subtle issue is what will happen if the caller, having caught the
RuntimeError, calls the generator object's __next__() method again. The
answer is that from this point on it will raise StopIteration -- the
behavior is the same as when any other exception was raised by the
generator.

Another logical consequence of the proposal: if someone uses
g.throw(StopIteration) to throw a StopIteration exception into a
generator, if the generator doesn't catch it (which it could do using a
try/except around the yield), it will be transformed into RuntimeError.

During the transition phase, the new feature must be enabled per-module
using:

    from __future__ import generator_stop

Any generator function constructed under the influence of this directive
will have the REPLACE_STOPITERATION flag set on its code object, and
generators with the flag set will behave according to this proposal.
Once the feature becomes standard, the flag may be dropped; code should
not inspect generators for it.

A proof-of-concept patch has been created to facilitate testing.[3]

Consequences for existing code

This change will affect existing code that depends on StopIteration
bubbling up. The pure Python reference implementation of groupby[4]
currently has comments "Exit on StopIteration" where it is expected that
the exception will propagate and then be handled. This will be unusual,
but not unknown, and such constructs will fail. Other examples abound,
e.g.[5],[6].

(Alyssa Coghlan comments: """If you wanted to factor out a helper
function that terminated the generator you'd have to do "return yield
from helper()" rather than just "helper()".""")

There are also examples of generator expressions floating around that
rely on a StopIteration raised by the expression, the target or the
predicate (rather than by the __next__() call implied in the for loop
proper).

Writing backwards and forwards compatible code

With the exception of hacks that raise StopIteration to exit a generator
expression, it is easy to write code that works equally well under older
Python versions as under the new semantics.

This is done by enclosing those places in the generator body where a
StopIteration is expected (e.g. bare next() calls or in some cases
helper functions that are expected to raise StopIteration) in a
try/except construct that returns when StopIteration is raised. The
try/except construct should appear directly in the generator function;
doing this in a helper function that is not itself a generator does not
work. If raise StopIteration occurs directly in a generator, simply
replace it with return.

Examples of breakage

Generators which explicitly raise StopIteration can generally be changed
to simply return instead. This will be compatible with all existing
Python versions, and will not be affected by __future__. Here are some
illustrations from the standard library.

Lib/ipaddress.py:

    if other == self:
        raise StopIteration

Becomes:

    if other == self:
        return

In some cases, this can be combined with yield from to simplify the
code, such as Lib/difflib.py:

    if context is None:
        while True:
            yield next(line_pair_iterator)

Becomes:

    if context is None:
        yield from line_pair_iterator
        return

(The return is necessary for a strictly-equivalent translation, though
in this particular file, there is no further code, and the return can be
omitted.) For compatibility with pre-3.3 versions of Python, this could
be written with an explicit for loop:

    if context is None:
        for line in line_pair_iterator:
            yield line
        return

More complicated iteration patterns will need explicit try/except
constructs. For example, a hypothetical parser like this:

    def parser(f):
        while True:
            data = next(f)
            while True:
                line = next(f)
                if line == "- end -": break
                data += line
            yield data

would need to be rewritten as:

    def parser(f):
        while True:
            try:
                data = next(f)
                while True:
                    line = next(f)
                    if line == "- end -": break
                    data += line
                yield data
            except StopIteration:
                return

or possibly:

    def parser(f):
        for data in f:
            while True:
                line = next(f)
                if line == "- end -": break
                data += line
            yield data

The latter form obscures the iteration by purporting to iterate over the
file with a for loop, but then also fetches more data from the same
iterator during the loop body. It does, however, clearly differentiate
between a "normal" termination (StopIteration instead of the initial
line) and an "abnormal" termination (failing to find the end marker in
the inner loop, which will now raise RuntimeError).

This effect of StopIteration has been used to cut a generator expression
short, creating a form of takewhile:

    def stop():
        raise StopIteration
    print(list(x for x in range(10) if x < 5 or stop()))
    # prints [0, 1, 2, 3, 4]

Under the current proposal, this form of non-local flow control is not
supported, and would have to be rewritten in statement form:

    def gen():
        for x in range(10):
            if x >= 5: return
            yield x
    print(list(gen()))
    # prints [0, 1, 2, 3, 4]

While this is a small loss of functionality, it is functionality that
often comes at the cost of readability, and just as lambda has
restrictions compared to def, so does a generator expression have
restrictions compared to a generator function. In many cases, the
transformation to full generator function will be trivially easy, and
may improve structural clarity.

Explanation of generators, iterators, and StopIteration

The proposal does not change the relationship between generators and
iterators: a generator object is still an iterator, and not all
iterators are generators. Generators have additional methods that
iterators don't have, like send and throw. All this is unchanged.
Nothing changes for generator users -- only authors of generator
functions may have to learn something new. (This includes authors of
generator expressions that depend on early termination of the iteration
by a StopIteration raised in a condition.)

An iterator is an object with a __next__ method. Like many other special
methods, it may either return a value, or raise a specific exception -
in this case, StopIteration - to signal that it has no value to return.
In this, it is similar to __getattr__ (can raise AttributeError),
__getitem__ (can raise KeyError), and so on. A helper function for an
iterator can be written to follow the same protocol; for example:

    def helper(x, y):
        if x > y: return 1 / (x - y)
        raise StopIteration

    def __next__(self):
        if self.a: return helper(self.b, self.c)
        return helper(self.d, self.e)

Both forms of signalling are carried through: a returned value is
returned, an exception bubbles up. The helper is written to match the
protocol of the calling function.

A generator function is one which contains a yield expression. Each time
it is (re)started, it may either yield a value, or return (including
"falling off the end"). A helper function for a generator can also be
written, but it must also follow generator protocol:

    def helper(x, y):
        if x > y: yield 1 / (x - y)

    def gen(self):
        if self.a: return (yield from helper(self.b, self.c))
        return (yield from helper(self.d, self.e))

In both cases, any unexpected exception will bubble up. Due to the
nature of generators and iterators, an unexpected StopIteration inside a
generator will be converted into RuntimeError, but beyond that, all
exceptions will propagate normally.

Transition plan

-   Python 3.5: Enable new semantics under __future__ import; silent
    deprecation warning if StopIteration bubbles out of a generator not
    under __future__ import.
-   Python 3.6: Non-silent deprecation warning.
-   Python 3.7: Enable new semantics everywhere.

Alternate proposals

Raising something other than RuntimeError

Rather than the generic RuntimeError, it might make sense to raise a new
exception type UnexpectedStopIteration. This has the downside of
implicitly encouraging that it be caught; the correct action is to catch
the original StopIteration, not the chained exception.

Supplying a specific exception to raise on return

Alyssa (Nick) Coghlan suggested a means of providing a specific
StopIteration instance to the generator; if any other instance of
StopIteration is raised, it is an error, but if that particular one is
raised, the generator has properly completed. This subproposal has been
withdrawn in favour of better options, but is retained for reference.

Making return-triggered StopIterations obvious

For certain situations, a simpler and fully backward-compatible solution
may be sufficient: when a generator returns, instead of raising
StopIteration, it raises a specific subclass of StopIteration
(GeneratorReturn) which can then be detected. If it is not that
subclass, it is an escaping exception rather than a return statement.

The inspiration for this alternative proposal was Alyssa's observation
[7] that if an asyncio coroutine[8] accidentally raises StopIteration,
it currently terminates silently, which may present a hard-to-debug
mystery to the developer. The main proposal turns such accidents into
clearly distinguishable RuntimeError exceptions, but if that is
rejected, this alternate proposal would enable asyncio to distinguish
between a return statement and an accidentally-raised StopIteration
exception.

Of the three outcomes listed above, two change:

-   If a yield point is reached, the value, obviously, would still be
    returned.
-   If the frame is returned from, GeneratorReturn (rather than
    StopIteration) is raised.
-   If an instance of GeneratorReturn would be raised, instead an
    instance of StopIteration would be raised. Any other exception
    bubbles up normally.

In the third case, the StopIteration would have the value of the
original GeneratorReturn, and would reference the original exception in
its __cause__. If uncaught, this would clearly show the chaining of
exceptions.

This alternative does not affect the discrepancy between generator
expressions and list comprehensions, but allows generator-aware code
(such as the contextlib and asyncio modules) to reliably differentiate
between the second and third outcomes listed above.

However, once code exists that depends on this distinction between
GeneratorReturn and StopIteration, a generator that invokes another
generator and relies on the latter's StopIteration to bubble out would
still be potentially wrong, depending on the use made of the distinction
between the two exception types.

Converting the exception inside next()

Mark Shannon suggested[9] that the problem could be solved in next()
rather than at the boundary of generator functions. By having next()
catch StopIteration and raise instead ValueError, all unexpected
StopIteration bubbling would be prevented; however, the
backward-incompatibility concerns are far more serious than for the
current proposal, as every next() call now needs to be rewritten to
guard against ValueError instead of StopIteration - not to mention that
there is no way to write one block of code which reliably works on
multiple versions of Python. (Using a dedicated exception type, perhaps
subclassing ValueError, would help this; however, all code would still
need to be rewritten.)

Note that calling next(it, default) catches StopIteration and
substitutes the given default value; this feature is often useful to
avoid a try/except block.

Sub-proposal: decorator to explicitly request current behaviour

Alyssa Coghlan suggested[10] that the situations where the current
behaviour is desired could be supported by means of a decorator:

    from itertools import allow_implicit_stop

    @allow_implicit_stop
    def my_generator():
        ...
        yield next(it)
        ...

Which would be semantically equivalent to:

    def my_generator():
        try:
            ...
            yield next(it)
            ...
        except StopIteration
            return

but be faster, as it could be implemented by simply permitting the
StopIteration to bubble up directly.

Single-source Python 2/3 code would also benefit in a 3.7+ world, since
libraries like six and python-future could just define their own version
of "allow_implicit_stop" that referred to the new builtin in 3.5+, and
was implemented as an identity function in other versions.

However, due to the implementation complexities required, the ongoing
compatibility issues created, the subtlety of the decorator's effect,
and the fact that it would encourage the "quick-fix" solution of just
slapping the decorator onto all generators instead of properly fixing
the code in question, this sub-proposal has been rejected.[11]

Criticism

Unofficial and apocryphal statistics suggest that this is seldom, if
ever, a problem.[12] Code does exist which relies on the current
behaviour (e.g.[13],[14],[15]), and there is the concern that this would
be unnecessary code churn to achieve little or no gain.

Steven D'Aprano started an informal survey on comp.lang.python[16]; at
the time of writing only two responses have been received: one was in
favor of changing list comprehensions to match generator expressions
(!), the other was in favor of this PEP's main proposal.

The existing model has been compared to the perfectly-acceptable issues
inherent to every other case where an exception has special meaning. For
instance, an unexpected KeyError inside a __getitem__ method will be
interpreted as failure, rather than permitted to bubble up. However,
there is a difference. Special methods use return to indicate normality,
and raise to signal abnormality; generators yield to indicate data, and
return to signal the abnormal state. This makes explicitly raising
StopIteration entirely redundant, and potentially surprising. If other
special methods had dedicated keywords to distinguish between their
return paths, they too could turn unexpected exceptions into
RuntimeError; the fact that they cannot should not preclude generators
from doing so.

Why not fix all __next__() methods?

When implementing a regular __next__() method, the only way to indicate
the end of the iteration is to raise StopIteration. So catching
StopIteration here and converting it to RuntimeError would defeat the
purpose. This is a reminder of the special status of generator
functions: in a generator function, raising StopIteration is redundant
since the iteration can be terminated by a simple return.

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] Initial mailing list comment
(https://mail.python.org/pipermail/python-ideas/2014-November/029906.html)

[2] Proposal by GvR
(https://mail.python.org/pipermail/python-ideas/2014-November/029953.html)

[3] Tracker issue with Proof-of-Concept patch
(http://bugs.python.org/issue22906)

[4] Pure Python implementation of groupby
(https://docs.python.org/3/library/itertools.html#itertools.groupby)

[5] Split a sequence or generator using a predicate
(http://code.activestate.com/recipes/578416-split-a-sequence-or-generator-using-a-predicate/)

[6] wrap unbounded generator to restrict its output
(http://code.activestate.com/recipes/66427-wrap-unbounded-generator-to-restrict-its-output/)

[7] Post from Alyssa (Nick) Coghlan mentioning asyncio
(https://mail.python.org/pipermail/python-ideas/2014-November/029961.html)

[8] Coroutines in asyncio
(https://docs.python.org/3/library/asyncio-task.html#coroutines)

[9] Post from Mark Shannon with alternate proposal
(https://mail.python.org/pipermail/python-dev/2014-November/137129.html)

[10] Idea from Alyssa Coghlan
(https://mail.python.org/pipermail/python-dev/2014-November/137201.html)

[11] Rejection of above idea by GvR
(https://mail.python.org/pipermail/python-dev/2014-November/137243.html)

[12] Response by Steven D'Aprano
(https://mail.python.org/pipermail/python-ideas/2014-November/029994.html)

[13] Proposal by GvR
(https://mail.python.org/pipermail/python-ideas/2014-November/029953.html)

[14] Split a sequence or generator using a predicate
(http://code.activestate.com/recipes/578416-split-a-sequence-or-generator-using-a-predicate/)

[15] wrap unbounded generator to restrict its output
(http://code.activestate.com/recipes/66427-wrap-unbounded-generator-to-restrict-its-output/)

[16] Thread on comp.lang.python started by Steven D'Aprano
(https://mail.python.org/pipermail/python-list/2014-November/680757.html)