PEP: 3103 Title: A Switch/Case Statement Author: Guido van Rossum
<guido@python.org> Status: Rejected Type: Standards Track Content-Type:
text/x-rst Created: 25-Jun-2006 Python-Version: 3.0 Post-History:
26-Jun-2006

Rejection Notice

A quick poll during my keynote presentation at PyCon 2007 shows this
proposal has no popular support. I therefore reject it.

Abstract

Python-dev has recently seen a flurry of discussion on adding a switch
statement. In this PEP I'm trying to extract my own preferences from the
smorgasbord of proposals, discussing alternatives and explaining my
choices where I can. I'll also indicate how strongly I feel about
alternatives I discuss.

This PEP should be seen as an alternative to PEP 275. My views are
somewhat different from that PEP's author, but I'm grateful for the work
done in that PEP.

This PEP introduces canonical names for the many variants that have been
discussed for different aspects of the syntax and semantics, such as
"alternative 1", "school II", "option 3" and so on. Hopefully these
names will help the discussion.

Rationale

A common programming idiom is to consider an expression and do different
things depending on its value. This is usually done with a chain of
if/elif tests; I'll refer to this form as the "if/elif chain". There are
two main motivations to want to introduce new syntax for this idiom:

-   It is repetitive: the variable and the test operator, usually '=='
    or 'in', are repeated in each if/elif branch.
-   It is inefficient: when an expression matches the last test value
    (or no test value at all) it is compared to each of the preceding
    test values.

Both of these complaints are relatively mild; there isn't a lot of
readability or performance to be gained by writing this differently.
Yet, some kind of switch statement is found in many languages and it is
not unreasonable to expect that its addition to Python will allow us to
write up certain code more cleanly and efficiently than before.

There are forms of dispatch that are not suitable for the proposed
switch statement; for example, when the number of cases is not
statically known, or when it is desirable to place the code for
different cases in different classes or files.

Basic Syntax

I'm considering several variants of the syntax first proposed in PEP 275
here. There are lots of other possibilities, but I don't see that they
add anything.

I've recently been converted to alternative 1.

I should note that all alternatives here have the "implicit break"
property: at the end of the suite for a particular case, the control
flow jumps to the end of the whole switch statement. There is no way to
pass control from one case to another. This in contrast to C, where an
explicit 'break' statement is required to prevent falling through to the
next case.

In all alternatives, the else-suite is optional. It is more Pythonic to
use 'else' here rather than introducing a new reserved word, 'default',
as in C.

Semantics are discussed in the next top-level section.

Alternative 1

This is the preferred form in PEP 275:

    switch EXPR:
        case EXPR:
            SUITE
        case EXPR:
            SUITE
        ...
        else:
            SUITE

The main downside is that the suites where all the action is are
indented two levels deep; this can be remedied by indenting the cases
"half a level" (e.g. 2 spaces if the general indentation level is 4).

Alternative 2

This is Fredrik Lundh's preferred form; it differs by not indenting the
cases:

    switch EXPR:
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

Some reasons not to choose this include expected difficulties for
auto-indenting editors, folding editors, and the like; and confused
users. There are no situations currently in Python where a line ending
in a colon is followed by an unindented line.

Alternative 3

This is the same as alternative 2 but leaves out the colon after the
switch:

    switch EXPR
    case EXPR:
        SUITE
    case EXPR:
        SUITE
    ....
    else:
        SUITE

The hope of this alternative is that it will not upset the auto-indent
logic of the average Python-aware text editor less. But it looks strange
to me.

Alternative 4

This leaves out the 'case' keyword on the basis that it is redundant:

    switch EXPR:
        EXPR:
            SUITE
        EXPR:
            SUITE
        ...
        else:
            SUITE

Unfortunately now we are forced to indent the case expressions, because
otherwise (at least in the absence of an 'else' keyword) the parser
would have a hard time distinguishing between an unindented case
expression (which continues the switch statement) or an unrelated
statement that starts like an expression (such as an assignment or a
procedure call). The parser is not smart enough to backtrack once it
sees the colon. This is my least favorite alternative.

Extended Syntax

There is one additional concern that needs to be addressed
syntactically. Often two or more values need to be treated the same. In
C, this done by writing multiple case labels together without any code
between them. The "fall through" semantics then mean that these are all
handled by the same code. Since the Python switch will not have
fall-through semantics (which have yet to find a champion) we need
another solution. Here are some alternatives.

Alternative A

Use:

    case EXPR:

to match on a single expression; use:

    case EXPR, EXPR, ...:

to match on multiple expressions. The is interpreted so that if EXPR is
a parenthesized tuple or another expression whose value is a tuple, the
switch expression must equal that tuple, not one of its elements. This
means that we cannot use a variable to indicate multiple cases. While
this is also true in C's switch statement, it is a relatively common
occurrence in Python (see for example sre_compile.py).

Alternative B

Use:

    case EXPR:

to match on a single expression; use:

    case in EXPR_LIST:

to match on multiple expressions. If EXPR_LIST is a single expression,
the 'in' forces its interpretation as an iterable (or something
supporting __contains__, in a minority semantics alternative). If it is
multiple expressions, each of those is considered for a match.

Alternative C

Use:

    case EXPR:

to match on a single expression; use:

    case EXPR, EXPR, ...:

to match on multiple expressions (as in alternative A); and use:

    case *EXPR:

to match on the elements of an expression whose value is an iterable.
The latter two cases can be combined, so that the true syntax is more
like this:

    case [*]EXPR, [*]EXPR, ...:

The * notation is similar to the use of prefix * already in use for
variable-length parameter lists and for passing computed argument lists,
and often proposed for value-unpacking (e.g. a, b, *c = X as an
alternative to (a, b), c = X[:2], X[2:]).

Alternative D

This is a mixture of alternatives B and C; the syntax is like
alternative B but instead of the 'in' keyword it uses '*'. This is more
limited, but still allows the same flexibility. It uses:

    case EXPR:

to match on a single expression and:

    case *EXPR:

to match on the elements of an iterable. If one wants to specify
multiple matches in one case, one can write this:

    case *(EXPR, EXPR, ...):

or perhaps this (although it's a bit strange because the relative
priority of '*' and ',' is different than elsewhere):

    case * EXPR, EXPR, ...:

Discussion

Alternatives B, C and D are motivated by the desire to specify multiple
cases with the same treatment using a variable representing a set
(usually a tuple) rather than spelling them out. The motivation for this
is usually that if one has several switches over the same set of cases
it's a shame to have to spell out all the alternatives each time. An
additional motivation is to be able to specify ranges to be matched
easily and efficiently, similar to Pascal's "1..1000:" notation. At the
same time we want to prevent the kind of mistake that is common in
exception handling (and which will be addressed in Python 3000 by
changing the syntax of the except clause): writing "case 1, 2:" where
"case (1, 2):" was meant, or vice versa.

The case could be made that the need is insufficient for the added
complexity; C doesn't have a way to express ranges either, and it's used
a lot more than Pascal these days. Also, if a dispatch method based on
dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).

All in all my preferences are (from most to least favorite) B, A, D', C,
where D' is D without the third possibility.

Semantics

There are several issues to review before we can choose the right
semantics.

If/Elif Chain vs. Dict-based Dispatch

There are several main schools of thought about the switch statement's
semantics:

-   School I wants to define the switch statement in term of an
    equivalent if/elif chain (possibly with some optimization thrown
    in).
-   School II prefers to think of it as a dispatch on a precomputed
    dict. There are different choices for when the precomputation
    happens.
-   There's also school III, which agrees with school I that the
    definition of a switch statement should be in terms of an equivalent
    if/elif chain, but concedes to the optimization camp that all
    expressions involved must be hashable.

We need to further separate school I into school Ia and school Ib:

-   School Ia has a simple position: a switch statement is translated to
    an equivalent if/elif chain, and that's that. It should not be
    linked to optimization at all. That is also my main objection
    against this school: without any hint of optimization, the switch
    statement isn't attractive enough to warrant new syntax.
-   School Ib has a more complex position: it agrees with school II that
    optimization is important, and is willing to concede the compiler
    certain liberties to allow this. (For example, PEP 275 Solution 1.)
    In particular, hash() of the switch and case expressions may or may
    not be called (so it should be side-effect-free); and the case
    expressions may not be evaluated each time as expected by the
    if/elif chain behavior, so the case expressions should also be
    side-effect free. My objection to this (elaborated below) is that if
    either the hash() or the case expressions aren't side-effect-free,
    optimized and unoptimized code may behave differently.

School II grew out of the realization that optimization of commonly
found cases isn't so easy, and that it's better to face this head on.
This will become clear below.

The differences between school I (mostly school Ib) and school II are
threefold:

-   When optimizing using a dispatch dict, if either the switch
    expression or the case expressions are unhashable (in which case
    hash() raises an exception), school Ib requires catching the hash()
    failure and falling back to an if/elif chain. School II simply lets
    the exception happen. The problem with catching an exception in
    hash() as required by school Ib, is that this may hide a genuine
    bug. A possible way out is to only use a dispatch dict if all case
    expressions are ints, strings or other built-ins with known good
    hash behavior, and to only attempt to hash the switch expression if
    it is also one of those types. Type objects should probably also be
    supported here. This is the (only) problem that school III
    addresses.
-   When optimizing using a dispatch dict, if the hash() function of any
    expression involved returns an incorrect value, under school Ib,
    optimized code will not behave the same as unoptimized code. This is
    a well-known problem with optimization-related bugs, and waste lots
    of developer time. Under school II, in this situation incorrect
    results are produced at least consistently, which should make
    debugging a bit easier. The way out proposed for the previous bullet
    would also help here.
-   School Ib doesn't have a good optimization strategy if the case
    expressions are named constants. The compiler cannot know their
    values for sure, and it cannot know whether they are truly constant.
    As a way out, it has been proposed to re-evaluate the expression
    corresponding to the case once the dict has identified which case
    should be taken, to verify that the value of the expression didn't
    change. But strictly speaking, all the case expressions occurring
    before that case would also have to be checked, in order to preserve
    the true if/elif chain semantics, thereby completely killing the
    optimization. Another proposed solution is to have callbacks
    notifying the dispatch dict of changes in the value of variables or
    attributes involved in the case expressions. But this is not likely
    implementable in the general case, and would require many namespaces
    to bear the burden of supporting such callbacks, which currently
    don't exist at all.
-   Finally, there's a difference of opinion regarding the treatment of
    duplicate cases (i.e. two or more cases with match expressions that
    evaluates to the same value). School I wants to treat this the same
    is an if/elif chain would treat it (i.e. the first match wins and
    the code for the second match is silently unreachable); school II
    wants this to be an error at the time the dispatch dict is frozen
    (so dead code doesn't go undiagnosed).

School I sees trouble in school II's approach of pre-freezing a dispatch
dict because it places a new and unusual burden on programmers to
understand exactly what kinds of case values are allowed to be frozen
and when the case values will be frozen, or they might be surprised by
the switch statement's behavior.

School II doesn't believe that school Ia's unoptimized switch is worth
the effort, and it sees trouble in school Ib's proposal for
optimization, which can cause optimized and unoptimized code to behave
differently.

In addition, school II sees little value in allowing cases involving
unhashable values; after all if the user expects such values, they can
just as easily write an if/elif chain. School II also doesn't believe
that it's right to allow dead code due to overlapping cases to occur
unflagged, when the dict-based dispatch implementation makes it so easy
to trap this.

However, there are some use cases for overlapping/duplicate cases.
Suppose you're switching on some OS-specific constants (e.g. exported by
the os module or some module like that). You have a case for each. But
on some OS, two different constants have the same value (since on that
OS they are implemented the same way -- like O_TEXT and O_BINARY on
Unix). If duplicate cases are flagged as errors, your switch wouldn't
work at all on that OS. It would be much better if you could arrange the
cases so that one case has preference over another.

There's also the (more likely) use case where you have a set of cases to
be treated the same, but one member of the set must be treated
differently. It would be convenient to put the exception in an earlier
case and be done with it.

(Yes, it seems a shame not to be able to diagnose dead code due to
accidental case duplication. Maybe that's less important, and pychecker
can deal with it? After all we don't diagnose duplicate method
definitions either.)

This suggests school IIb: like school II but redundant cases must be
resolved by choosing the first match. This is trivial to implement when
building the dispatch dict (skip keys already present).

(An alternative would be to introduce new syntax to indicate "okay to
have overlapping cases" or "ok if this case is dead code" but I find
that overkill.)

Personally, I'm in school II: I believe that the dict-based dispatch is
the one true implementation for switch statements and that we should
face the limitations up front, so that we can reap maximal benefits. I'm
leaning towards school IIb -- duplicate cases should be resolved by the
ordering of the cases instead of flagged as errors.

When to Freeze the Dispatch Dict

For the supporters of school II (dict-based dispatch), the next big
dividing issue is when to create the dict used for switching. I call
this "freezing the dict".

The main problem that makes this interesting is the observation that
Python doesn't have named compile-time constants. What is conceptually a
constant, such as re.IGNORECASE, is a variable to the compiler, and
there's nothing to stop crooked code from modifying its value.

Option 1

The most limiting option is to freeze the dict in the compiler. This
would require that the case expressions are all literals or compile-time
expressions involving only literals and operators whose semantics are
known to the compiler, since with the current state of Python's dynamic
semantics and single-module compilation, there is no hope for the
compiler to know with sufficient certainty the values of any variables
occurring in such expressions. This is widely though not universally
considered too restrictive.

Raymond Hettinger is the main advocate of this approach. He proposes a
syntax where only a single literal of certain types is allowed as the
case expression. It has the advantage of being unambiguous and easy to
implement.

My main complaint about this is that by disallowing "named constants" we
force programmers to give up good habits. Named constants are introduced
in most languages to solve the problem of "magic numbers" occurring in
the source code. For example, sys.maxint is a lot more readable than
2147483647. Raymond proposes to use string literals instead of named
"enums", observing that the string literal's content can be the name
that the constant would otherwise have. Thus, we could write "case
'IGNORECASE':" instead of "case re.IGNORECASE:". However, if there is a
spelling error in the string literal, the case will silently be ignored,
and who knows when the bug is detected. If there is a spelling error in
a NAME, however, the error will be caught as soon as it is evaluated.
Also, sometimes the constants are externally defined (e.g. when parsing
a file format like JPEG) and we can't easily choose appropriate string
values. Using an explicit mapping dict sounds like a poor hack.

Option 2

The oldest proposal to deal with this is to freeze the dispatch dict the
first time the switch is executed. At this point we can assume that all
the named "constants" (constant in the programmer's mind, though not to
the compiler) used as case expressions are defined --otherwise an
if/elif chain would have little chance of success either. Assuming the
switch will be executed many times, doing some extra work the first time
pays back quickly by very quick dispatch times later.

An objection to this option is that there is no obvious object where the
dispatch dict can be stored. It can't be stored on the code object,
which is supposed to be immutable; it can't be stored on the function
object, since many function objects may be created for the same function
(e.g. for nested functions). In practice, I'm sure that something can be
found; it could be stored in a section of the code object that's not
considered when comparing two code objects or when pickling or
marshalling a code object; or all switches could be stored in a dict
indexed by weak references to code objects. The solution should also be
careful not to leak switch dicts between multiple interpreters.

Another objection is that the first-use rule allows obfuscated code like
this:

    def foo(x, y):
        switch x:
        case y:
            print 42

To the untrained eye (not familiar with Python) this code would be
equivalent to this:

    def foo(x, y):
        if x == y:
            print 42

but that's not what it does (unless it is always called with the same
value as the second argument). This has been addressed by suggesting
that the case expressions should not be allowed to reference local
variables, but this is somewhat arbitrary.

A final objection is that in a multi-threaded application, the first-use
rule requires intricate locking in order to guarantee the correct
semantics. (The first-use rule suggests a promise that side effects of
case expressions are incurred exactly once.) This may be as tricky as
the import lock has proved to be, since the lock has to be held while
all the case expressions are being evaluated.

Option 3

A proposal that has been winning support (including mine) is to freeze a
switch's dict when the innermost function containing it is defined. The
switch dict is stored on the function object, just as parameter defaults
are, and in fact the case expressions are evaluated at the same time and
in the same scope as the parameter defaults (i.e. in the scope
containing the function definition).

This option has the advantage of avoiding many of the finesses needed to
make option 2 work: there's no need for locking, no worry about
immutable code objects or multiple interpreters. It also provides a
clear explanation for why locals can't be referenced in case
expressions.

This option works just as well for situations where one would typically
use a switch; case expressions involving imported or global named
constants work exactly the same way as in option 2, as long as they are
imported or defined before the function definition is encountered.

A downside however is that the dispatch dict for a switch inside a
nested function must be recomputed each time the nested function is
defined. For certain "functional" styles of programming this may make
switch unattractive in nested functions. (Unless all case expressions
are compile-time constants; then the compiler is of course free to
optimize away the switch freezing code and make the dispatch table part
of the code object.)

Another downside is that under this option, there's no clear moment when
the dispatch dict is frozen for a switch that doesn't occur inside a
function. There are a few pragmatic choices for how to treat a switch
outside a function:

(a) Disallow it.
(b) Translate it into an if/elif chain.
(c) Allow only compile-time constant expressions.
(d) Compute the dispatch dict each time the switch is reached.
(e) Like (b) but tests that all expressions evaluated are hashable.

Of these, (a) seems too restrictive: it's uniformly worse than (c); and
(d) has poor performance for little or no benefits compared to (b). It
doesn't make sense to have a performance-critical inner loop at the
module level, as all local variable references are slow there; hence (b)
is my (weak) favorite. Perhaps I should favor (e), which attempts to
prevent atypical use of a switch; examples that work interactively but
not in a function are annoying. In the end I don't think this issue is
all that important (except it must be resolved somehow) and am willing
to leave it up to whoever ends up implementing it.

When a switch occurs in a class but not in a function, we can freeze the
dispatch dict at the same time the temporary function object
representing the class body is created. This means the case expressions
can reference module globals but not class variables. Alternatively, if
we choose (b) above, we could choose this implementation inside a class
definition as well.

Option 4

There are a number of proposals to add a construct to the language that
makes the concept of a value pre-computed at function definition time
generally available, without tying it either to parameter default values
or case expressions. Some keywords proposed include 'const', 'static',
'only' or 'cached'. The associated syntax and semantics vary.

These proposals are out of scope for this PEP, except to suggest that if
such a proposal is accepted, there are two ways for the switch to
benefit: we could require case expressions to be either compile-time
constants or pre-computed values; or we could make pre-computed values
the default (and only) evaluation mode for case expressions. The latter
would be my preference, since I don't see a use for more dynamic case
expressions that isn't addressed adequately by writing an explicit
if/elif chain.

Conclusion

It is too early to decide. I'd like to see at least one completed
proposal for pre-computed values before deciding. In the meantime,
Python is fine without a switch statement, and perhaps those who claim
it would be a mistake to add one are right.

Copyright

This document has been placed in the public domain.