PEP: 635
Title: Structural Pattern Matching: Motivation and Rationale
Version: $Revision$
Last-Modified: $Date$
Author: Tobias Kohn <kohnt@tobiaskohn.ch>,
        Guido van Rossum <guido@python.org>
BDFL-Delegate:
Discussions-To: python-dev@python.org
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 12-Sep-2020
Python-Version: 3.10
Post-History: 22-Oct-2020, 08-Feb-2021
Resolution: https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3

Abstract

This PEP provides the motivation and rationale for PEP 634 ("Structural
Pattern Matching: Specification"). First-time readers are encouraged to
start with PEP 636, which provides a gentler introduction to the
concepts, syntax and semantics of patterns.

Motivation

(Structural) pattern matching syntax is found in many languages, from
Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for JavaScript
is also under consideration.)

Python already supports a limited form of this through sequence
unpacking assignments, which the new proposal leverages.

Several other common Python idioms are also relevant:

-   The if ... elif ... elif ... else idiom is often used to find out
    the type or shape of an object in an ad-hoc fashion, using one or
    more checks like isinstance(x, cls), hasattr(x, "attr"), len(x) == n
    or "key" in x as guards to select an applicable block. The block can
    then assume x supports the interface checked by the guard. For
    example:

        if isinstance(x, tuple) and len(x) == 2:
            host, port = x
            mode = "http"
        elif isinstance(x, tuple) and len(x) == 3:
            host, port, mode = x
        # Etc.

    Code like this is more elegantly rendered using match:

        match x:
            case host, port:
                mode = "http"
            case host, port, mode:
                pass
            # Etc.

-   AST traversal code often looks for nodes matching a given pattern,
    for example the code to detect a node of the shape "A + B * C" might
    look like this:

        if (isinstance(node, BinOp) and node.op == "+"
                and isinstance(node.right, BinOp) and node.right.op == "*"):
            a, b, c = node.left, node.right.left, node.right.right
            # Handle a + b*c

    Using match this becomes more readable:

        match node:
            case BinOp("+", a, BinOp("*", b, c)):
                # Handle a + b*c
                ...

We believe that adding pattern matching to Python will enable Python
users to write cleaner, more readable code for examples like those
above, and many others.

For a more academic discussion of this proposal, see [1].

Pattern Matching and OO

Pattern matching is complementary to the object-oriented paradigm. Using
OO and inheritance we can easily define a method on a base class that
defines default behavior for a specific operation on that class, and we
can override this default behavior in subclasses. We can also use the
Visitor pattern to separate actions from data.

But this is not sufficient for all situations. For example, a code
generator may consume an AST, and have many operations where the
generated code needs to vary based not just on the class of a node, but
also on the value of some class attributes, like the BinOp example
above. The Visitor pattern is insufficiently flexible for this: it can
only select based on the class.

See a complete example.

Like the Visitor pattern, pattern matching allows for a strict
separation of concerns: specific actions or data processing is
independent of the class hierarchy or manipulated objects. When dealing
with predefined or even built-in classes, in particular, it is often
impossible to add further methods to the individual classes. Pattern
matching not only relieves the programmer or class designer from the
burden of the boilerplate code needed for the Visitor pattern, but is
also flexible enough to directly work with built-in types. It naturally
distinguishes between sequences of different lengths, which might all
share the same class despite obviously differing structures. Moreover,
pattern matching automatically takes inheritance into account: a class D
inheriting from C will be handled by a pattern that targets C by
default.

Object-oriented programming is geared towards single-dispatch: it is a
single instance (or the type thereof) that determines which method is to
be called. This leads to a somewhat artificial situation in case of
binary operators where both objects might play an equal role in deciding
which implementation to use (Python addresses this through the use of
reversed binary methods). Pattern matching is structurally better suited
to handle such situations of multi-dispatch, where the action to be
taken depends on the types of several objects to equal parts.

Patterns and Functional Style

Many Python applications and libraries are not written in a consistent
OO style -- unlike Java, Python encourages defining functions at the
top-level of a module, and for simple data structures, tuples (or named
tuples or lists) and dictionaries are often used exclusively or mixed
with classes or data classes.

Pattern matching is particularly suitable for picking apart such data
structures. As an extreme example, it's easy to write code that picks
apart a JSON data structure using match:

    match json_pet:
        case {"type": "cat", "name": name, "pattern": pattern}:
            return Cat(name, pattern)
        case {"type": "dog", "name": name, "breed": breed}:
            return Dog(name, breed)
        case _:
            raise ValueError("Not a suitable pet")

Functional programming generally prefers a declarative style with a
focus on relationships in data. Side effects are avoided whenever
possible. Pattern matching thus fits naturally with, and strongly
supports, a functional programming style.

Rationale

This section provides the rationale for individual design decisions. It
takes the place of "Rejected ideas" in the standard PEP format. It is
organized in sections corresponding to the specification (PEP 634).

Overview and Terminology

Much of the power of pattern matching comes from the nesting of
subpatterns. That the success of a pattern match depends directly on the
success of its subpatterns is thus a cornerstone of the design. However,
although a pattern like P(Q(), R()) succeeds only if both subpatterns
Q() and R() succeed (i.e. the success of pattern P depends on Q and R),
the pattern P is checked first. If P fails, neither Q() nor R() will be
tried (this is a direct consequence of the fact that if P fails, there
are no subjects to match against Q() and R() in the first place).

Also note that patterns bind names to values rather than performing an
assignment. This reflects the fact that patterns aim to not have side
effects, which also means that Capture or AS patterns cannot assign a
value to an attribute or subscript. We thus consistently use the term
'bind' instead of 'assign' to emphasise this subtle difference between
traditional assignments and name binding in patterns.

The Match Statement

The match statement evaluates an expression to produce a subject, finds
the first pattern that matches the subject, and executes the associated
block of code. Syntactically, the match statement thus takes an
expression and a sequence of case clauses, where each case clause
comprises a pattern and a block of code.

Since case clauses comprise a block of code, they adhere to the existing
indentation scheme with the syntactic structure of
<keyword> ...: <(indented) block>, which resembles a compound statement.
The keyword case reflects its widespread use in pattern matching
languages, ignoring those languages that use other syntactic means such
as a symbol like |, because it would not fit established Python
structures. The syntax of patterns following the keyword is discussed
below.

Given that the case clauses follow the structure of a compound
statement, the match statement naturally becomes a compound statement
as well, following the same syntactic structure. This leads to
match <expr>: <case_clause>+. Note that the match
statement determines a quasi-scope in which the evaluated subject is
kept alive (although not in a local variable), similar to how a with
statement might keep a resource alive during execution of its block.
Furthermore, control flows from the match statement to a case clause and
then leaves the block of the match statement. The block of the match
statement thus has both syntactic and semantic meaning.

Various suggestions have sought to eliminate or avoid the naturally
arising "double indentation" of a case clause's code block.
Unfortunately, all such proposals of flat indentation schemes come at
the expense of violating Python's established structural paradigm,
leading to additional syntactic rules:

-   Unindented case clauses. The idea is to align case clauses with the
    match, i.e.:

        match expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

    This may look awkward to the eye of a Python programmer, because
    everywhere else a colon is followed by an indent. The match would
    neither follow the syntactic scheme of simple nor composite
    statements but rather establish a category of its own.

-   Putting the expression on a separate line after "match". The idea is
    to use the expression yielding the subject as a statement to avoid
    the singularity of match having no actual block despite the colons:

        match:
            expression
        case pattern_1:
            ...
        case pattern_2:
            ...

    This was ultimately rejected because the first block would be
    another novelty in Python's grammar: a block whose only content is a
    single expression rather than a sequence of statements. Attempts to
    amend this issue by adding or repurposing yet another keyword along
    the lines of match: return expression did not yield any satisfactory
    solution.

Although flat indentation would save some horizontal space, the cost of
increased complexity or unusual rules is too high. It would also
complicate life for simple-minded code editors. Finally, the horizontal
space issue can be alleviated by allowing "half-indent" (i.e. two spaces
instead of four) for match statements (though we do not recommend this).

In sample programs using match, written as part of the development of
this PEP, a noticeable improvement in code brevity is observed, more
than making up for the additional indentation level.

Statement vs. Expression. Some suggestions centered around the idea of
making match an expression rather than a statement. However, this would
fit poorly with Python's statement-oriented nature and lead to unusually
long and complex expressions and the need to invent new syntactic
constructs or break well established syntactic rules. An obvious
consequence of match as an expression would be that case clauses could
no longer have arbitrary blocks of code attached, but only a single
expression. Overall, the strong limitations could in no way offset the
slight simplification in some special use cases.

Hard vs. Soft Keyword. There were options to make match a hard keyword,
or choose a different keyword. Although using a hard keyword would
simplify life for simple-minded syntax highlighters, we decided not to
use a hard keyword for several reasons:

-   Most importantly, the new parser doesn't require us to do this.
    Unlike async, which caused hardships while being a soft keyword
    for a few releases, here we can make match a permanent soft keyword.
-   match is so commonly used in existing code that it would break
    almost every existing program and would put a burden to fix code on
    many people who may not even benefit from the new syntax.
-   It is hard to find an alternative keyword that would not be commonly
    used in existing programs as an identifier, and would still clearly
    reflect the meaning of the statement.

Use "as" or "|" instead of "case" for case clauses. The pattern matching
proposed here is a combination of multi-branch control flow (in line
with switch in Algol-derived languages or cond in Lisp) and
object-deconstruction as found in functional languages. While the
proposed keyword case highlights the multi-branch aspect, alternative
keywords such as as would equally be possible, highlighting the
deconstruction aspect. as or with, for instance, also have the advantage
of already being keywords in Python. However, since case as a keyword
can only occur as a leading keyword inside a match statement, it is easy
for a parser to distinguish between its use as a keyword or as a
variable.

Other variants would use a symbol like | or =>, or go entirely without
a special marker.

Since Python is a statement-oriented language in the tradition of Algol,
and as each composite statement starts with an identifying keyword, case
seemed to be most in line with Python's style and traditions.

Match Semantics

The patterns of different case clauses might overlap in that more than
one case clause would match a given subject. The first-to-match rule
ensures that the selection of a case clause for a given subject is
unambiguous. Furthermore, case clauses can have increasingly general
patterns matching wider sets of subjects. The first-to-match rule then
ensures that the most precise pattern can be chosen (although it is the
programmer's responsibility to order the case clauses correctly).

In a statically typed language, the match statement would be compiled to
a decision tree to select a matching pattern quickly and very
efficiently. This would, however, require that all patterns be purely
declarative and static, running against the established dynamic
semantics of Python. The proposed semantics thus represent a path
incorporating the best of both worlds: patterns are tried in a strictly
sequential order so that each case clause constitutes an actual
statement. At the same time, we allow the interpreter to cache any
information about the subject or change the order in which subpatterns
are tried. In other words: if the interpreter has found that the subject
is not an instance of a class C, it can directly skip case clauses
testing for this again, without having to perform repeated
instance-checks. If a guard stipulates that a variable x must be
positive, say (i.e. if x > 0), the interpreter might check this directly
after binding x and before any further subpatterns are considered.

Binding and scoping. In many pattern matching implementations, each case
clause would establish a separate scope of its own. Variables bound by a
pattern would then only be visible inside the corresponding case block.
In Python, however, this does not make sense. Establishing separate
scopes would essentially mean that each case clause is a separate
function without direct access to the variables in the surrounding scope
(without having to resort to nonlocal that is). Moreover, a case clause
could no longer influence any surrounding control flow through standard
statements such as return or break. Hence, such strict scoping would lead
to unintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive the
respective case or match statements. Even patterns that only match a
subject partially might bind local variables (this is, in fact,
necessary for guards to function properly). However, these semantics for
variable binding are in line with existing Python structures such as for
loops and with statements.

Guards

Some constraints cannot be adequately expressed through patterns alone.
For instance, a 'less' or 'greater than' relationship defies the usual
'equal' semantics of patterns. Moreover, different subpatterns are
independent and cannot refer to each other. The addition of guards
addresses these restrictions: a guard is an arbitrary expression
attached to a pattern, and it must evaluate to a "truthy" value for the
pattern to succeed.

For example, case [x, y] if x < y: uses a guard (if x < y) to express a
'less than' relationship between two otherwise disjoint capture patterns
x and y.

From a conceptual point of view, patterns describe structural
constraints on the subject in a declarative style, ideally without any
side-effects. Recall, in particular, that patterns are clearly distinct
from expressions, following different objectives and semantics. Guards
then enhance case blocks in a highly controlled way with arbitrary
expressions (that might have side effects). Splitting the overall
functionality into a static structural and a dynamically evaluated part
not only helps with readability, but can also introduce dramatic
potential for compiler optimizations. To keep this clear separation,
guards are only supported on the level of case clauses and not for
individual patterns.

Example using guards:

    def sort(seq):
        match seq:
            case [] | [_]:
                return seq
            case [x, y] if x <= y:
                return seq
            case [x, y]:
                return [y, x]
            case [x, y, z] if x <= y <= z:
                return seq
            case [x, y, z] if x >= y >= z:
                return [z, y, x]
            case [p, *rest]:
                a = sort([x for x in rest if x <= p])
                b = sort([x for x in rest if p < x])
                return a + [p] + b

Patterns

Patterns fulfill two purposes: they impose (structural) constraints on
the subject and they specify which data values should be extracted from
the subject and bound to variables. In iterable unpacking, which can be
seen as a prototype to pattern matching in Python, there is only one
structural pattern to express sequences while there is a rich set of
binding patterns to assign a value to a specific variable or field. Full
pattern matching differs from this in that there is more variety in
structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in
two ways: they impose additional constraints on the structure of the
subject, and a subject may safely fail to match a specific pattern at
any point (in iterable unpacking, this constitutes an error). The latter
means that patterns should avoid side effects wherever possible.

This desire to avoid side effects is one reason why capture patterns
don't allow binding values to attributes or subscripts: if the
containing pattern were to fail in a later step, it would be hard to
revert such bindings.

A cornerstone of pattern matching is the possibility of arbitrarily
nesting patterns. The nesting allows expressing deep tree structures
(for an example of nested class patterns, see the motivation section
above) as well as alternatives.

Although patterns might superficially look like expressions, it is
important to keep in mind that there is a clear distinction. In fact, no
pattern is or contains an expression. It is more productive to think of
patterns as declarative elements similar to the formal parameters in a
function definition.

AS Patterns

Patterns fall into two categories: most patterns impose a (structural)
constraint that the subject needs to fulfill, whereas the capture
pattern binds the subject to a name without regard for the subject's
structure or actual value. Consequently, a pattern can either express a
constraint or bind a value, but not both. AS patterns fill this gap in
that they allow the user to specify a general pattern as well as capture
the subject in a variable.

Typical use cases for the AS pattern include OR and Class patterns
together with a binding name as in, e.g.,
case BinOp('+'|'-' as op, ...): or
case [int() as first, int() as second]:. The latter could be understood
as saying that the subject must fulfill two distinct patterns:
[first, second] as well as [int(), int()]. The AS pattern can thus be
seen as a special case of an 'and' pattern (see OR patterns below for an
additional discussion of 'and' patterns).

In an earlier version, the AS pattern was devised as a 'Walrus pattern',
written as case [first:=int(), second:=int()]. However, using as offers
some advantages over :=:

-   The walrus operator := is used to capture the result of an
    expression on the right hand side, whereas as generally indicates
    some form of 'processing' as in import foo as bar or
    except E as err:. Indeed, the pattern P as x does not assign the
    pattern P to x, but rather the subject that successfully matches P.
-   as allows for a more consistent data flow from left to right (the
    attributes in Class patterns also follow a left-to-right data flow).
-   The walrus operator looks very similar to the syntax for matching
    attributes in the Class pattern, potentially leading to some
    confusion.

Example using the AS pattern:

    def simplify_expr(tokens):
        match tokens:
            case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
                return simplify_expr(expr)
            case [0, ('+'|'-') as op, right]:
                return UnaryOp(op, right)
            case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
                return Num(left + right)
            case [(int() | float()) as value]:
                return Num(value)

OR Patterns

The OR pattern allows you to combine 'structurally equivalent'
alternatives into a new pattern, i.e. several patterns can share a
common handler. If any of an OR pattern's subpatterns matches the
subject, the entire OR pattern succeeds.

Statically typed languages prohibit the binding of names (capture
patterns) inside an OR pattern because of potential conflicts concerning
the types of variables. As a dynamically typed language, Python can be
less restrictive here and allow capture patterns inside OR patterns.
However, each subpattern must bind the same set of variables so as not
to leave potentially undefined names. With two alternatives P | Q, this
means that if P binds the variables u and v, Q must bind exactly the
same variables u and v.

There was some discussion on whether to use the bar symbol | or the or
keyword to separate alternatives. The OR pattern does not fully fit the
existing semantics and usage of either of these two symbols. However, |
is the symbol of choice in all programming languages with support of the
OR pattern and is used in that capacity for regular expressions in
Python as well. It is also the traditional separator between
alternatives in formal grammars (including Python's). Moreover, | is not
only used for bitwise OR, but also for set unions and dict merging (PEP
584).

Other alternatives were considered as well, but none of these would
allow OR-patterns to be nested inside other patterns:

-   Using a comma:

        case 401, 403, 404:
            print("Some HTTP error")

    This looks too much like a tuple -- we would have to find a
    different way to spell tuples, and the construct would have to be
    parenthesized inside the argument list of a class pattern. In
    general, commas already have many different meanings in Python; we
    shouldn't add more.

-   Using stacked cases:

        case 401:
        case 403:
        case 404:
            print("Some HTTP error")

    This is how this would be done in C, using its fall-through
    semantics for cases. However, we don't want to mislead people into
    thinking that match/case uses fall-through semantics (which are a
    common source of bugs in C). Also, this would be a novel indentation
    pattern, which might make it harder to support in IDEs and such (it
    would break the simple rule "add an indentation level after a line
    ending in a colon"). Finally, this would not support OR patterns
    nested inside other patterns, either.

-   Using "case in" followed by a comma-separated list:

        case in 401, 403, 404:
            print("Some HTTP error")

    This would not work for OR patterns nested inside other patterns,
    like:

        case Point(0|1, 0|1):
            print("A corner of the unit square")

AND and NOT Patterns

Since this proposal defines an OR-pattern (|) to match one of several
alternatives, why not also an AND-pattern (&) or even a NOT-pattern (!),
especially given that some other languages (F# for example) support
AND-patterns?

However, it is not clear how useful this would be. The semantics for
matching dictionaries, objects and sequences already incorporates an
implicit 'and': all attributes and elements mentioned must be present
for the match to succeed. Guard conditions can also support many of the
use cases that a hypothetical 'and' operator would be used for.

A negation of a match pattern using the operator ! as a prefix would
match exactly if the pattern itself does not match. For instance,
!(3 | 4) would match anything except 3 or 4. However, there is evidence
from other languages that this is rarely useful, and primarily used as
double negation !! to control variable scopes and prevent variable
bindings (which does not apply to Python). Other use cases are better
expressed using guards.

In the end, it was decided that this would make the syntax more complex
without adding a significant benefit. It can always be added later.

Example using the OR pattern:

    def simplify(expr):
        match expr:
            case ('/', 0, 0):
                return expr
            case ('*'|'/', 0, _):
                return 0
            case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1):
                return x
        return expr

Literal Patterns

Literal patterns are a convenient way of imposing constraints on the
value of a subject, rather than its type or structure. They also allow
you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of
standard equality (x == y in Python syntax). Consequently, the literal
patterns 1.0 and 1 match exactly the same set of objects, i.e. case 1.0:
and case 1: are fully interchangeable. In principle, True would also
match the same set of objects because True == 1 holds. However, we
believe that many users would be surprised to find that case True:
matched the subject 1.0, resulting in some subtle bugs and convoluted
workarounds. We therefore adopted the rule that the three singleton
patterns None, False and True match by identity (x is y in Python
syntax) rather than equality. Hence, case True: will match only True and
nothing else. Note that case 1: would still match True, though, because
the literal pattern 1 works by equality and not identity.

Early ideas to induce a hierarchy on numbers so that case 1.0 would
match both the integer 1 and the floating point number 1.0, whereas
case 1: would only match the integer 1 were eventually dropped in favor
of the simpler and more consistent rule based on equality. Moreover, any
additional checks whether the subject is an instance of numbers.Integral
would come at a high runtime cost to introduce what would essentially be
a novel idea in Python. When needed, the explicit syntax case int(1):
can be used.

Recall that literal patterns are not expressions, but directly denote a
specific value. From a pragmatic point of view, we want to allow using
negative and even complex values as literal patterns, but they are not
atomic literals (only unsigned real and imaginary numbers are). E.g.,
-3+4j is syntactically an expression of the form
BinOp(UnaryOp('-', 3), '+', 4j). Since expressions are not part of
patterns, we had to add explicit syntactic support for such values
without having to resort to full expressions.

Interpolated f-strings, on the other hand, are not literal values,
despite their appearance and can therefore not be used as literal
patterns (string concatenation, however, is supported).

Literal patterns not only occur as patterns in their own right, but also
as keys in mapping patterns.

Range matching patterns. This would allow patterns such as 1...6.
However, there are a host of ambiguities:

-   Is the range open, half-open, or closed? (I.e. is 6 included in the
    above example or not?)
-   Does the range match a single number, or a range object?
-   Range matching is often used for character ranges ('a'...'z') but
    that won't work in Python since there's no character data type, just
    strings.
-   Range matching can be a significant performance optimization if you
    can pre-build a jump table, but that's not generally possible in
    Python due to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (InRange(0, 6)) would be more
flexible and less ambiguous; however those ideas have been postponed for
the time being.

Example using Literal patterns:

    def simplify(expr):
        match expr:
            case ('+', 0, x):
                return x
            case ('+' | '-', x, 0):
                return x
            case ('and', True, x):
                return x
            case ('and', False, x):
                return False
            case ('or', False, x):
                return x
            case ('or', True, x):
                return True
            case ('not', ('not', x)):
                return x
        return expr

Capture Patterns

Capture patterns take on the form of a name that accepts any value and
binds it to a (local) variable (unless the name is declared as nonlocal
or global). In that sense, a capture pattern is similar to a parameter
in a function definition (when the function is called, each parameter
binds the respective argument to a local variable in the function's
scope).

A name used for a capture pattern must not coincide with another capture
pattern in the same pattern. This, again, is similar to parameters,
which equally require each parameter name to be unique within the list
of parameters. It differs, however, from iterable unpacking assignment,
where the repeated use of a variable name as target is permissible
(e.g., x, x = 1, 2). The rationale for not supporting (x, x) in patterns
is its ambiguous reading: it could be seen as in iterable unpacking
where only the second binding to x survives. But it could be equally
seen as expressing a tuple with two equal elements (which comes with its
own issues). Should the need arise, then it is still possible to
introduce support for repeated use of names later on.

There were calls to explicitly mark capture patterns and thus identify
them as binding targets. According to that idea, a capture pattern would
be written as, e.g. ?x, $x or =x. The aim of such explicit capture
markers is to let an unmarked name be a value pattern (see below).
However, this is based on the misconception that pattern matching was an
extension of switch statements, placing the emphasis on fast switching
based on (ordinal) values. Such a switch statement has indeed been
proposed for Python before (see PEP 275 and PEP 3103). Pattern matching,
on the other hand, builds a generalized concept of iterable unpacking.
Binding values extracted from a data structure is at the very core of
the concept and hence the most common use case. Explicit markers for
capture patterns would thus betray the objective of the proposed pattern
matching syntax and simplify a secondary use case at the expense of
additional syntactic clutter for core cases.

It has been proposed that capture patterns are not needed at all, since
the equivalent effect can be obtained by combining an AS pattern with a
wildcard pattern (e.g., case _ as x is equivalent to case x). However,
this would be unpleasantly verbose, especially given that we expect
capture patterns to be very common.

Example using Capture patterns:

    def average(*args):
        match args:
            case [x, y]:           # captures the two elements of a sequence
                return (x + y) / 2
            case [x]:              # captures the only element of a sequence
                return x
            case []:
                return 0
            case a:                # captures the entire sequence
                return sum(a) / len(a)

Wildcard Pattern

The wildcard pattern is a special case of a 'capture' pattern: it
accepts any value, but does not bind it to a variable. The idea behind
this rule is to support repeated use of the wildcard in patterns. While
(x, x) is an error, (_, _) is legal.

Particularly in larger (sequence) patterns, it is important to allow the
pattern to concentrate on values with actual significance while ignoring
anything else. Without a wildcard, it would become necessary to 'invent'
a number of local variables, which would be bound but never used. Even
when sticking to naming conventions and using e.g. _1, _2, _3 to name
irrelevant values, say, this still introduces visual clutter and can
hurt performance (compare the sequence pattern (x, y, *z) to (_, y, *_),
where the *z forces the interpreter to copy a potentially very long
sequence, whereas the second version simply compiles to code along the
lines of y = seq[1]).

There has been much discussion about the choice of the underscore _ as
a wildcard pattern, i.e. making this one name non-binding. However,
the underscore is already heavily used as an 'ignore value' marker in
iterable unpacking. Since the wildcard pattern _ never binds, this use
of the underscore does not interfere with other uses such as inside the
REPL or the gettext module.

It has been proposed to use ... (i.e., the ellipsis token) or * (star)
as a wildcard. However, both these look as if an arbitrary number of
items is omitted:

    case [a, ..., z]: ...
    case [a, *, z]: ...

Either example looks like it would match a sequence of two or more
items, capturing the first and last values. While that may be the
ultimate "wildcard", it does not convey the desired semantics.

An alternative that does not suggest an arbitrary number of items would
be ?. This is even being proposed independently from pattern matching in
PEP 640. We feel however that using ? as a special "assignment" target
is likely more confusing to Python users than using _. It violates
Python's (admittedly vague) principle of using punctuation characters
only in ways similar to how they are used in common English usage or in
high school math, unless the usage is very well established in other
programming languages (like, e.g., using a dot for member access).

The question mark fails on both counts: its use in other programming
languages is a grab-bag of usages only vaguely suggested by the idea of
a "question". For example, it means "any character" in shell globbing,
"maybe" in regular expressions, "conditional expression" in C and many
C-derived languages, "predicate function" in Scheme, "modify error
handling" in Rust, "optional argument" and "optional chaining" in
TypeScript (the latter meaning has also been proposed for Python by PEP
505). An as yet unnamed PEP proposes it to mark optional types, e.g.
int?.

Another common use of ? in programming systems is "help", for example,
in IPython and Jupyter Notebooks and many interactive command-line
utilities.

In addition, this would put Python in a rather unique position: the
underscore is used as a wildcard pattern in every programming language with
pattern matching that we could find (including C#, Elixir, Erlang, F#,
Grace, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, Swift, and
Thorn). Keeping in mind that many users of Python also work with other
programming languages, have prior experience when learning Python, and
may move on to other languages after having learned Python, we find that
such well-established standards are important and relevant with respect
to readability and learnability. In our view, concerns that this
wildcard means that a regular name receives special treatment are not
strong enough to introduce syntax that would make Python special.

Else blocks. A case block without a guard whose pattern is a single
wildcard (i.e., case _:) accepts any subject without binding it to a
variable or performing any other operation. It is thus semantically
equivalent to else:, if it were supported. However, adding such an else
block to the match statement syntax would not remove the need for the
wildcard pattern in other contexts. Another argument against this is
that there would be two plausible indentation levels for an else block:
aligned with case or aligned with match. The authors have found it quite
contentious which indentation level to prefer.

Example using the Wildcard pattern:

    def is_closed(sequence):
        match sequence:
            case [_]:               # any sequence with a single element
                return True
            case [start, *_, end]:  # a sequence with at least two elements
                return start == end
            case _:                 # anything
                return False

Value Patterns

It is good programming style to use named constants for parametric
values or to clarify the meaning of particular values. Clearly, it would
be preferable to write case (HttpStatus.OK, body): over
case (200, body):, for example. The main issue that arises here is how
to distinguish capture patterns (variable bindings) from value patterns.
The general discussion surrounding this issue has brought forward a
plethora of options, which we cannot all fully list here.

Strictly speaking, value patterns are not really necessary, but could be
implemented using guards, i.e.
case (status, body) if status == HttpStatus.OK:. Nonetheless, the
convenience of value patterns is unquestioned and obvious.

The observation that constants tend to be written in uppercase letters
or collected in enumeration-like namespaces suggests possible rules to
discern constants syntactically. However, the idea of using upper- vs.
lowercase as a marker has been met with scepticism since there is no
similar precedent in core Python (although it is common in other
languages). We therefore only adopted the rule that any dotted name
(i.e., attribute access) is to be interpreted as a value pattern, for
example HttpStatus.OK above. This precludes, in particular, local
variables and global variables defined in the current module from acting
as constants.

A proposed rule to use a leading dot (e.g. .CONSTANT) for that purpose
was criticised because it was felt that the dot would not be a
visible-enough marker for that purpose. Partly inspired by forms found
in other programming languages, a number of different markers/sigils
were proposed (such as ^CONSTANT, $CONSTANT, ==CONSTANT, CONSTANT?, or
the word enclosed in backticks), although there was no obvious or
natural choice. The current proposal therefore leaves the discussion and
possible introduction of such a 'constant' marker for a future PEP.

Distinguishing the semantics of names based on whether they are global
variables (i.e. the compiler would treat global variables as constants
rather than capture patterns) leads to various issues. The addition or
alteration of a global variable in the module could have unintended side
effects on patterns. Moreover, pattern matching could not be used
directly inside a module's scope because all variables would be global,
making capture patterns impossible.

Example using the Value pattern:

    def handle_reply(reply):
        match reply:
            case (HttpStatus.OK, MimeType.TEXT, body):
                process_text(body)
            case (HttpStatus.OK, MimeType.APPL_ZIP, body):
                text = deflate(body)
                process_text(text)
            case (HttpStatus.MOVED_PERMANENTLY, new_URI):
                resend_request(new_URI)
            case (HttpStatus.NOT_FOUND):
                raise ResourceNotFound()

Group Patterns

Allowing users to explicitly specify the grouping is particularly
helpful in case of OR patterns.

Sequence Patterns

Sequence patterns follow as closely as possible the already established
syntax and semantics of iterable unpacking. Of course, subpatterns take
the place of assignment targets (variables, attributes and subscripts).
Moreover, the sequence pattern only matches a carefully selected set of
possible subjects, whereas iterable unpacking can be applied to any
iterable.

-   As in iterable unpacking, we do not distinguish between 'tuple' and
    'list' notation. [a, b, c], (a, b, c) and a, b, c are all
    equivalent. While this means we have a redundant notation and
    checking specifically for lists or tuples requires more effort (e.g.
    case list([a, b, c])), we mimic iterable unpacking as much as
    possible.
-   A starred pattern will capture a sub-sequence of arbitrary length,
    again mirroring iterable unpacking. Only one starred item may be
    present in any sequence pattern. In theory, patterns such as
    (*_, 3, *_) could be understood as expressing any sequence
    containing the value 3. In practice, however, this would only work
    for a very narrow set of use cases and lead to inefficient
    backtracking or even ambiguities otherwise.
-   The sequence pattern does not iterate through an iterable subject.
    All elements are accessed through subscripting and slicing, and the
    subject must be an instance of collections.abc.Sequence. This
    includes, of course, lists and tuples, but excludes e.g. sets and
    dictionaries. While it would include strings and bytes, we make an
    exception for these (see below).

A sequence pattern cannot just iterate through any iterable object. The
consumption of elements from the iteration would have to be undone if
the overall pattern fails, which is not feasible.

To identify sequences we cannot rely on len(), subscripting and
slicing alone, because sequences share these protocols with mappings
(e.g. dict). It would be surprising if a sequence pattern
also matched dictionaries or other objects implementing the mapping
protocol (i.e. __getitem__). The interpreter therefore performs an
instance check to ensure that the subject in question really is a
sequence (of known type). (As an optimization of the most common case,
if the subject is exactly a list or a tuple, the instance check can be
skipped.)

String and bytes objects have a dual nature: they are both 'atomic'
objects in their own right, as well as sequences (with a strongly
recursive nature in that a string is a sequence of strings). The typical
behavior and use cases for strings and bytes are different enough from
those of tuples and lists to warrant a clear distinction. It is in fact
often unintuitive and unintended that strings pass for sequences, as
evidenced by regular questions and complaints. Strings and bytes are
therefore not matched by a sequence pattern, limiting the sequence
pattern to a very specific understanding of 'sequence'. The built-in
bytearray type, being a mutable version of bytes, also deserves an
exception; but we don't intend to enumerate all other types that may be
used to represent bytes (e.g. some, but not all, instances of memoryview
and array.array).

Mapping Patterns

Dictionaries or mappings in general are one of the most important and
most widely used data structures in Python. In contrast to sequences,
mappings are built for fast direct access to arbitrary elements
identified by a key. In most cases an element is retrieved from a
dictionary by a known key without regard for any ordering or other
key-value pairs stored in the same dictionary. Particularly common are
string keys.

The mapping pattern reflects the common usage of dictionary lookup: it
allows the user to extract some values from a mapping by means of
constant/known keys and have the values match given subpatterns. Extra
keys in the subject are ignored even if **rest is not present. This is
different from sequence patterns, where extra items will cause a match
to fail. But mappings are actually different from sequences: they have
natural structural sub-typing behavior, i.e., passing a dictionary with
extra keys somewhere will likely just work. Should it be necessary to
impose an upper bound on the mapping and ensure that no additional keys
are present, then the usual double-star-pattern **rest can be used. The
special case **_ with a wildcard, however, is not supported as it would
not have any effect, but might lead to an incorrect understanding of the
mapping pattern's semantics.

To avoid overly expensive matching algorithms, keys must be literals or
value patterns.

There is a subtle reason for using get(key, default) instead of
__getitem__(key) followed by a check for KeyError: if the subject
happens to be a defaultdict, calling __getitem__ for a non-existent key
would add the key. Using get() avoids this unexpected side effect.

Example using the Mapping pattern:

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)

Class Patterns

Class patterns fulfill two purposes: checking whether a given subject is
indeed an instance of a specific class, and extracting data from
specific attributes of the subject. Anecdotal evidence revealed that
isinstance() is one of the most often used functions in Python in terms
of static occurrences in programs. Such instance checks typically
precede a subsequent access to information stored in the object, or a
possible manipulation thereof. A typical pattern might be along the
lines of:

    def traverse_tree(node):
        if isinstance(node, Node):
            traverse_tree(node.left)
            traverse_tree(node.right)
        elif isinstance(node, Leaf):
            print(node.value)

In many cases class patterns occur nested, as in the example given in
the motivation:

    if (isinstance(node, BinOp) and node.op == "+"
            and isinstance(node.right, BinOp) and node.right.op == "*"):
        a, b, c = node.left, node.right.left, node.right.right
        # Handle a + b*c

The class pattern lets you concisely specify both an instance check and
relevant attributes (with possible further constraints). It is thereby
very tempting to write, e.g., case Node(left, right): in the first case
above and case Leaf(value): in the second. While this indeed works well
for languages with strict algebraic data types, it is problematic with
the structure of Python objects.

When dealing with general Python objects, we face a potentially very
large number of unordered attributes: an instance of Node contains a
large number of attributes (most of which are 'special methods' such as
__repr__). Moreover, the interpreter cannot reliably deduce the ordering
of attributes. For an object that represents a circle, say, there is no
inherently obvious ordering of the attributes x, y and radius.

We envision two possibilities for dealing with this issue: either
explicitly name the attributes of interest, or provide an additional
mapping that tells the interpreter which attributes to extract and in
which order. Both approaches are supported. Moreover, explicitly naming
the attributes of interest lets you further specify the required
structure of an object; if an object lacks an attribute specified by the
pattern, the match fails.

-   Attributes that are explicitly named pick up the syntax of named
    arguments. If an object of class Node has two attributes left and
    right as above, the pattern Node(left=x, right=y) will extract the
    values of both attributes and assign them to x and y, respectively.
    The data flow from left to right seems unusual, but is in line with
    mapping patterns and has precedents such as assignments via as in
    with- or import-statements (and indeed AS patterns).

    Naming the attributes in question explicitly will be mostly used for
    more complex cases where the positional form (below) is
    insufficient.

-   The class field __match_args__ specifies a number of attributes
    together with their ordering, allowing class patterns to rely on
    positional sub-patterns without having to explicitly name the
    attributes in question. This is particularly handy for smaller
    objects or instances of data classes, where the attributes of
    interest are rather obvious and often have a well-defined ordering.
    In a way, __match_args__ is similar to the declaration of formal
    parameters, which allows calling functions with positional arguments
    rather than naming all the parameters.

    This is a class attribute, because it needs to be looked up on the
    class named in the class pattern, not on the subject instance.

The syntax of class patterns is based on the idea that de-construction
mirrors the syntax of construction. This is already the case in
virtually any Python construct, be it assignment targets, function
definitions or iterable unpacking. In all these cases, we find that the
syntax for sending and that for receiving 'data' are virtually
identical.

-   Assignment targets such as variables, attributes and subscripts:
    foo.bar[2] = foo.bar[3];
-   Function definitions: a function defined with def foo(x, y, z=6) is
    called as, e.g., foo(123, y=45), where the actual arguments provided
    at the call site are matched against the formal parameters at the
    definition site;
-   Iterable unpacking: a, b = b, a or [a, b] = [b, a] or
    (a, b) = (b, a), just to name a few equivalent possibilities.

Using the same syntax for reading and writing, l- and r-values, or
construction and de-construction is widely accepted for its benefits in
thinking about data, its flow and manipulation. This equally extends to
the explicit construction of instances, where class patterns C(p, q)
deliberately mirror the syntax of creating instances.

The special case for the built-in classes bool, bytearray etc. (where
e.g. str(x) captures the subject value in x) can be emulated by a
user-defined class as follows:

    class MyClass:
        __match_args__ = ("__myself__",)
        __myself__ = property(lambda self: self)

Type annotations for pattern variables. The proposal was to combine
patterns with type annotations:

    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}:")
        case [a: int, b: int, c: int]: print("Three ints", a, b, c)
        ...

This idea has a lot of problems. For one, the colon can only be used
inside of brackets or parentheses, otherwise the syntax becomes
ambiguous. And because Python disallows isinstance() checks on generic
types, type annotations containing generics will not work as expected.

History and Context

Pattern matching emerged in the late 1970s in the form of tuple
unpacking and as a means to handle recursive data structures such as
linked lists or trees (object-oriented languages usually use the visitor
pattern for handling recursive data structures). The early proponents of
pattern matching organised structured data in 'tagged tuples' rather
than struct as in C or the objects introduced later. A node in a binary
tree would, for instance, be a tuple with two elements for the left and
right branches, respectively, and a Node tag, written as
Node(left, right). In Python we would probably put the tag inside the
tuple as ('Node', left, right) or define a data class Node to achieve
the same effect.

Using modern syntax, a depth-first tree traversal would then be written
as follows:

    def traverse(node):
        match node:
            case Node(left, right):
                traverse(left)
                traverse(right)
            case Leaf(value):
                handle(value)

The notion of handling recursive data structures with pattern matching
immediately gave rise to the idea of handling more general recursive
'patterns' (i.e. recursion beyond recursive data structures) with
pattern matching. Pattern matching would thus also be used to define
recursive functions such as:

    def fib(arg):
        match arg:
            case 0:
                return 1
            case 1:
                return 1
            case n:
                return fib(n-1) + fib(n-2)

As pattern matching was repeatedly integrated into new and emerging
programming languages, its syntax slightly evolved and expanded. The two
first cases in the fib example above could be written more succinctly as
case 0 | 1: with | denoting alternative patterns. Moreover, the
underscore _ was widely adopted as a wildcard, a filler where neither
the structure nor value of parts of a pattern were of substance. Since
the underscore is already frequently used in equivalent capacity in
Python's iterable unpacking (e.g., _, _, third, *_ = something) we kept
these universal standards.

It is noteworthy that the concept of pattern matching has always been
closely linked to the concept of functions. The different case clauses
have always been considered as something like semi-independent functions
where pattern variables take on the role of parameters. This becomes
most apparent when pattern matching is written as an overloaded
function, along the lines of (Standard ML):

    fun fib 0 = 1
      | fib 1 = 1
      | fib n = fib (n-1) + fib (n-2)

Even though such a strict separation of case clauses into independent
functions does not apply in Python, we find that patterns share many
syntactic rules with parameters, such as binding arguments to
unqualified names only or that variable/parameter names must not be
repeated for a particular pattern/function.

With its emphasis on abstraction and encapsulation, object-oriented
programming posed a serious challenge to pattern matching. In short: in
object-oriented programming, we can no longer view objects as tagged
tuples. The arguments passed into the constructor do not necessarily
specify the attributes or fields of the objects. Moreover, there is no
longer a strict ordering of an object's fields and some of the fields
might be private and thus inaccessible. And on top of this, the given
object might actually be an instance of a subclass with slightly
different structure.

To address this challenge, patterns became increasingly independent of
the original tuple constructors. In a pattern like Node(left, right),
Node is no longer a passive tag, but rather a function that can actively
check for any given object whether it has the right structure and
extract a left and right field. In other words: the Node-tag becomes a
function that transforms an object into a tuple or returns some failure
indicator if it is not possible.

In Python, we simply use isinstance() together with the __match_args__
field of a class to check whether an object has the correct structure
and then transform some of its attributes into a tuple. For the Node
example above, for instance, we would have
__match_args__ = ('left', 'right') to indicate that these two attributes
should be extracted to form the tuple. That is, case Node(x, y) would
first check whether a given object is an instance of Node and then
assign left to x and right to y, respectively.

Paying tribute to Python's dynamic nature with 'duck typing', however,
we also added a more direct way to specify the presence of, or
constraints on specific attributes. Instead of Node(x, y) you could also
write object(left=x, right=y), effectively eliminating the isinstance()
check and thus supporting any object with left and right attributes. Or
you would combine these ideas to write Node(right=y) so as to require an
instance of Node but only extract the value of the right attribute.

Backwards Compatibility

Through its use of "soft keywords" and the new PEG parser (PEP 617), the
proposal remains fully backwards compatible. However, 3rd party tooling
that uses an LL(1) parser to parse Python source code may be forced to
switch parser technology to be able to support those same features.

Security Implications

We do not expect any security implications from this language feature.

Reference Implementation

A feature-complete CPython implementation is available on GitHub.

An interactive playground based on the above implementation was created
using Binder[2] and Jupyter[3].

References

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] Kohn et al., Dynamic Pattern Matching with Python
https://gvanrossum.github.io/docs/PyPatternMatching.pdf

[2] Binder https://mybinder.org

[3] Jupyter https://jupyter.org