PEP: 622
Title: Structural Pattern Matching
Author: Brandt Bucher <brandt@python.org>,
        Daniel F Moisset <dfmoisset@gmail.com>,
        Tobias Kohn <kohnt@tobiaskohn.ch>,
        Ivan Levkivskyi <levkivskyi@gmail.com>,
        Guido van Rossum <guido@python.org>,
        Talin <viridia@gmail.com>
BDFL-Delegate:
Discussions-To: python-dev@python.org
Status: Superseded
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Jun-2020
Python-Version: 3.10
Post-History: 23-Jun-2020, 08-Jul-2020
Superseded-By: 634

Abstract

This PEP proposes to add a pattern matching statement to Python,
inspired by similar syntax found in Scala, Erlang, and other languages.

Patterns and shapes

The pattern syntax builds on Python’s existing syntax for sequence
unpacking (e.g., a, b = value).

A match statement compares a value (the subject) to several different
shapes (the patterns) until a shape fits. Each pattern describes the
type and structure of the accepted values, as well as the variables into
which their contents are captured.

Patterns can specify the shape to be:

-   a sequence to be unpacked, as already mentioned
-   a mapping with specific keys
-   an instance of a given class with (optionally) specific attributes
-   a specific value
-   a wildcard

Patterns can be composed in several ways.

Syntax

Syntactically, a match statement contains:

-   a subject expression
-   one or more case clauses

Each case clause specifies:

-   a pattern (the overall shape to be matched)
-   an optional “guard” (a condition to be checked if the pattern
    matches)
-   a code block to be executed if the case clause is selected

Motivation

The rest of the PEP:

-   motivates why we believe pattern matching makes a good addition to
    Python
-   explains our design choices
-   contains a precise syntactic and runtime specification
-   gives guidance for static type checkers (and one small addition to
    the typing module)
-   discusses the main objections and alternatives that have been
    brought up during extensive discussion of the proposal, both within
    the group of authors and in the python-dev community

Finally, we discuss some possible extensions that might be considered in
the future, once the community has ample experience with the currently
proposed syntax and semantics.

Overview

Patterns are a new syntactical category with their own rules and special
cases. Patterns mix input (given values) and output (captured variables)
in novel ways. They may take a little time to use effectively. The
authors have provided a brief introduction to the basic concepts here.
Note that this section is not intended to be complete or entirely
accurate.

Pattern, a new syntactic construct, and destructuring

A new syntactic construct called pattern is introduced in this PEP.
Syntactically, patterns look like a subset of expressions. The following
are examples of patterns:

-   [first, second, *rest]
-   Point2d(x, 0)
-   {"name": "Bruce", "age": age}
-   42

The above expressions may look like examples of object construction with
a constructor which takes some values as parameters and builds an object
from those components.

When viewed as a pattern, the above patterns mean the inverse operation
of construction, which we call destructuring. Destructuring takes a
subject value and extracts its components.

The syntactic similarity between object construction and destructuring
is intentional. It also follows the existing Pythonic style of contexts
which makes assignment targets (write contexts) look like expressions
(read contexts).

Pattern matching never creates objects, in the same way that
[a, b] = my_list doesn't create a new [a, b] list, nor read the values
of a and b.

Matching process

A pattern is matched against a subject value. During this matching
process, the structure of the pattern may not fit the subject, and
matching fails.

For example, matching the pattern Point2d(x, 0) to the subject
Point2d(3, 0) successfully matches. The match also binds the pattern's
free variable x to the subject's value 3.

As another example, if the subject is [3, 0], the match fails because
the subject's type list is not the pattern's Point2d.

As a third example, if the subject is Point2d(3, 7), the match fails
because the subject's second coordinate 7 is not the same as the
pattern's 0.

The match statement tries to match a single subject to each of the
patterns in its case clauses. At the first successful match to a pattern
in a case clause:

-   the variables in the pattern are assigned, and
-   a corresponding block is executed.

Each case clause can also specify an optional boolean condition, known
as a guard.

Let's look at a more detailed example of a match statement. Here, the
match statement is used within a function that builds 3D points. The
function can accept as input any of the following: a tuple with 2
elements, a tuple with 3 elements, an existing Point2d object, or an
existing Point3d object:

    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")

Without pattern matching, this function's implementation would require
several isinstance() checks, one or two len() calls, and more convoluted
control flow. The match version and the traditional version translate
into similar code under the hood. Once familiar with pattern matching, a
reader will likely find the match version clearer than the traditional
approach.
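For comparison, here is a sketch of how such a function might be written without match; the Point2d and Point3d dataclasses are hypothetical stand-ins defined only to make the example self-contained:

```python
# A hand-written equivalent of make_point_3d without pattern matching.
# Point2d and Point3d are hypothetical stand-ins for the PEP's classes.
from dataclasses import dataclass

@dataclass
class Point2d:
    x: float
    y: float

@dataclass
class Point3d:
    x: float
    y: float
    z: float

def make_point_3d(pt):
    if isinstance(pt, Point3d):
        return pt
    if isinstance(pt, Point2d):
        return Point3d(pt.x, pt.y, 0)
    if isinstance(pt, tuple):
        if len(pt) == 2:
            x, y = pt
            return Point3d(x, y, 0)
        if len(pt) == 3:
            x, y, z = pt
            return Point3d(x, y, z)
    raise TypeError("not a point we support")
```

Note how the type checks, len() calls, and unpacking that the match statement performs implicitly must all be spelled out by hand here.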

Rationale and Goals

Python programs frequently need to handle data which varies in type,
presence of attributes/keys, or number of elements. Typical examples are
operating on nodes of a mixed structure like an AST, handling UI events
of different types, processing structured input (like structured files
or network messages), or “parsing” arguments for a function that can
accept different combinations of types and numbers of parameters. In
fact, the classic 'visitor' pattern is an example of this, done in an
OOP style -- but matching makes it much less tedious to write.

Much of the code to do so tends to consist of complex chains of nested
if/elif statements, including multiple calls to len(), isinstance() and
index/key/attribute access. Inside those branches users sometimes need
to destructure the data further to extract the required component
values, which may be nested several objects deep.

Pattern matching as present in many other languages provides an elegant
solution to this problem. These range from statically compiled
functional languages like F# and Haskell, via mixed-paradigm languages
like Scala and Rust, to dynamic languages like Elixir and Ruby, and is
under consideration for JavaScript. We are indebted to these languages
for guiding the way to Pythonic pattern matching, as Python is indebted
to so many other languages for many of its features: many basic
syntactic features were inherited from C, exceptions from Modula-3,
classes were inspired by C++, slicing came from Icon, regular
expressions from Perl, decorators resemble Java annotations, and so on.

The usual logic for operating on heterogeneous data can be summarized in
the following way:

-   Some analysis is done on the shape (type and components) of the
    data: This could involve isinstance() or len() calls and/or
    extracting components (via indexing or attribute access) which are
    checked for specific values or conditions.
-   If the shape is as expected, some more components are possibly
    extracted and some operation is done using the extracted values.

Take for example this piece of the Django web framework:

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
        value = tuple(value)
    else:
        label = key.replace('_', ' ').title()

We can see the shape analysis of the value at the top, followed by the
destructuring inside.

Note that shape analysis here involves checking the types both of the
container and of one of its components, and some checks on its number of
elements. Once we match the shape, we need to decompose the sequence.
With the proposal in this PEP, we could rewrite that code into this:

    match value:
        case [*v, label := (Promise() | str())] if v:
            value = tuple(v)
        case _:
            label = key.replace('_', ' ').title()

This syntax makes much more explicit which formats are possible for the
input data, and which components are extracted from where. You can see a
pattern similar to list unpacking, but also type checking: the Promise()
pattern is not an object construction, but represents anything that's an
instance of Promise. The pattern operator | separates alternative
patterns (not unlike regular expressions or EBNF grammars), and _ is a
wildcard. (Note that the match syntax used here will accept user-defined
sequences, as well as lists and tuples.)

On some occasions, extracting information is less important than
identifying structure. Take the following example from the Python
standard library:

    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")

This code determines the "shape" of the data without doing significant
extraction. It is not very easy to read, and the intended shape being
matched is not evident. Compare with the updated code using the proposed
syntax:

    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False

Note that the proposed code will work without any modifications to the
definition of Node and other classes here. As shown in the examples
above, the proposal supports not just unpacking sequences, but also
doing isinstance checks (like LParen() or str()), looking into object
attributes (Leaf(value="(") for example) and comparisons with literals.

That last feature helps with some kinds of code which look more like the
"switch" statement as present in other languages:

    match response.status:
        case 200:
            do_something(response.data)  # OK
        case 301 | 302:
            retry(response.location)  # Redirect
        case 401:
            retry(auth=get_credentials())  # Login first
        case 426:
            sleep(DELAY)  # Server is swamped, try after a bit
            retry()
        case _:
            raise RequestError("we couldn't get the data")

Although this will work, it's not necessarily what the proposal is
focused on, and the new syntax has been designed to best support the
destructuring scenarios.

See the syntax sections below for a more detailed specification.

We propose that destructuring objects can be customized by a new special
__match_args__ attribute. As part of this PEP we specify the general API
and its implementation for some standard library classes (including
named tuples and dataclasses). See the runtime section below.

Finally, we aim to provide comprehensive support for static type
checkers and similar tools. For this purpose, we propose to introduce a
@typing.sealed class decorator that will be a no-op at runtime but will
indicate to static tools that all sub-classes of this class must be
defined in the same module. This will allow effective static
exhaustiveness checks, and together with dataclasses, will provide basic
support for algebraic data types. See the static checkers section for
more details.

Syntax and Semantics

Patterns

A pattern is a new syntactic construct that could be considered a loose
generalization of assignment targets. The key properties of a pattern
are what types and shapes of subjects it accepts, what variables it
captures, and how it extracts them from the subject. For example, the
pattern [a, b] matches only sequences of exactly 2 elements, extracting
the first element into a and the second one into b.

This PEP defines several types of patterns. These are certainly not the
only possible ones, so the design decision was made to choose a subset
of functionality that is useful now but conservative. More patterns can
be added later as this feature gets more widespread use. See the
rejected ideas and deferred ideas sections for more details.

The patterns listed here are described in more detail below, but
summarized together in this section for simplicity:

-   A literal pattern is useful to filter constant values in a
    structure. It looks like a Python literal (including some values
    like True, False and None). It only matches objects equal to the
    literal, and never binds.
-   A capture pattern looks like x and is equivalent to an identical
    assignment target: it always matches and binds the variable with the
    given (simple) name.
-   The wildcard pattern is a single underscore: _. It always matches,
    but does not capture any variable (which prevents interference with
    other uses for _ and allows for some optimizations).
-   A constant value pattern works like the literal but for certain
    named constants. Note that it must be a qualified (dotted) name,
    given the possible ambiguity with a capture pattern. It looks like
    Color.RED and only matches values equal to the corresponding value.
    It never binds.
-   A sequence pattern looks like [a, *rest, b] and is similar to a list
    unpacking. An important difference is that the elements nested
    within it can be any kind of patterns, not just names or sequences.
    It matches only sequences of appropriate length, as long as all the
    sub-patterns also match. It makes all the bindings of its
    sub-patterns.
-   A mapping pattern looks like {"user": u, "emails": [*es]}. It
    matches mappings with at least the set of provided keys, and if all
    the sub-patterns match their corresponding values. It binds whatever
    the sub-patterns bind while matching with the values corresponding
    to the keys. Adding **rest at the end of the pattern to capture
    extra items is allowed.
-   A class pattern is similar to the above but matches attributes
    instead of keys. It looks like datetime.date(year=y, day=d). It
    matches instances of the given type, having at least the specified
    attributes, as long as the attributes match with the corresponding
    sub-patterns. It binds whatever the sub-patterns bind when matching
    with the values of the given attributes. An optional protocol also
    allows matching positional arguments.
-   An OR pattern looks like [*x] | {"elems": [*x]}. It matches if any
    of its sub-patterns match. It uses the binding for the leftmost
    pattern that matched.
-   A walrus pattern looks like d := datetime(year=2020, month=m). It
    matches only if its sub-pattern also matches. It binds whatever the
    sub-pattern match does, and also binds the named variable to the
    entire object.

The match statement

A simplified, approximate grammar for the proposed syntax is:

    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

See Appendix A for the full, unabridged grammar. The simplified grammars
in this section are there for helping the reader, not as a full
specification.

We propose that the match operation should be a statement, not an
expression. Although in many languages it is an expression, being a
statement better suits the general logic of Python syntax. See rejected
ideas for more discussion. The allowed patterns are described in detail
below in the patterns subsection.

The match and case keywords are proposed to be soft keywords, so that
they are recognized as keywords at the beginning of a match statement or
case block respectively, but are allowed to be used in other places as
variable or argument names.
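For instance, under the soft-keyword rule, code like the following remains valid (a sketch; the switch function is hypothetical):

```python
# 'match' and 'case' still work as plain variable and function names.
import re

match = re.match(r"(\w+)-(\w+)", "soft-keyword")  # 'match' as a variable
case = "lower"                                    # 'case' as a variable

def switch(case):                                 # ...and as a parameter
    return case.upper()
```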

The proposed indentation structure is as follows:

    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

Here, some_expression represents the value that is being matched
against, which will be referred to hereafter as the subject of the
match.

Match semantics

The proposed large scale semantics for choosing the match is to choose
the first matching pattern and execute the corresponding suite. The
remaining patterns are not tried. If there are no matching patterns, the
statement 'falls through', and execution continues at the following
statement.

Essentially this is equivalent to a chain of if ... elif ... else
statements. Note that unlike for the previously proposed switch
statement, the pre-computed dispatch dictionary semantics does not apply
here.

There is no default or else case - instead the special wildcard _ can be
used (see the section on capture_pattern) as a final 'catch-all'
pattern.

Name bindings made during a successful pattern match outlive the
executed suite and can be used after the match statement. This follows
the logic of other Python statements that can bind names, such as for
loop and with statement. For example:

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works

During failed pattern matches, some sub-patterns may succeed. For
example, while matching the value [0, 1, 2] with the pattern (0, x, 1),
the sub-pattern x may succeed if the list elements are matched from left
to right. The implementation may choose to either make persistent
bindings for those partial matches or not. User code including a match
statement should not rely on the bindings being made for a failed match,
but also shouldn't assume that variables are unchanged by a failed
match. This part of the behavior is left intentionally unspecified so
different implementations can add optimizations, and to prevent
introducing semantic restrictions that could limit the extensibility of
this feature.

Note that some pattern types below define more specific rules about when
the binding is made.

Allowed patterns

We introduce the proposed syntax gradually. Here we start from the main
building blocks. The following patterns are supported:

Literal Patterns

Simplified syntax:

    literal_pattern:
        | number
        | string
        | 'None'
        | 'True'
        | 'False'

A literal pattern consists of a simple literal like a string, a number,
a Boolean literal (True or False), or None:

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")

A literal pattern uses equality with the literal on the right-hand side,
so in the above example number == 0 and then possibly number == 1, etc.,
will be evaluated. Note that although technically negative numbers are
represented using unary minus, they are considered literals for the
purpose of pattern matching. Unary plus is not allowed. Binary plus and
minus are allowed only to join a real number and an imaginary number to
form a complex number, such as 1+1j.

Note that because equality (__eq__) is used, and Booleans are equal to
the integers 0 and 1, there is no practical difference between the
following two:

    case True:
        ...

    case 1:
        ...

Triple-quoted strings are supported. Raw strings and byte strings are
supported. F-strings are not allowed (since in general they are not
really literals).

Capture Patterns

Simplified syntax:

    capture_pattern: NAME

A capture pattern serves as an assignment target for the matched
expression:

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")

Only a single name is allowed (a dotted name is a constant value
pattern). A capture pattern always succeeds. A capture pattern appearing
in a scope makes the name local to that scope. For example, using name
after the above snippet may raise UnboundLocalError rather than
NameError, if the "" case clause was taken:

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty

While matching against each case clause, a name may be bound at most
once; having two capture patterns with coinciding names is an error:

    match data:
        case [x, x]:  # Error!
            ...

Note: one can still match on a collection with equal items using guards.
Also, [x, y] | Point(x, y) is a legal pattern because the two
alternatives are never matched at the same time.

The single underscore (_) is not considered a NAME and is treated
specially as a wildcard pattern.

Reminder: None, False and True are keywords denoting literals, not
names.

Wildcard Pattern

Simplified syntax:

    wildcard_pattern: "_"

The single underscore (_) name is a special kind of pattern that always
matches but never binds:

    match data:
        case [_, _]:
            print("Some pair")
            print(_)  # Error!

Given that no binding is made, it can be used as many times as desired,
unlike capture patterns.

Constant Value Patterns

Simplified syntax:

    constant_pattern: NAME ('.' NAME)+

This is used to match against constants and enum values. Every dotted
name in a pattern is looked up using normal Python name resolution
rules, and the value is used for comparison by equality with the match
subject (same as for literals):

    from enum import Enum

    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    match entree[-1]:
        case Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note that there is no way to use unqualified names as constant value
patterns (they always denote variables to be captured). See rejected
ideas for other syntactic alternatives that were considered for constant
value patterns.

Sequence Patterns

Simplified syntax:

    sequence_pattern:
        | '[' [values_pattern] ']'
        | '(' [value_pattern ',' [values_pattern]] ')'
    values_pattern: ','.value_pattern+ ','?
    value_pattern: '*' capture_pattern | pattern

A sequence pattern follows the same semantics as unpacking assignment.
Like unpacking assignment, both tuple-like and list-like syntax can be
used, with identical semantics. Each element can be an arbitrary
pattern; there may also be at most one *name pattern to catch all
remaining items:

    match collection:
        case 1, [x, *others]:
            print("Got 1 and a nested sequence")
        case (1, x):
            print(f"Got 1 and {x}")

To match a sequence pattern the subject must be an instance of
collections.abc.Sequence, and it cannot be any kind of string (str,
bytes, bytearray). It cannot be an iterator. For matching on a specific
collection class, see class pattern below.

The _ wildcard can be starred to match sequences of varying lengths. For
example:

-   [*_] matches a sequence of any length.
-   (_, _, *_) matches any sequence of length two or more.
-   ["a", *_, "z"] matches any sequence of length two or more that
    starts with "a" and ends with "z".

Mapping Patterns

Simplified syntax:

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern

Mapping pattern is a generalization of iterable unpacking to mappings.
Its syntax is similar to dictionary display but each key and value are
patterns "{" (pattern ":" pattern)+ "}". A **rest pattern is also
allowed, to extract the remaining items. Only literal and constant value
patterns are allowed in key positions:

    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

The subject must be an instance of collections.abc.Mapping. Extra keys
in the subject are ignored even if **rest is not present. This is
different from sequence pattern, where extra items will cause a match to
fail. But mappings are actually different from sequences: they have
natural structural sub-typing behavior, i.e., passing a dictionary with
extra keys somewhere will likely just work.

For this reason, **_ is invalid in mapping patterns; it would always be
a no-op that could be removed without consequence.

Matched key-value pairs must already be present in the mapping, and not
created on-the-fly by __missing__ or __getitem__. For example,
collections.defaultdict instances will only match patterns with keys
that were already present when the match block was entered.

Class Patterns

Simplified syntax:

    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    keyword_pattern: NAME '=' or_pattern

A class pattern provides support for destructuring arbitrary objects.
There are two possible ways of matching on object attributes: by
position like Point(1, 2), and by name like Point(x=1, y=2). These two
can be combined, but a positional match cannot follow a match by name.
Each item in a class pattern can be an arbitrary pattern. A simple
example:

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x0, y0, x1, y1, painted=True):
            ...

Whether a match succeeds or not is determined by the equivalent of an
isinstance call. If the subject (shape, in the example) is not an
instance of the named class (Point or Rectangle), the match fails.
Otherwise, it continues (see details in the runtime section).

The named class must inherit from type. It may be a single name or a
dotted name (e.g. some_mod.SomeClass or mod.pkg.Class). The leading name
must not be _, so e.g. _(...) and _.C(...) are invalid. Use
object(foo=_) to check whether the matched object has an attribute foo.

By default, sub-patterns may only be matched by keyword for user-defined
classes. In order to support positional sub-patterns, a custom
__match_args__ attribute is required. The runtime allows matching
against arbitrarily nested patterns by chaining all of the instance
checks and attribute lookups appropriately.

Combining multiple patterns (OR patterns)

Multiple alternative patterns can be combined into one using |. This
means the whole pattern matches if at least one alternative matches.
Alternatives are tried from left to right and have a short-circuit
property, subsequent patterns are not tried if one matched. Examples:

    match something:
        case 0 | 1 | 2:
            print("Small number")
        case [] | [_]:
            print("A short sequence")
        case str() | bytes():
            print("Something string-like")
        case _:
            print("Something else")

The alternatives may bind variables, as long as each alternative binds
the same set of variables (excluding _). For example:

    match something:
        case 1 | x:  # Error!
            ...
        case x | 1:  # Error!
            ...
        case one := [1] | two := [2]:  # Error!
            ...
        case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
            ...
        case [x] | x:  # Valid, both arms bind 'x'
            ...

Guards

Each top-level pattern can be followed by a guard of the form
if expression. A case clause succeeds if the pattern matches and the
guard evaluates to a true value. For example:

    match input:
        case [x, y] if x > MAX_INT and y > MAX_INT:
            print("Got a pair of large numbers")
        case x if x > MAX_INT:
            print("Got a large number")
        case [x, y] if x == y:
            print("Got equal items")
        case _:
            print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards
rather than failing the case clause. Names that appear in a pattern are
bound before the guard is evaluated. So this will work:

    values = [0]

    match values:
        case [x] if x:
            ...  # This is not executed
        case _:
            ...
    print(x)  # This will print "0"

Note that guards are not allowed for nested patterns, so that
[x if x > 0] is a SyntaxError and 1 | 2 if 3 | 4 will be parsed as
(1 | 2) if (3 | 4).

Walrus patterns

It is often useful to match a sub-pattern and bind the corresponding
value to a name. For example, it can be useful to write more efficient
matches, or simply to avoid repetition. To simplify such cases, any
pattern (other than the walrus pattern itself) can be preceded by a name
and the walrus operator (:=). For example:

    match get_shape():
        case Line(start := Point(x, y), end) if start == end:
            print(f"Zero length line at {x}, {y}")

The name on the left of the walrus operator can be used in a guard, in
the match suite, or after the match statement. However, the name will
only be bound if the sub-pattern succeeds. Another example:

    match group_shapes():
        case [], [point := Point(x, y), *other]:
            print(f"Got {point} in the second group")
            process_coordinates(x, y)
            ...

Technically, most such examples can be rewritten using guards and/or
nested match statements, but this will be less readable and/or will
produce less efficient code. Essentially, most of the arguments in PEP
572 apply here equally.

The wildcard _ is not a valid name here.

Runtime specification

The Match Protocol

The equivalent of an isinstance call is used to decide whether an object
matches a given class pattern and to extract the corresponding
attributes. Classes requiring different matching semantics (such as
duck-typing) can do so by defining __instancecheck__ (a pre-existing
metaclass hook) or by using typing.Protocol.

The procedure is as follows:

-   The class object for Class in Class(<sub-patterns>) is looked up and
    isinstance(obj, Class) is called, where obj is the value being
    matched. If false, the match fails.
-   Otherwise, if any sub-patterns are given in the form of positional
    or keyword arguments, these are matched from left to right, as
    follows. The match fails as soon as a sub-pattern fails; if all
    sub-patterns succeed, the overall class pattern match succeeds.
-   If there are match-by-position items and the class has a
    __match_args__ attribute, the item at position i is matched against
    the value looked up by attribute __match_args__[i]. For example, a
    pattern Point2d(5, 8), where Point2d.__match_args__ == ["x", "y"],
    is translated (approximately) into obj.x == 5 and obj.y == 8.
-   If there are more positional items than the length of
    __match_args__, a TypeError is raised.
-   If the __match_args__ attribute is absent on the matched class, and
    one or more positional items appear in a match, TypeError is also
    raised. We don't fall back on using __slots__ or __annotations__ --
    "In the face of ambiguity, refuse the temptation to guess."
-   If there are any match-by-keyword items the keywords are looked up
    as attributes on the subject. If the lookup succeeds the value is
    matched against the corresponding sub-pattern. If the lookup fails,
    the match fails.

Such a protocol favors simplicity of implementation over flexibility and
performance. For other considered alternatives, see extended matching.

For the most commonly-matched built-in types (bool, bytearray, bytes,
dict, float, frozenset, int, list, set, str, and tuple), a single
positional sub-pattern is allowed to be passed to the call. Rather than
being matched against any particular attribute on the subject, it is
instead matched against the subject itself. This creates behavior that
is useful and intuitive for these objects:

-   bool(False) matches False (but not 0).
-   tuple((0, 1, 2)) matches (0, 1, 2) (but not [0, 1, 2]).
-   int(i) matches any int and binds it to the name i.

Overlapping sub-patterns

Certain classes of overlapping matches are detected at runtime and will
raise exceptions. In addition to basic checks described in the previous
subsection:

-   The interpreter will check that two match items are not targeting
    the same attribute, for example Point2d(1, 2, y=3) is an error.
-   It will also check that a mapping pattern does not attempt to match
    the same key more than once.

Special attribute __match_args__

The __match_args__ attribute is always looked up on the type object
named in the pattern. If present, it must be a list or tuple of strings
naming the allowed positional arguments.

In deciding what names should be available for matching, the recommended
practice is that class patterns should be the mirror of construction;
that is, the set of available names and their types should resemble the
arguments to __init__().

Only match-by-name will work by default, and classes should define
__match_args__ as a class attribute if they would like to support
match-by-position. Additionally, dataclasses and named tuples will
support match-by-position out of the box. See below for more details.

Exceptions and side effects

While matching each case, the match statement may trigger execution of
other functions (for example __getitem__(), __len__() or a property).
Almost every exception caused by those propagates outside of the match
statement normally. The only exception not propagated is an
AttributeError raised while looking up an attribute during the matching
of a Class Pattern; that case results simply in a match failure, and the
rest of the statement proceeds normally.

The only side-effect carried out explicitly by the matching process is
the binding of names. However, the process relies on attribute access,
instance checks, len(), equality and item access on the subject and some
of its components. It also evaluates constant value patterns and the
left side of class patterns. While none of those typically create any
side-effects, some of these objects could. This proposal intentionally
leaves out any specification of what methods are called or how many
times. User code relying on that behavior should be considered buggy.

The standard library

To facilitate the use of pattern matching, several changes will be made
to the standard library:

-   Namedtuples and dataclasses will have auto-generated __match_args__.
-   For dataclasses the order of attributes in the generated
    __match_args__ will be the same as the order of corresponding
    arguments in the generated __init__() method. This includes the
    situations where attributes are inherited from a superclass.

In addition, a systematic effort will be put into going through existing
standard library classes and adding __match_args__ where it looks
beneficial.

Static checkers specification

Exhaustiveness checks

From a reliability perspective, experience shows that missing a case
when dealing with a set of possible data values leads to hard-to-debug
issues, forcing people to add safety asserts like this:

    def get_first(data: Union[int, list[int]]) -> int:
        if isinstance(data, list) and data:
            return data[0]
        elif isinstance(data, int):
            return data
        else:
            assert False, "should never get here"

PEP 484 specifies that static type checkers should support
exhaustiveness in conditional checks with respect to enum values. PEP
586 later generalized this requirement to literal types.

This PEP further generalizes this requirement to arbitrary patterns. A
typical situation where this applies is matching an expression with a
union type:

    def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
        match val:
            case [x, y] if x > 0 and y > 0:
                return f"A pair of {x} and {y}"
            case [x, *other]:
                return f"A sequence starting with {x}"
            case int():
            return "Some integer"
            # Type-checking error: some cases unhandled.

The exhaustiveness checks should also apply where both pattern matching
and enum values are combined:

    from enum import Enum
    from typing import Union

    class Level(Enum):
        BASIC = 1
        ADVANCED = 2
        PRO = 3

    class User:
        name: str
        level: Level

    class Admin:
        name: str

    account: Union[User, Admin]

    match account:
        case Admin(name=name) | User(name=name, level=Level.PRO):
            ...
        case User(level=Level.ADVANCED):
            ...
        # Type-checking error: basic user unhandled

Obviously, no Matchable protocol (in terms of PEP 544) is needed, since
every class is matchable and therefore is subject to the checks
specified above.

Sealed classes as algebraic data types

Quite often it is desirable to apply exhaustiveness checks to a set of
classes without defining ad-hoc union types, which is itself fragile if
a class is missing from the union definition. A design pattern where a
group of record-like classes is combined into a union is popular in
other languages that support pattern matching and is known under the
name of algebraic data types.

We propose to add a special class decorator @sealed to the typing
module. It will have no effect at runtime, but will indicate to static
type checkers that all subclasses (direct and indirect) of the decorated
class should be defined in the same module as the base class.

The idea is that since all subclasses are known, the type checker can
treat the sealed base class as a union of all its subclasses. Together
with dataclasses this allows a clean and safe support of algebraic data
types in Python. Consider this example:

    from dataclasses import dataclass
    from typing import sealed

    @sealed
    class Node:
        ...

    class Expression(Node):
        ...

    class Statement(Node):
        ...

    @dataclass
    class Name(Expression):
        name: str

    @dataclass
    class Operation(Expression):
        left: Expression
        op: str
        right: Expression

    @dataclass
    class Assignment(Statement):
        target: str
        value: Expression

    @dataclass
    class Print(Statement):
        value: Expression

With such definitions, a type checker can safely treat Node as
Union[Name, Operation, Assignment, Print], and also safely treat e.g.
Expression as Union[Name, Operation]. So the snippet below will result
in a type-checking error, because Name is not handled (and the type
checker can give a useful error message):

    def dump(node: Node) -> str:
        match node:
            case Assignment(target, value):
                return f"{target} = {dump(value)}"
            case Print(value):
                return f"print({dump(value)})"
            case Operation(left, op, right):
                return f"({dump(left)} {op} {dump(right)})"

Type erasure

Class patterns are subject to runtime type erasure. Namely, although one
can define a type alias IntQueue = Queue[int] so that a pattern like
IntQueue() is syntactically valid, type checkers should reject such a
match:

    queue: Union[Queue[int], Queue[str]]
    match queue:
        case IntQueue():  # Type-checking error here
            ...

Note that the above snippet actually fails at runtime with the current
implementation of generic classes in the typing module, as well as with
builtin generic classes in the recently accepted PEP 585, because they
prohibit isinstance checks.

To clarify, generic classes are not prohibited in general from
participating in pattern matching, just that their type parameters can't
be explicitly specified. It is still fine if sub-patterns or literals
bind the type variables. For example:

    from typing import Generic, TypeVar, Union

    T = TypeVar('T')

    class Result(Generic[T]):
        first: T
        other: list[T]

    result: Union[Result[int], Result[str]]

    match result:
        case Result(first=int()):
            ...  # Type of result is Result[int] here
        case Result(other=["foo", "bar", *rest]):
            ...  # Type of result is Result[str] here

Note about constants

The fact that a capture pattern is always an assignment target may have
unwanted consequences when a user by mistake tries to "match" a value
against a constant instead of using the constant value pattern. At
runtime such a match will always succeed and, moreover, overwrite the
value of the constant. It is therefore important that static type
checkers warn about such situations. For example:

    from typing import Final

    MAX_INT: Final = 2 ** 64

    value = 0

    match value:
        case MAX_INT:  # Type-checking error here: cannot assign to final name
            print("Got big number")
        case _:
            print("Something else")

Note that the CPython reference implementation also generates a
SyntaxWarning message for this case.

Precise type checking of star matches

Type checkers should perform precise type checking of star items in
pattern matching giving them either a heterogeneous list[T] type, or a
TypedDict type as specified by PEP 589. For example:

    stuff: Tuple[int, str, str, float]

    match stuff:
        case a, *b, 0.5:
            # Here a is int and b is list[str]
            ...

Performance Considerations

Ideally, a match statement should have good runtime performance compared
to an equivalent chain of if-statements. Although the history of
programming languages is rife with examples of new features which
increased engineer productivity at the expense of additional CPU cycles,
it would be unfortunate if the benefits of match were counter-balanced
by a significant overall decrease in runtime performance.

Although this PEP does not specify any particular implementation
strategy, a few words about the prototype implementation and how it
attempts to maximize performance are in order.

Basically, the prototype implementation transforms all of the match
statement syntax into equivalent if/else blocks - or more accurately,
into Python bytecode that has the same effect. In other words, all of
the logic for testing instance types, sequence lengths, mapping keys and
so on is inlined in place of the match.

This is not the only possible strategy, nor is it necessarily the best.
For example, the instance checks could be memoized, especially if there
are multiple instances of the same class type but with different
arguments in a single match statement. It is also theoretically possible
for a future implementation to process case clauses or sub-patterns in
parallel using a decision tree rather than testing them one by one.

Backwards Compatibility

This PEP is fully backwards compatible: the match and case keywords are
proposed to be (and stay!) soft keywords, so their use as variable,
function, class, module or attribute names is not impeded at all.

This is important because match is the name of a popular and well-known
function and method in the re module, which we have no desire to break
or deprecate.

The difference between hard and soft keywords is that hard keywords are
always reserved words, even in positions where they make no sense (e.g.
x = class + 1), while soft keywords only get a special meaning in
context. Since PEP 617, the parser backtracks, which means that on
different attempts to parse a code fragment it can interpret a soft
keyword differently.

For example, suppose the parser encounters the following input:

    match [x, y]:

The parser first attempts to parse this as an expression statement. It
interprets match as a NAME token and then considers [x, y] to be a
subscript of that name. It then encounters the colon and has to
backtrack, since an expression statement cannot be followed by a colon.
The parser then backtracks to the start of the line and finds that match
is a soft keyword allowed in this position. It then considers [x, y] to
be a list expression. The colon then is just what the parser expected,
and the parse succeeds.

Impacts on third-party tools

There are a lot of tools in the Python ecosystem that operate on Python
source code: linters, syntax highlighters, auto-formatters, and IDEs.
These will all need to be updated to include awareness of the match
statement.

In general, these tools fall into one of two categories:

Shallow parsers don't try to understand the full syntax of Python, but
instead scan the source code for specific known patterns. IDEs, such as
Visual Studio Code, Emacs and TextMate, tend to fall into this category,
since the source code is frequently invalid while being edited, and a
strict approach to parsing would fail.

For these kinds of tools, adding knowledge of a new keyword is
relatively easy, just an addition to a table, or perhaps modification of
a regular expression.

Deep parsers understand the complete syntax of Python. An example of
this is the auto-formatter Black. A particular requirement with these
kinds of tools is that they not only need to understand the syntax of
the current version of Python, but older versions of Python as well.

The match statement uses a soft keyword, and it is one of the first
major Python features to take advantage of the capabilities of the new
PEG parser. This means that third-party parsers which are not
'PEG-compatible' will have a hard time with the new syntax.

It has been noted that a number of these third-party tools leverage
common parsing libraries (Black for example uses a fork of the lib2to3
parser). It may be helpful to identify widely used parsing libraries
(such as parso and libCST) and upgrade them to be PEG compatible.

However, since this work would need to be done not only for the match
statement, but for any new Python syntax that leverages the capabilities
of the PEG parser, it is considered out of scope for this PEP. (Although
it is suggested that this would make a fine Summer of Code project.)

Reference Implementation

A feature-complete CPython implementation is available on GitHub.

An interactive playground based on the above implementation was created
using Binder and Jupyter.

Example Code

A small collection of example code is available on GitHub.

Rejected Ideas

This general idea has been floating around for a pretty long time, and
many back-and-forth decisions were made. Here we summarize the many
alternative paths that were taken but eventually abandoned.

Don't do this, pattern matching is hard to learn

In our opinion, the proposed pattern matching is not more difficult
than adding isinstance() and getattr() to iterable unpacking. Also, we
believe the proposed syntax significantly improves readability for a
wide range of code patterns, by allowing one to express what one wants
to do rather than how to do it. We hope the few real code snippets we
included in the PEP above illustrate this comparison well enough. For
more real code examples and their translations see Ref.[1].

Don't do this, use existing method dispatching mechanisms

We recognize that some of the use cases for the match statement overlap
with what can be done with traditional object-oriented programming (OOP)
design techniques using class inheritance. The ability to choose
alternate behaviors based on testing the runtime type of a match subject
might even seem heretical to strict OOP purists.

However, Python has always been a language that embraces a variety of
programming styles and paradigms. Classic Python design idioms such as
"duck"-typing go beyond the traditional OOP model.

We believe that there are important use cases where the use of match
results in a cleaner and more maintainable architecture. These use cases
tend to be characterized by a number of features:

-   Algorithms which cut across traditional lines of data encapsulation.
    If an algorithm is processing heterogeneous elements of different
    types (such as evaluating or transforming an abstract syntax tree,
    or doing algebraic manipulation of mathematical symbols), forcing
    the user to implement the algorithm as individual methods on each
    element type results in logic that is smeared across the entire
    codebase instead of being neatly localized in one place.
-   Program architectures where the set of possible data types is
    relatively stable, but there is an ever-expanding set of operations
    to be performed on those data types. Doing this in a strict OOP
    fashion requires constantly adding new methods to both the base
    class and subclasses to support the new methods, "polluting" the
    base class with lots of very specialized method definitions, and
    causing widespread disruption and churn in the code. By contrast, in
    a match-based dispatch, adding a new behavior merely involves
    writing a new match statement.
-   OOP also does not handle dispatching based on the shape of an
    object, such as the length of a tuple, or the presence of an
    attribute -- instead any such dispatching decision must be encoded
    into the object's type. Shape-based dispatching is particularly
    interesting when it comes to handling "duck"-typed objects.

Where OOP is clearly superior is in the opposite case: where the set of
possible operations is relatively stable and well-defined, but there is
an ever-growing set of data types to operate on. A classic example of
this is UI widget toolkits, where there is a fixed set of interaction
types (repaint, mouse click, keypress, and so on), but the set of widget
types is constantly expanding as developers invent new and creative user
interaction styles. Adding a new kind of widget is a simple matter of
writing a new subclass, whereas with a match-based approach you end up
having to add a new case clause to many widespread match statements. We
therefore don't recommend using match in such a situation.

Allow more flexible assignment targets instead

There was an idea to instead generalize iterable unpacking to much more
flexible assignment targets, rather than adding a new kind of statement.
This concept is known in some other languages as "irrefutable matches".
We decided not to do this because inspection of real-life potential use
cases showed that in the vast majority of cases destructuring is related
to an if condition. Also, many of those cases are grouped in a series of
exclusive choices.

Make it an expression

In most other languages pattern matching is represented by an
expression, not a statement. But making it an expression would be
inconsistent with other syntactic choices in Python. All decision-making
logic is expressed almost exclusively in statements, so we decided not
to deviate from this.

Use a hard keyword

There were options to make match a hard keyword, or to choose a
different keyword. Although using a hard keyword would simplify life for
simple-minded syntax highlighters, we decided not to use a hard keyword
for several reasons:

-   Most importantly, the new parser doesn't require us to do this.
    Unlike async, which caused hardships by being a soft keyword for a
    few releases, here we can make match a permanent soft keyword.
-   match is so commonly used in existing code that making it a hard
    keyword would break almost every existing program and would put a
    burden of fixing code on many people who may not even benefit from
    the new syntax.
-   It is hard to find an alternative keyword that would not be commonly
    used in existing programs as an identifier, and would still clearly
    reflect the meaning of the statement.

Use as or | instead of case for case clauses

The pattern matching proposed here is a combination of multi-branch
control flow (in line with switch in Algol-derived languages or cond in
Lisp) and object-deconstruction as found in functional languages. While
the proposed keyword case highlights the multi-branch aspect,
alternative keywords such as as would equally be possible, highlighting
the deconstruction aspect. as or with, for instance, also have the
advantage of already being keywords in Python. However, since case as a
keyword can only occur as a leading keyword inside a match statement, it
is easy for a parser to distinguish between its use as a keyword or as a
variable.

Other variants would use a symbol like | or =>, or go entirely without
a special marker.

Since Python is a statement-oriented language in the tradition of Algol,
and as each composite statement starts with an identifying keyword, case
seemed to be most in line with Python's style and traditions.

Use a flat indentation scheme

There was an idea to use an alternative indentation scheme, for example
where every case clause would not be indented with respect to the
initial match part:

    match expression:
    case pattern_1:
        ...
    case pattern_2:
        ...

This was rejected: although flat indentation would save some horizontal
space, it looks awkward to the eye of a Python programmer, because
everywhere else a colon is followed by an indent. It would also
complicate life for simple-minded code editors. Finally, the horizontal
space issue can be alleviated by allowing "half-indent" (i.e. two spaces
instead of four) for match statements.

In sample programs using match, written as part of the development of
this PEP, a noticeable improvement in code brevity is observed, more
than making up for the additional indentation level.

Another proposal considered was to use flat indentation but put the
expression on the line after match:, like this:

    match:
        expression
    case pattern_1:
        ...
    case pattern_2:
        ...

This was ultimately rejected because the first block would be a novelty
in Python's grammar: a block whose only content is a single expression
rather than a sequence of statements.

Alternatives for constant value pattern

This is probably the trickiest item. Matching against some pre-defined
constants is very common, but the dynamic nature of Python also makes it
ambiguous with capture patterns. Five other alternatives were
considered:

-   Use some implicit rules. For example, if a name was defined in the
    global scope, then it refers to a constant, rather than representing
    a capture pattern:

        # Here, the name "spam" must be defined in the global scope (and
        # not shadowed locally). "side" must be local.

        match entree[-1]:
            case spam: ...  # Compares entree[-1] == spam.
            case side: ...  # Assigns side = entree[-1].

    This however can cause surprises and action at a distance if someone
    defines an unrelated coinciding name before the match statement.

-   Use a rule based on the case of a name. In particular, if the name
    starts with a lowercase letter it would be a capture pattern, while
    if it starts with uppercase it would refer to a constant:

        match entree[-1]:
            case SPAM: ...  # Compares entree[-1] == SPAM.
            case side: ...  # Assigns side = entree[-1].

    This works well with the recommendations for naming constants from
    PEP 8. The main objection is that there's no other part of core
    Python where the case of a name is semantically significant. In
    addition, Python allows identifiers to use different scripts, many
    of which (e.g. CJK) don't have a case distinction.

-   Use extra parentheses to indicate lookup semantics for a given name.
    For example:

        match entree[-1]:
            case (spam): ...  # Compares entree[-1] == spam.
            case side: ...    # Assigns side = entree[-1].

    This may be a viable option, but it can create some visual noise if
    used often. Also honestly it looks pretty unusual, especially in
    nested contexts.

    This also has the problem that we may want or need parentheses to
    disambiguate grouping in patterns, e.g. in
    Point(x, y=(y := complex())).

-   Introduce a special symbol, for example ., ?, $, or ^ to indicate
    that a given name is a value to be matched against, not to be
    assigned to. An earlier version of this proposal used a leading-dot
    rule:

        match entree[-1]:
            case .spam: ...  # Compares entree[-1] == spam.
            case side: ...   # Assigns side = entree[-1].

    While potentially useful, it introduces strange-looking new syntax
    without making the pattern syntax any more expressive. Indeed, named
    constants can be made to work with the existing rules by converting
    them to Enum types, or enclosing them in their own namespace
    (considered by the authors to be one honking great idea):

        match entree[-1]:
            case Sides.SPAM: ...  # Compares entree[-1] == Sides.SPAM.
            case side: ...        # Assigns side = entree[-1].

    If needed, the leading-dot rule (or a similar variant) could be
    added back later with no backward-compatibility issues.

-   There was also an idea to make lookup semantics the default, and
    require $ or ? to be used in capture patterns:

        match entree[-1]:
            case spam: ...   # Compares entree[-1] == spam.
            case side?: ...  # Assigns side = entree[-1].

    There are a few issues with this:

    -   Capture patterns are more common in typical code, so it is
        undesirable to require special syntax for them.

    -   The authors are not aware of any other language that adorns
        captures in this way.

    -   None of the proposed syntaxes have any precedent in Python; no
        other place in Python that binds names (e.g. import, def, for)
        uses special marker syntax.

    -   It would break the syntactic parallels of the current grammar:

            match coords:
                case ($x, $y):
                    return Point(x, y)  # Why not "Point($x, $y)"?

In the end, these alternatives were rejected because of the mentioned
drawbacks.

Disallow float literals in patterns

Because of the inexactness of floats, an early version of this proposal
did not allow floating-point constants to be used as match patterns.
Part of the justification for this prohibition is that Rust does this.

However, during implementation, it was discovered that distinguishing
between float values and other types required extra code in the VM that
would slow matches generally. Given that Python and Rust are very
different languages with different user bases and underlying
philosophies, it was felt that allowing float literals would not cause
too much harm, and would be less surprising to users.

Range matching patterns

This would allow patterns such as 1...6. However, there are a host of
ambiguities:

-   Is the range open, half-open, or closed? (I.e. is 6 included in the
    above example or not?)
-   Does the range match a single number, or a range object?
-   Range matching is often used for character ranges ('a'...'z') but
    that won't work in Python since there's no character data type, just
    strings.
-   Range matching can be a significant performance optimization if you
    can pre-build a jump table, but that's not generally possible in
    Python due to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided
that allowing custom pattern objects (InRange(0, 6)) would be more
flexible and less ambiguous; however those ideas have been postponed for
the time being (See deferred ideas).

Use dispatch dict semantics for matches

Implementations of the classic switch statement sometimes use a
pre-computed hash table instead of chained equality comparisons to gain
some performance. In the context of the match statement this is
technically also possible for matches against literal patterns. However,
having subtly different semantics for different kinds of patterns would
be too surprising for a potentially modest performance win.

We can still experiment with possible performance optimizations in this
direction, as long as they do not cause semantic differences.

Use continue and break in case clauses

Another rejected proposal was to define new meanings for continue and
break inside of match, which would have the following behavior:

-   continue would exit the current case clause and continue matching at
    the next case clause.
-   break would exit the match statement.

However, there is a serious drawback to this proposal: if the match
statement is nested inside of a loop, the meanings of continue and break
are now changed. This may cause unexpected behavior during refactorings;
also, an argument can be made that there are other means to get the same
behavior (such as using guard conditions), and that in practice it's
likely that the existing behavior of continue and break are far more
useful.

AND (&) patterns

This proposal defines an OR-pattern (|) to match one of several
alternates; why not also an AND-pattern (&)? Especially given that some
other languages (F# for example) support this.

However, it's not clear how useful this would be. The semantics for
matching dictionaries, objects and sequences already incorporates an
implicit 'and': all attributes and elements mentioned must be present
for the match to succeed. Guard conditions can also support many of the
use cases that a hypothetical 'and' operator would be used for.

In the end, it was decided that this would make the syntax more complex
without adding a significant benefit.

Negative match patterns

A negation of a match pattern using the operator ! as a prefix would
match exactly when the pattern itself does not match. For instance,
!(3 | 4) would match anything except 3 or 4.

This was rejected because there is documented evidence that this
feature is rarely useful (in languages which support it), or that it is
used mainly as a double negation !! to control variable scopes and
prevent variable bindings (which does not apply to Python). It can also
be simulated using guard conditions.

Check exhaustiveness at runtime

The question is what to do if no case clause has a matching pattern, and
there is no default case. An earlier version of the proposal specified
that the behavior in this case would be to throw an exception rather
than silently falling through.

The arguments back and forth were many, but in the end the EIBTI
(Explicit Is Better Than Implicit) argument won out: it's better to have
the programmer explicitly throw an exception if that is the behavior
they want.

For cases such as sealed classes and enums, where the patterns are all
known to be members of a discrete set, static checkers can warn about
missing patterns.

Type annotations for pattern variables

The proposal was to combine patterns with type annotations:

    match x:
        case [a: int, b: str]: print(f"An int {a} and a string {b}")
        case [a: int, b: int, c: int]: print("Three ints", a, b, c)
        ...

This idea has a lot of problems. For one, the colon can only be used
inside of brackets or parens, otherwise the syntax becomes ambiguous.
And because Python disallows isinstance() checks on generic types, type
annotations containing generics will not work as expected.

Allow *rest in class patterns

It was proposed to allow *rest in a class pattern, giving a variable to
be bound to all positional arguments at once (similar to its use in
unpacking assignments). It would provide some symmetry with sequence
patterns. But it might be confused with a feature to provide the values
for all positional arguments at once. And there seems to be no practical
need for it, so it was scrapped. (It could easily be added at a later
stage if a need arises.)

Disallow _.a in constant value patterns

The first public draft said that the initial name in a constant value
pattern must not be _ because _ has a special meaning in pattern
matching, so this would be invalid:

    case _.a: ...

(However, a._ would be legal and load the attribute with name _ of the
object a as usual.)

There was some pushback against this on python-dev (some people have a
legitimate use for _ as an important global variable, esp. in i18n) and
the only reason for this prohibition was to prevent some user confusion.
But it's not the hill to die on.

Use some other token as wildcard

It has been proposed to use ... (i.e., the ellipsis token) or * (star)
as a wildcard. However, both these look as if an arbitrary number of
items is omitted:

    case [a, ..., z]: ...
    case [a, *, z]: ...

Both look like they would match a sequence of two or more items,
capturing the first and last values.

In addition, if * were to be used as the wildcard character, we would
have to come up with some other way to capture the rest of a sequence,
currently spelled like this:

    case [first, second, *rest]: ...

Using an ellipsis would also be more confusing in documentation and
examples, where ... is routinely used to indicate something obvious or
irrelevant. (Yes, this would also be an argument against the other uses
of ... in Python, but that water is already under the bridge.)

Another proposal was to use ?. This could be acceptable, although it
would require modifying the tokenizer.

Also, _ is already used as a throwaway target in other contexts, and
this use is pretty similar. This example is from difflib.py in the
stdlib:

    for tag, _, _, j1, j2 in group: ...

Perhaps the most convincing argument is that _ is used as the wildcard
in every other language we've looked at supporting pattern matching: C#,
Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, and
Swift. Now, in general, we should not be concerned too much with what
another language does, since Python is clearly different from all these
languages. However, if there is such an overwhelming and strong
consensus, Python should not go out of its way to do something
completely different -- particularly given that _ works well in Python
and is already in use as a throwaway target.

Note that _ is not assigned to by patterns -- this avoids conflicts with
the use of _ as a marker for translatable strings and an alias for
gettext.gettext, as recommended by the gettext module documentation.

Use some other syntax instead of | for OR patterns

A few alternatives to using | to separate the alternatives in OR
patterns have been proposed. Instead of:

    case 401|403|404:
        print("Some HTTP error")

the following proposals have been fielded:

-   Use a comma:

        case 401, 403, 404:
          print("Some HTTP error")

    This looks too much like a tuple -- we would have to find a
    different way to spell tuples, and the construct would have to be
    parenthesized inside the argument list of a class pattern. In
    general, commas already have many different meanings in Python; we
    shouldn't add more.

-   Allow stacked cases:

        case 401:
        case 403:
        case 404:
          print("Some HTTP error")

    This is how this would be done in C, using its fall-through
    semantics for cases. However, we don't want to mislead people into
    thinking that match/case uses fall-through semantics (which are a
    common source of bugs in C). Also, this would be a novel indentation
    pattern, which might make it harder to support in IDEs and such (it
    would break the simple rule "add an indentation level after a line
    ending in a colon"). Finally, this wouldn't support OR patterns
    nested inside other patterns.

-   Use case in followed by a comma-separated list:

        case in 401, 403, 404:
          print("Some HTTP error")

    This wouldn't work for OR patterns nested inside other patterns,
    like:

        case Point(0|1, 0|1):
            print("A corner of the unit square")

-   Use the or keyword:

        case 401 or 403 or 404:
            print("Some HTTP error")

    This could work, and the readability is not too different from using
    |. Some users expressed a preference for or because they associate |
    with bitwise OR. However:

    1.  Many other languages that have pattern matching use | (the list
        includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust, and
        Scala).

    2.  | is shorter, which may contribute to the readability of nested
        patterns like Point(0|1, 0|1).

    3.  Some people mistakenly believe that | has the wrong priority;
        but since patterns don't support other operators it has the same
        priority as in expressions.

    4.  Python users use or very frequently, and may build an impression
        that it is strongly associated with Boolean short-circuiting.

    5.  | is used between alternatives in regular expressions and in
        EBNF grammars (like Python's own).

    6.  | is not just used for bitwise OR -- it's used for set unions,
        dict merging (PEP 584) and is being considered as an alternative
        to typing.Union (PEP 604).

    7.  | works better as a visual separator, especially between
        strings. Compare:

            case "spam" or "eggs" or "cheese":

        to:

            case "spam" | "eggs" | "cheese":

Add an else clause

We decided not to add an else clause for several reasons.

-   It is redundant, since we already have case _:
-   There will forever be confusion about the indentation level of the
    else: -- should it align with the list of cases or with the match
    keyword?
-   Completionist arguments like "every other statement has one" are
    false -- only those statements have an else clause where it adds new
    functionality.

Deferred Ideas

There were a number of proposals to extend the matching syntax that we
decided to postpone for a possible future PEP. These fall into the realm
of "cool idea but not essential", and it was felt that it might be
better to acquire some real-world data on how the match statement will
be used in practice before moving forward with some of these proposals.

Note that in each case, the idea was judged to be a "two-way door",
meaning that there should be no backwards-compatibility issues with
adding these features later.

One-off syntax variant

While inspecting some code-bases that may benefit the most from the
proposed syntax, it was found that single clause matches would be used
relatively often, mostly for various special-casing. In other languages
this is supported in the form of one-off matches. We proposed to support
such one-off matches too:

    if match value as pattern [and guard]:
        ...

or, alternatively, without the if:

    match value as pattern [if guard]:
        ...

as equivalent to the following expansion:

    match value:
        case pattern [if guard]:
            ...

To illustrate how this will benefit readability, consider this (slightly
simplified) snippet from real code:

    if isinstance(node, CallExpr):
        if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
                isinstance(node.args[0], NameExpr)):
            call = node.callee.name
            arg = node.args[0].name
            ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code

This can be rewritten in a more straightforward way as:

    if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
        ...  # Continue special-casing 'call' and 'arg'
    ...  # Follow with common code

This one-off form would not allow elif match statements, as it was only
meant to handle a single pattern case. It was intended to be a special
case of a match statement, not a special case of an if statement:

    if match value_1 as pattern_1 [and guard_1]:
        ...
    elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
        ...
    elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
        ...
    else:  # Also not allowed
        ...

This would defeat the purpose of one-off matches as a complement to
exhaustive full matches -- it's better and clearer to use a full match in
this case.

Similarly, if not match would not be allowed, since match ... as ... is
not an expression. Nor do we propose a while match construct present in
some languages with pattern matching, since although it may be handy, it
will likely be used rarely.

Other pattern-based constructions

Many other languages supporting pattern-matching use it as a basis for
multiple language constructs, including a matching operator, a
generalized form of assignment, a filter for loops, a method for
synchronizing communication, or specialized if statements. Some of these
were mentioned in the discussion of the first draft. Another question
asked was why this particular form (joining binding and conditional
selection) was chosen while other forms were not.

Introducing more uses of patterns would be too bold and premature given
our limited experience with patterns, and would make this proposal too
complicated. The statement as presented provides a form of the feature
that is sufficiently general to be useful while being self-contained,
and without having a massive impact on the syntax and semantics of the
language as a whole.

After some experience with this feature, the community may have a better
feeling for what other uses of pattern matching could be valuable in
Python.

Algebraic matching of repeated names

A technique occasionally seen in functional languages like Erlang and
Elixir is to use a match variable multiple times in the same pattern:

    match value:
        case Point(x, x):
            print("Point is on a diagonal!")

The idea here is that the first appearance of x would bind the value to
the name, and subsequent occurrences would verify that the incoming
value was equal to the value previously bound. If the value was not
equal, the match would fail.

However, there are a number of subtleties involved with mixing
load-store semantics for capture patterns. For the moment, we decided to
make repeated use of names within the same pattern an error; we can
always relax this restriction later without affecting backwards
compatibility.

Note that you can use the same name more than once in alternate choices:

    match value:
        case x | [x]:
            # etc.

Custom matching protocol

During the initial design discussions for this PEP, there were a lot of
ideas thrown around about custom matchers. There were a couple of
motivations for this:

-   Some classes might want to expose a different set of "matchable"
    names than the actual class properties.
-   Some classes might have properties that are expensive to calculate,
    and therefore shouldn't be evaluated unless the match pattern
    actually needed access to them.
-   There were ideas for exotic matchers such as IsInstance(),
    InRange(), RegexMatchingGroup() and so on.
-   In order for built-in types and standard library classes to be able
    to support matching in a reasonable and intuitive way, it was
    believed that these types would need to implement special matching
    logic.

These customized match behaviors would be controlled by a special
__match__ method on the class name. There were two competing variants:

-   A 'full-featured' match protocol which would pass in not only the
    subject to be matched, but detailed information about which
    attributes the specified pattern was interested in.
-   A simplified match protocol, which only passed in the subject value,
    and which returned a "proxy object" (which in most cases could be
    just the subject) containing the matchable attributes.

Here's an example of one version of the more complex protocol proposed:

    match expr:
        case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
            ...

    from types import PatternObject
    BinaryOp.__match__(
        (),
        {
            "left": PatternObject(Number, (), {"value": ...}, -1, False),
            "op": ...,
            "right": PatternObject(Number, (), {"value": ...}, -1, False),
        },
        -1,
        False,
    )

One drawback of this protocol is that the arguments to __match__ would
be expensive to construct, and could not be pre-computed due to the fact
that, because of the way names are bound, there are no real constants in
Python. It also meant that the __match__ method would have to
re-implement much of the logic of matching which would otherwise be
implemented in C code in the Python VM. As a result, this option would
perform poorly compared to an equivalent if-statement.

The simpler protocol suffered from the fact that although it was more
performant, it was much less flexible, and did not allow for many of the
creative custom matchers that people were dreaming up.

Late in the design process, however, it was realized that the need for a
custom matching protocol was much less than anticipated. Virtually all
the realistic (as opposed to fanciful) use cases brought up could be
handled by the built-in matching behavior, although in a few cases an
extra guard condition was required to get the desired effect.

Moreover, it turned out that none of the standard library classes really
needed any special matching support other than an appropriate
__match_args__ property.

The decision to postpone this feature came with a realization that this
is not a one-way door; that a more flexible and customizable matching
protocol can be added later, especially as we gain more experience with
real-world use cases and actual user needs.

The authors of this PEP expect that the match statement will evolve over
time as usage patterns and idioms evolve, in a way similar to what other
"multi-stage" PEPs have done in the past. When this happens, the
extended matching issue can be revisited.

Parameterized Matching Syntax

(Also known as "Class Instance Matchers".)

This is another variant of the "custom match classes" idea that would
allow diverse kinds of custom matchers mentioned in the previous section
-- however, instead of using an extended matching protocol, it would be
achieved by introducing an additional pattern type with its own syntax.
This pattern type would accept two distinct sets of parameters: one set
which consists of the actual parameters passed into the pattern object's
constructor, and another set representing the binding variables for the
pattern.

The __match__ method of these objects could use the constructor
parameter values in deciding what was a valid match.

This would allow patterns such as InRange<0, 6>(value), which would
match a number in the range 0..6 and assign the matched value to
'value'. Similarly, one could have a pattern which tests for the
existence of a named group in a regular expression match result
(different meaning of the word 'match').

Although there is some support for this idea, there was a lot of
bikeshedding on the syntax (there are not a lot of attractive options
available) and no clear consensus was reached, so it was decided that
for now, this feature is not essential to the PEP.

Pattern Utility Library

Both of the previous ideas would be accompanied by a new Python standard
library module which would contain a rich set of useful matchers.
However, it is not really possible to implement such a library without
adopting one of the extended pattern proposals given in the previous
sections, so this idea is also deferred.

Acknowledgments

We are grateful for the help of the following individuals (among many
others) for helping out during various phases of the writing of this
PEP:

-   Gregory P. Smith
-   Jim Jewett
-   Mark Shannon
-   Nate Lust
-   Taine Zhao

Version History

1.  Initial version
2.  Substantial rewrite, including:
    -   Minor clarifications, grammar and typo corrections
    -   Rename various concepts
    -   Additional discussion of rejected ideas, including:
        -   Why we choose _ for wildcard patterns
        -   Why we choose | for OR patterns
        -   Why we choose not to use special syntax for capture
            variables
        -   Why this pattern matching operation and not others
    -   Clarify exception and side effect semantics
    -   Clarify partial binding semantics
    -   Drop restriction on use of _ in load contexts
    -   Drop the default single positional argument being the whole
        subject except for a handful of built-in types
    -   Simplify behavior of __match_args__
    -   Drop the __match__ protocol (moved to deferred ideas)
    -   Drop ImpossibleMatchError exception
    -   Drop leading dot for loads (moved to deferred ideas)
    -   Reworked the initial sections (everything before syntax)
    -   Added an overview of all the types of patterns before the
        detailed description
    -   Added simplified syntax next to the description of each pattern
    -   Separate description of the wildcard from capture patterns
    -   Added Daniel F Moisset as sixth co-author

References

Appendix A -- Full Grammar

Here is the full grammar for match_stmt. This is an additional
alternative for compound_stmt. It should be understood that match and
case are soft keywords, i.e. they are not reserved words in other
grammatical contexts (including at the start of a line if there is no
colon where expected). By convention, hard keywords use single quotes
while soft keywords use double quotes.

Other notation used beyond standard EBNF:

-   SEP.RULE+ is shorthand for RULE (SEP RULE)*
-   !RULE is a negative lookahead assertion

    match_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression
    patterns: value_pattern ',' [values_pattern] | pattern
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | capture_pattern
        | literal_pattern
        | constant_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern
    capture_pattern: NAME !('.' | '(' | '=')
    literal_pattern:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    constant_pattern: attr !('.' | '(' | '=')
    group_pattern: '(' patterns ')'
    sequence_pattern: '[' [values_pattern] ']' | '(' ')'
    mapping_pattern: '{' items_pattern? '}'
    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    signed_number: NUMBER | '-' NUMBER
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME
    values_pattern: ','.value_pattern+ ','?
    items_pattern: ','.key_value_pattern+ ','?
    keyword_pattern: NAME '=' or_pattern
    value_pattern: '*' capture_pattern | pattern
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md