PEP: 634 Title: Structural Pattern Matching: Specification Author:
Brandt Bucher <brandt@python.org>, Guido van Rossum <guido@python.org>
BDFL-Delegate: Discussions-To: python-dev@python.org Status: Final Type:
Standards Track Created: 12-Sep-2020 Python-Version: 3.10 Post-History:
22-Oct-2020, 08-Feb-2021 Replaces: 622 Resolution:
https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3

match

Abstract

This PEP provides the technical specification for the match statement.
It replaces PEP 622, which is hereby split in three parts:

-   PEP 634: Specification
-   PEP 635: Motivation and Rationale
-   PEP 636: Tutorial

This PEP is intentionally devoid of commentary; the motivation and all
explanations of our design choices are in PEP 635. First-time readers
are encouraged to start with PEP 636, which provides a gentler
introduction to the concepts, syntax and semantics of patterns.

Syntax and Semantics

See Appendix A for the complete grammar.

Overview and Terminology

The pattern matching process takes as input a pattern (following case)
and a subject value (following match). Phrases to describe the process
include "the pattern is matched with (or against) the subject value" and
"we match the pattern against (or with) the subject value".

The primary outcome of pattern matching is success or failure. In case
of success we may say "the pattern succeeds", "the match succeeds", or
"the pattern matches the subject value".

In many cases a pattern contains subpatterns, and success or failure is
determined by the success or failure of matching those subpatterns
against the value (e.g., for OR patterns) or against parts of the value
(e.g., for sequence patterns). This process typically processes the
subpatterns from left to right until the overall outcome is determined.
E.g., an OR pattern succeeds at the first succeeding subpattern, while a
sequence patterns fails at the first failing subpattern.

A secondary outcome of pattern matching may be one or more name
bindings. We may say "the pattern binds a value to a name". When
subpatterns tried until the first success, only the bindings due to the
successful subpattern are valid; when trying until the first failure,
the bindings are merged. Several more rules, explained below, apply to
these cases.

The Match Statement

Syntax:

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression

The rules star_named_expression, star_named_expressions,
named_expression and block are part of the standard Python grammar.

The rule patterns is specified below.

For context, match_stmt is a new alternative for compound_statement:

    compound_statement:
        | if_stmt
        ...
        | match_stmt

The match and case keywords are soft keywords, i.e. they are not
reserved words in other grammatical contexts (including at the start of
a line if there is no colon where expected). This implies that they are
recognized as keywords when part of a match statement or case block
only, and are allowed to be used in all other contexts as variable or
argument names.

Match Semantics

The match statement first evaluates the subject expression. If a comma
is present a tuple is constructed using the standard rules.

The resulting subject value is then used to select the first case block
whose patterns succeeds matching it and whose guard condition (if
present) is "truthy". If no case blocks qualify the match statement is
complete; otherwise, the block of the selected case block is executed.
The usual rules for executing a block nested inside a compound statement
apply (e.g. an if statement).

Name bindings made during a successful pattern match outlive the
executed block and can be used after the match statement.

During failed pattern matches, some subpatterns may succeed. For
example, while matching the pattern (0, x, 1) with the value [0, 1, 2],
the subpattern x may succeed if the list elements are matched from left
to right. The implementation may choose to either make persistent
bindings for those partial matches or not. User code including a match
statement should not rely on the bindings being made for a failed match,
but also shouldn't assume that variables are unchanged by a failed
match. This part of the behavior is left intentionally unspecified so
different implementations can add optimizations, and to prevent
introducing semantic restrictions that could limit the extensibility of
this feature.

The precise pattern binding rules vary per pattern type and are
specified below.

Guards

If a guard is present on a case block, once the pattern or patterns in
the case block succeed, the expression in the guard is evaluated. If
this raises an exception, the exception bubbles up. Otherwise, if the
condition is "truthy" the case block is selected; if it is "falsy" the
case block is not selected.

Since guards are expressions they are allowed to have side effects.
Guard evaluation must proceed from the first to the last case block, one
at a time, skipping case blocks whose pattern(s) don't all succeed.
(I.e., even if determining whether those patterns succeed may happen out
of order, guard evaluation must happen in order.) Guard evaluation must
stop once a case block is selected.

Irrefutable case blocks

A pattern is considered irrefutable if we can prove from its syntax
alone that it will always succeed. In particular, capture patterns and
wildcard patterns are irrefutable, and so are AS patterns whose
left-hand side is irrefutable, OR patterns containing at least one
irrefutable pattern, and parenthesized irrefutable patterns.

A case block is considered irrefutable if it has no guard and its
pattern is irrefutable.

A match statement may have at most one irrefutable case block, and it
must be last.

Patterns

The top-level syntax for patterns is as follows:

    patterns: open_sequence_pattern | pattern
    pattern: as_pattern | or_pattern
    as_pattern: or_pattern 'as' capture_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | value_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

AS Patterns

Syntax:

    as_pattern: or_pattern 'as' capture_pattern

(Note: the name on the right may not be _.)

An AS pattern matches the OR pattern on the left of the as keyword
against the subject. If this fails, the AS pattern fails. Otherwise, the
AS pattern binds the subject to the name on the right of the as keyword
and succeeds.

OR Patterns

Syntax:

    or_pattern: '|'.closed_pattern+

When two or more patterns are separated by vertical bars (|), this is
called an OR pattern. (A single closed pattern is just that.)

Only the final subpattern may be irrefutable.

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed. If none of
the subpatterns succeed the OR pattern fails.

Literal Patterns

Syntax:

    literal_pattern:
        | signed_number
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    signed_number: NUMBER | '-' NUMBER

The rule strings and the token NUMBER are defined in the standard Python
grammar.

Triple-quoted strings are supported. Raw strings and byte strings are
supported. F-strings are not supported.

The forms signed_number '+' NUMBER and signed_number '-' NUMBER are only
permitted to express complex numbers; they require a real number on the
left and an imaginary number on the right.

A literal pattern succeeds if the subject value compares equal to the
value expressed by the literal, using the following comparisons rules:

-   Numbers and strings are compared using the == operator.
-   The singleton literals None, True and False are compared using the
    is operator.

Capture Patterns

Syntax:

    capture_pattern: !"_" NAME

The single underscore (_) is not a capture pattern (this is what !"_"
expresses). It is treated as a wildcard pattern.

A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for the walrus
operator in PEP 572. (Summary: the name becomes a local variable in the
closest containing function scope unless there's an applicable nonlocal
or global statement.)

In a given pattern, a given name may be bound only once. This disallows
for example case x, x: ... but allows case [x] | x: ....

Wildcard Pattern

Syntax:

    wildcard_pattern: "_"

A wildcard pattern always succeeds. It binds no name.

Value Patterns

Syntax:

    value_pattern: attr
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME

The dotted name in the pattern is looked up using the standard Python
name resolution rules. However, when the same value pattern occurs
multiple times in the same match statement, the interpreter may cache
the first value found and reuse it, rather than repeat the same lookup.
(To clarify, this cache is strictly tied to a given execution of a given
match statement.)

The pattern succeeds if the value found thus compares equal to the
subject value (using the == operator).

Group Patterns

Syntax:

    group_pattern: '(' pattern ')'

(For the syntax of pattern, see Patterns above. Note that it contains no
comma -- a parenthesized series of items with at least one comma is a
sequence pattern, as is ().)

A parenthesized pattern has no additional syntax. It allows users to add
parentheses around patterns to emphasize the intended grouping.

Sequence Patterns

Syntax:

    sequence_pattern:
      | '[' [maybe_sequence_pattern] ']'
      | '(' [open_sequence_pattern] ')'
    open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
    maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
    maybe_star_pattern: star_pattern | pattern
    star_pattern: '*' (capture_pattern | wildcard_pattern)

(Note that a single parenthesized pattern without a trailing comma is a
group pattern, not a sequence pattern. However a single pattern enclosed
in [...] is still a sequence pattern.)

There is no semantic difference between a sequence pattern using [...],
a sequence pattern using (...), and an open sequence pattern.

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position. If no star subpattern is present,
the sequence pattern is a fixed-length sequence pattern; otherwise it is
a variable-length sequence pattern.

For a sequence pattern to succeed the subject must be a sequence, where
being a sequence is defined as its class being one of the following:

-   a class that inherits from collections.abc.Sequence
-   a Python class that has been registered as a
    collections.abc.Sequence
-   a builtin class that has its Py_TPFLAGS_SEQUENCE bit set
-   a class that inherits from any of the above (including classes
    defined before a parent's Sequence registration)

The following standard library classes will have their
Py_TPFLAGS_SEQUENCE bit set:

-   array.array
-   collections.deque
-   list
-   memoryview
-   range
-   tuple

Note

Although str, bytes, and bytearray are usually considered sequences,
they are not included in the above list and do not match sequence
patterns.

A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin len()
function (i.e., via the __len__ protocol). However, the interpreter may
cache this value in a similar manner as described for value patterns.

A fixed-length sequence pattern matches the subpatterns to corresponding
items of the subject sequence, from left to right. Matching stops (with
a failure) as soon as a subpattern fails. If all subpatterns succeed in
matching their corresponding item, the sequence pattern succeeds.

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for a
fixed-length sequence. If this succeeds, the star subpattern matches a
list formed of the remaining subject items, with items removed from the
end corresponding to the non-star subpatterns following the star
subpattern. The remaining non-star subpatterns are then matched to the
corresponding subject items, as for a fixed-length sequence.

Mapping Patterns

Syntax:

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | value_pattern) ':' pattern
        | double_star_pattern
    double_star_pattern: '**' capture_pattern

(Note that **_ is disallowed by this syntax.)

A mapping pattern may contain at most one double star pattern, and it
must be last.

A mapping pattern may not contain duplicate key values. (If all key
patterns are literal patterns this is considered a syntax error;
otherwise this is a runtime error and will raise ValueError.)

For a mapping pattern to succeed the subject must be a mapping, where
being a mapping is defined as its class being one of the following:

-   a class that inherits from collections.abc.Mapping
-   a Python class that has been registered as a collections.abc.Mapping
-   a builtin class that has its Py_TPFLAGS_MAPPING bit set
-   a class that inherits from any of the above (including classes
    defined before a parent's Mapping registration)

The standard library classes dict and mappingproxy will have their
Py_TPFLAGS_MAPPING bit set.

A mapping pattern succeeds if every key given in the mapping pattern is
present in the subject mapping, and the pattern for each key matches the
corresponding item of the subject mapping. Keys are always compared with
the == operator. If a '**' NAME form is present, that name is bound to a
dict containing remaining key-value pairs from the subject mapping.

If duplicate keys are detected in the mapping pattern, the pattern is
considered invalid, and a ValueError is raised.

Key-value pairs are matched using the two-argument form of the subject's
get() method. As a consequence, matched key-value pairs must already be
present in the mapping, and not created on-the-fly by __missing__ or
__getitem__. For example, collections.defaultdict instances will only be
matched by patterns with keys that were already present when the match
statement was entered.

Class Patterns

Syntax:

    class_pattern:
        | name_or_attr '(' [pattern_arguments ','?] ')'
    pattern_arguments:
        | positional_patterns [',' keyword_patterns]
        | keyword_patterns
    positional_patterns: ','.pattern+
    keyword_patterns: ','.keyword_pattern+
    keyword_pattern: NAME '=' pattern

A class pattern may not repeat the same keyword multiple times.

If name_or_attr is not an instance of the builtin type, TypeError is
raised.

A class pattern fails if the subject is not an instance of name_or_attr.
This is tested using isinstance().

If no arguments are present, the pattern succeeds if the isinstance()
check succeeds. Otherwise:

-   If only keyword patterns are present, they are processed as follows,
    one by one:
    -   The keyword is looked up as an attribute on the subject.
        -   If this raises an exception other than AttributeError, the
            exception bubbles up.
        -   If this raises AttributeError the class pattern fails.
        -   Otherwise, the subpattern associated with the keyword is
            matched against the attribute value. If this fails, the
            class pattern fails. If it succeeds, the match proceeds to
            the next keyword.
    -   If all keyword patterns succeed, the class pattern as a whole
        succeeds.
-   If any positional patterns are present, they are converted to
    keyword patterns (see below) and treated as additional keyword
    patterns, preceding the syntactic keyword patterns (if any).

Positional patterns are converted to keyword patterns using the
__match_args__ attribute on the class designated by name_or_attr, as
follows:

-   For a number of built-in types (specified below), a single
    positional subpattern is accepted which will match the entire
    subject. (Keyword patterns work as for other types here.)
-   The equivalent of getattr(cls, "__match_args__", ())) is called.
-   If this raises an exception the exception bubbles up.
-   If the returned value is not a tuple, the conversion fails and
    TypeError is raised.
-   If there are more positional patterns than the length of
    __match_args__ (as obtained using len()), TypeError is raised.
-   Otherwise, positional pattern i is converted to a keyword pattern
    using __match_args__[i] as the keyword, provided it the latter is a
    string; if it is not, TypeError is raised.
-   For duplicate keywords, TypeError is raised.

Once the positional patterns have been converted to keyword patterns,
the match proceeds as if there were only keyword patterns.

As mentioned above, for the following built-in types the handling of
positional subpatterns is different: bool, bytearray, bytes, dict,
float, frozenset, int, list, set, str, and tuple.

This behavior is roughly equivalent to the following:

    class C:
        __match_args__ = ("__match_self_prop__",)
        @property
        def __match_self_prop__(self):
            return self

Side Effects and Undefined Behavior

The only side-effect produced explicitly by the matching process is the
binding of names. However, the process relies on attribute access,
instance checks, len(), equality and item access on the subject and some
of its components. It also evaluates value patterns and the class name
of class patterns. While none of those typically create any
side-effects, in theory they could. This proposal intentionally leaves
out any specification of what methods are called or how many times. This
behavior is therefore undefined and user code should not rely on it.

Another undefined behavior is the binding of variables by capture
patterns that are followed (in the same case block) by another pattern
that fails. These may happen earlier or later depending on the
implementation strategy, the only constraint being that capture
variables must be set before guards that use them explicitly are
evaluated. If a guard consists of an and clause, evaluation of the
operands may even be interspersed with pattern matching, as long as
left-to-right evaluation order is maintained.

The Standard Library

To facilitate the use of pattern matching, several changes will be made
to the standard library:

-   Namedtuples and dataclasses will have auto-generated __match_args__.
-   For dataclasses the order of attributes in the generated
    __match_args__ will be the same as the order of corresponding
    arguments in the generated __init__() method. This includes the
    situations where attributes are inherited from a superclass. Fields
    with init=False are excluded from __match_args__.

In addition, a systematic effort will be put into going through existing
standard library classes and adding __match_args__ where it looks
beneficial.

Appendix A -- Full Grammar

Here is the full grammar for match_stmt. This is an additional
alternative for compound_stmt. Remember that match and case are soft
keywords, i.e. they are not reserved words in other grammatical contexts
(including at the start of a line if there is no colon where expected).
By convention, hard keywords use single quotes while soft keywords use
double quotes.

Other notation used beyond standard EBNF:

-   SEP.RULE+ is shorthand for RULE (SEP RULE)*
-   !RULE is a negative lookahead assertion

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" patterns [guard] ':' block
    guard: 'if' named_expression

    patterns: open_sequence_pattern | pattern
    pattern: as_pattern | or_pattern
    as_pattern: or_pattern 'as' capture_pattern
    or_pattern: '|'.closed_pattern+
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | value_pattern
        | group_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

    literal_pattern:
        | signed_number !('+' | '-')
        | signed_number '+' NUMBER
        | signed_number '-' NUMBER
        | strings
        | 'None'
        | 'True'
        | 'False'
    signed_number: NUMBER | '-' NUMBER

    capture_pattern: !"_" NAME !('.' | '(' | '=')

    wildcard_pattern: "_"

    value_pattern: attr !('.' | '(' | '=')
    attr: name_or_attr '.' NAME
    name_or_attr: attr | NAME

    group_pattern: '(' pattern ')'

    sequence_pattern:
      | '[' [maybe_sequence_pattern] ']'
      | '(' [open_sequence_pattern] ')'
    open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
    maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
    maybe_star_pattern: star_pattern | pattern
    star_pattern: '*' (capture_pattern | wildcard_pattern)

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | value_pattern) ':' pattern
        | double_star_pattern
    double_star_pattern: '**' capture_pattern

    class_pattern:
        | name_or_attr '(' [pattern_arguments ','?] ')'
    pattern_arguments:
        | positional_patterns [',' keyword_patterns]
        | keyword_patterns
    positional_patterns: ','.pattern+
    keyword_patterns: ','.keyword_pattern+
    keyword_pattern: NAME '=' pattern

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.