PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>
BDFL-Delegate:
Discussions-To: python-dev@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 08-Nov-2020, 03-Jan-2021
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/

Abstract

This PEP covers an alternative syntax proposal for PEP 634's structural
pattern matching that requires explicit prefixes on all capture patterns
and value constraints. It also proposes a new dedicated syntax for
instance attribute patterns that aligns more closely with the proposed
mapping pattern syntax.

While the result is necessarily more verbose than the proposed syntax in
PEP 634, it is still significantly less verbose than the status quo.

As an example, the following match statement would extract "host" and
"port" details from a 2 item sequence, a mapping with "host" and "port"
keys, any object with "host" and "port" attributes, or a "host:port"
string, treating the "port" as optional in the latter three cases:

    port = DEFAULT_PORT
    match expr:
        case [as host, as port]:
            pass
        case {"host" as host, "port" as port}:
            pass
        case {"host" as host}:
            pass
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass
        case str{} as addr:
            host, __, optional_port = addr.partition(":")
            if optional_port:
                port = optional_port
        case __ as m:
            raise TypeError(f"Unknown address format: {m!r:.200}")
    port = int(port)
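
For comparison, the "status quo" referred to above might look roughly
like the following when written without a match statement. This is only
an illustrative sketch: the parse_address name and the DEFAULT_PORT
value are assumptions made for the example, and the branches simply
follow the order of the case blocks above.

```python
from collections.abc import Mapping, Sequence

DEFAULT_PORT = 80  # hypothetical value; the example above leaves it unspecified


def parse_address(expr):
    """Extract (host, port) from several address formats without match."""
    port = DEFAULT_PORT
    if (isinstance(expr, Sequence)
            and not isinstance(expr, (str, bytes, bytearray))
            and len(expr) == 2):
        # 2 item sequence
        host, port = expr
    elif isinstance(expr, Mapping) and "host" in expr:
        # mapping with a "host" key and an optional "port" key
        host = expr["host"]
        port = expr.get("port", port)
    elif hasattr(expr, "host"):
        # any object with a "host" attribute and an optional "port" attribute
        host = expr.host
        port = getattr(expr, "port", port)
    elif isinstance(expr, str):
        # "host:port" string, where ":port" is optional
        host, _, optional_port = expr.partition(":")
        if optional_port:
            port = optional_port
    else:
        raise TypeError(f"Unknown address format: {expr!r:.200}")
    return host, int(port)
```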

At a high level, this PEP proposes to categorise the different available
pattern types as follows:

-   wildcard pattern: __

-   group patterns: (PTRN)

-   value constraint patterns:

    -   equality constraints: == EXPR
    -   identity constraints: is EXPR

-   structural constraint patterns:

    -   sequence constraint patterns: [PTRN, as NAME, PTRN as NAME]
    -   mapping constraint patterns: {EXPR: PTRN, EXPR as NAME}
    -   instance attribute constraint patterns:
        CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}
    -   class defined constraint patterns:
        CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})

-   OR patterns: PTRN | PTRN | PTRN

-   AS patterns: PTRN as NAME (omitting the pattern implies __)

The intent of this approach is to:

-   allow an initial form of pattern matching to be developed and
    released without needing to decide up front on the best default
    options for handling bare names, attribute lookups, and literal
    values
-   ensure that pattern matching is defined explicitly at the Abstract
    Syntax Tree level, allowing the specifications of the semantics and
    the surface syntax for pattern matching to be clearly separated
-   define a clear and concise "ducktyping" syntax that could
    potentially be adopted in ordinary expressions as a way to more
    easily retrieve a tuple containing multiple attributes from the same
    object

Relative to PEP 634, the proposal also deliberately eliminates any
syntax that "binds to the right" without using the as keyword (using
capture patterns in PEP 634's mapping patterns and class patterns) or
binds to both the left and the right in the same pattern (using PEP
634's capture patterns with AS patterns).

Relationship with other PEPs

This PEP both depends on and competes with PEP 634 - the PEP author
agrees that match statements would be a sufficiently valuable addition
to the language to be worth the additional complexity that they add to
the learning process, but disagrees with the idea that "simple name vs
literal or attribute lookup" really offers an adequate syntactic
distinction between name binding and value lookup operations in match
patterns (at least for Python).

This PEP agrees with the spirit of PEP 640 (that the chosen wildcard
pattern to skip a name binding should be supported everywhere, not just
in match patterns), but is now proposing a different spelling for the
wildcard syntax (__ rather than ?). As such, it competes with PEP 640 as
written, but would complement a proposal to deprecate the use of __ as
an ordinary identifier and instead turn it into a general purpose
wildcard marker that always skips making a new local variable binding.

While it has not yet been put forward as a PEP, Mark Shannon has a
pre-PEP draft [1] expressing several concerns about the runtime
semantics of the pattern matching proposal in PEP 634. This PEP is
somewhat complementary to that one, as even though this PEP is mostly
about surface syntax changes rather than major semantic changes, it does
propose that the Abstract Syntax Tree definition be made more explicit
to better separate the details of the surface syntax from the semantics
of the code generation step. There is one specific idea in that pre-PEP
draft that this PEP explicitly rejects: the idea that the different
kinds of matching are mutually exclusive. It's entirely possible for the
same value to match different kinds of structural pattern, and which one
takes precedence will intentionally be governed by the order of the
cases in the match statement.

Motivation

The original PEP 622 (which was later split into PEP 634, PEP 635, and
PEP 636) incorporated an unstated but essential assumption in its syntax
design: that neither ordinary expressions nor the existing assignment
target syntax provide an adequate foundation for the syntax used in
match patterns.

While the PEP didn't explicitly state this assumption, one of the PEP
authors explained it clearly on python-dev[2]:

  The actual problem that I see is that we have different
  cultures/intuitions fundamentally clashing here. In particular, so
  many programmers welcome pattern matching as an "extended switch
  statement" and find it therefore strange that names are binding and
  not expressions for comparison. Others argue that it is at odds with
  current assignment statements, say, and question why dotted names are
  /not/ binding. What all groups seem to have in common, though, is
  that they refer to /their/ understanding and interpretation of the
  new match statement as 'consistent' or 'intuitive' --- naturally
  pointing out where we as PEP authors went wrong with our design.

  But here is the catch: at least in the Python world, pattern matching
  as proposed by this PEP is an unprecedented and new way of approaching
  a common problem. It is not simply an extension of something already
  there. Even worse: while designing the PEP we found that no matter
  from which angle you approach it, you will run into issues of seeming
  'inconsistencies' (which is to say that pattern matching cannot be
  reduced to a 'linear' extension of existing features in a meaningful
  way): there is always something that goes fundamentally beyond what is
  already there in Python. That's why I argue that arguments based on
  what is 'intuitive' or 'consistent' just do not make sense /in this
  case/.

The first iteration of this PEP was then born out of an attempt to show
that the second assertion was not accurate, and that match patterns
could be treated as a variation on assignment targets without leading to
inherent contradictions. (An earlier PR submitted to list this option in
the "Rejected Ideas" section of the original PEP 622 had previously been
declined[3]).

However, the review process for this PEP strongly suggested that not
only did the contradictions that Tobias mentioned in his email exist,
but they were also concerning enough to cast doubts on the syntax
proposal presented in PEP 634. Accordingly, this PEP was changed to go
even further than PEP 634, and largely abandon alignment between the
sequence matching syntax and the existing iterable unpacking syntax
(effectively answering "Not really, at least as far as the exact syntax
is concerned" to the first question raised in the DLS'20 paper [4]: "Can
we extend a feature like iterable unpacking to work for more general
object and data layouts?").

This resulted in a complete reversal of the goals of the PEP: rather
than attempting to emphasise the similarities between assignment and
pattern matching, the PEP now attempts to make sure that assignment
target syntax isn't being reused at all, reducing the likelihood of
incorrect inferences being drawn about the new construct based on
experience with existing ones.

Finally, before completing the 3rd iteration of the proposal (which
dropped inferred patterns entirely), the PEP author spent quite a bit of
time reflecting on the following entries in PEP 20:

-   Explicit is better than implicit.
-   Special cases aren't special enough to break the rules.
-   In the face of ambiguity, refuse the temptation to guess.

If we start with an explicit syntax, we can always add syntactic
shortcuts later (e.g. consider the recent proposals to add shortcuts for
Union and Optional type hints only after years of experience with the
original more verbose forms), while if we start out with only the
abbreviated forms, then we don't have any real way to revisit those
decisions in a future release.

Specification

This PEP retains the overall match/case statement structure and
semantics from PEP 634, but proposes multiple changes that mean that
user intent is explicitly specified in the concrete syntax rather than
needing to be inferred from the pattern matching context.

In the proposed Abstract Syntax Tree, the semantics are also always
explicit, with no inference required.

The Match Statement

Surface syntax:

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' star_named_expressions?
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression

    open_pattern:
        | as_pattern
        | or_pattern

    closed_pattern:
        | wildcard_pattern
        | group_pattern
        | structural_constraint

Abstract syntax:

    Match(expr subject, match_case* cases)
    match_case = (pattern pattern, expr? guard, stmt* body)

The rules star_named_expression, star_named_expressions,
named_expression and block are part of the standard Python grammar.

Open patterns are patterns which consist of multiple tokens, and aren't
necessarily terminated by a closing delimiter (for example, __ as x,
int() | bool()). To avoid ambiguity for human readers, their usage is
restricted to top level patterns and to group patterns (which are
patterns surrounded by parentheses).

Closed patterns are patterns which either consist of a single token
(i.e. __), or else have a closing delimiter as a required part of their
syntax (e.g. [as x, as y], object{.x as x, .y as y}).

As in PEP 634, the match and case keywords are soft keywords, i.e. they
are not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This means that
they are recognized as keywords when part of a match statement or case
block only, and are allowed to be used in all other contexts as variable
or argument names.

Unlike PEP 634, patterns are explicitly defined as a new kind of node in
the abstract syntax tree - even when surface syntax is shared with
existing expression nodes, a distinct abstract node is emitted by the
parser.

For context, match_stmt is a new alternative for compound_statement in
the surface syntax and Match is a new alternative for stmt in the
abstract syntax.

Match Semantics

This PEP largely retains the overall pattern matching semantics proposed
in PEP 634.

The proposed syntax for patterns changes significantly, and is discussed
in detail below.

There are also some proposed changes to the semantics of class defined
constraints (class patterns in PEP 634) to eliminate the need to special
case any builtin types (instead, the introduction of dedicated syntax
for instance attribute constraints allows the behaviour needed by those
builtin types to be specified as applying to any type that sets
__match_args__ to None).

Guards

This PEP retains the guard clause semantics proposed in PEP 634.

However, the syntax is changed slightly to require that when a guard
clause is present, the case pattern must be a closed pattern.

This makes it clearer to the reader where the pattern ends and the guard
clause begins. (This is mainly a potential problem with OR patterns,
where the guard clause looks kind of like the start of a conditional
expression in the final pattern. Actually doing that isn't legal syntax,
so there's no ambiguity as far as the compiler is concerned, but the
distinction may not be as clear to a human reader).

Irrefutable case blocks

The definition of irrefutable case blocks changes slightly in this PEP
relative to PEP 634, as capture patterns no longer exist as a separate
concept from AS patterns.

Aside from that caveat, the handling of irrefutable cases is the same as
in PEP 634:

-   wildcard patterns are irrefutable
-   AS patterns whose left-hand side is irrefutable (or omitted) are
    irrefutable
-   OR patterns containing at least one irrefutable pattern are
    irrefutable
-   parenthesized irrefutable patterns are irrefutable
-   a case block is considered irrefutable if it has no guard and its
    pattern is irrefutable.
-   a match statement may have at most one irrefutable case block, and
    it must be last.
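
The rules above can be sketched as a small recursive check. The node
classes below are simplified stand-ins for the abstract syntax proposed
in this PEP (they are not an existing API), and the is_irrefutable and
case_is_irrefutable helper names are assumptions made for illustration.
Group patterns need no handling, as they are not represented in the
abstract syntax tree.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MatchAlways:      # wildcard pattern: __
    pass


@dataclass
class MatchValue:       # value constraint: == EXPR or is EXPR
    op: str
    value: object


@dataclass
class MatchAs:          # PTRN as NAME (pattern=None means the implied __)
    pattern: Optional[object]
    target: str


@dataclass
class MatchOr:          # PTRN | PTRN | ...
    patterns: Tuple[object, ...]


def is_irrefutable(pattern):
    """Return True if the pattern can never fail to match."""
    if isinstance(pattern, MatchAlways):
        return True
    if isinstance(pattern, MatchAs):
        return pattern.pattern is None or is_irrefutable(pattern.pattern)
    if isinstance(pattern, MatchOr):
        return any(is_irrefutable(p) for p in pattern.patterns)
    return False  # every other kind of pattern is treated as refutable


def case_is_irrefutable(pattern, guard=None):
    """A case block is irrefutable only without a guard."""
    return guard is None and is_irrefutable(pattern)
```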

Patterns

The top-level surface syntax for patterns is as follows:

    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

As described above, the usage of open patterns is limited to top level
case clauses and when parenthesised in a group pattern.

The abstract syntax for patterns explicitly indicates which elements are
subpatterns and which elements are subexpressions or identifiers:

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)
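
To make the split between subpatterns, subexpressions and identifiers
more concrete, a few of these nodes can be modelled as plain Python
dataclasses. This is a sketch rather than the reference implementation:
the classes are stand-ins for real AST nodes, plain Python values stand
in for expr nodes, and the tree shapes shown are one plausible reading
of the surface syntax in the comments.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MatchAs:
    pattern: Optional[Any]   # subpattern; None stands for the implied __
    target: str              # identifier bound on success


@dataclass
class MatchSequence:
    patterns: List[Any]      # subpatterns only


@dataclass
class MatchMapping:
    keys: List[Any]          # subexpressions (plain values used as stand-ins)
    patterns: List[Any]      # subpatterns, one per key


@dataclass
class MatchAttrs:
    cls: Any                 # subexpression naming the required type
    attrs: List[str]         # identifiers
    patterns: List[Any]      # subpatterns, one per attribute


# case [as host, as port]:
seq_case = MatchSequence([MatchAs(None, "host"), MatchAs(None, "port")])

# case {"host" as host, "port" as port}:
map_case = MatchMapping(
    keys=["host", "port"],
    patterns=[MatchAs(None, "host"), MatchAs(None, "port")],
)

# case object{.host as host}:
attrs_case = MatchAttrs(
    cls=object, attrs=["host"], patterns=[MatchAs(None, "host")]
)
```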

AS Patterns

Surface syntax:

    as_pattern: [closed_pattern] pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

(Note: the name on the right may not be __.)

Abstract syntax:

    MatchAs(pattern? pattern, identifier target)

An AS pattern matches the closed pattern on the left of the as keyword
against the subject. If this fails, the AS pattern fails. Otherwise, the
AS pattern binds the subject to the name on the right of the as keyword
and succeeds.

If no pattern to match is given, the wildcard pattern (__) is implied.

To avoid confusion with the wildcard pattern, the double underscore (__)
is not permitted as a capture target (this is what !"__" expresses).

An AS pattern with the left-hand pattern omitted always succeeds. AS
patterns bind the subject value to the name using the scoping rules
for name binding established for named
expressions in PEP 572. (Summary: the name becomes a local variable in
the closest containing function scope unless there's an applicable
nonlocal or global statement.)

In a given pattern, a given name may be bound only once. This disallows,
for example, case [as x, as x]: ... but allows case [as x] | (as x): ...

As an open pattern, the usage of AS patterns is limited to top level
case clauses and when parenthesised in a group pattern. However, several
of the structural constraints allow the use of pattern_as_clause in
relevant locations to bind extracted elements of the matched subject to
local variables. These are mostly represented in the abstract syntax
tree as MatchAs nodes, aside from the dedicated MatchRestOfSequence node
in sequence patterns.

OR Patterns

Surface syntax:

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

Abstract syntax:

    MatchOr(pattern* patterns)

When two or more patterns are separated by vertical bars (|), this is
called an OR pattern. (A single simple pattern is just that.)

Only the final subpattern may be irrefutable.

Each subpattern must bind the same set of names.

An OR pattern matches each of its subpatterns in turn to the subject,
until one succeeds. The OR pattern is then deemed to succeed. If none of
the subpatterns succeed the OR pattern fails.
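
The "in turn, until one succeeds" behaviour can be sketched by modelling
each pattern as a function that returns a dict of name bindings on
success and None on failure. That modelling, along with the eq and
match_or helper names, is an assumption made for this example and is not
how the compiler would implement OR patterns.

```python
def eq(value):
    """Stand-in for an equality constraint (== value); binds no names."""
    return lambda subject: {} if subject == value else None


def match_or(*subpatterns):
    """Try each subpattern in turn; the first success decides the result."""
    def matcher(subject):
        for subpattern in subpatterns:
            bindings = subpattern(subject)
            if bindings is not None:
                return bindings  # later subpatterns are not attempted
        return None  # no subpattern succeeded, so the OR pattern fails
    return matcher


# Roughly: case == 200 | == 201 | == 204:
success_status = match_or(eq(200), eq(201), eq(204))
```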

Subpatterns are mostly required to be closed patterns, but the
parentheses may be omitted for value constraints.

Value constraints

Surface syntax:

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

Abstract syntax:

    MatchValue(matchop op, expr value)
    matchop = EqCheck | IdCheck

The rule primary is defined in the standard Python grammar, and only
allows expressions that either consist of a single token, or else are
required to end with a closing delimiter.

Value constraints replace PEP 634's literal patterns and value patterns.

Equality constraints are written as == EXPR, while identity constraints
are written as is EXPR.

An equality constraint succeeds if the subject value compares equal to
the value given on the right, while an identity constraint succeeds only
if they are the exact same object.
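
The difference between the two checks matters most for values that
compare equal without being the same object. A minimal sketch, using
hypothetical helper names (eq_constraint and id_constraint) to stand in
for the two kinds of constraint:

```python
def eq_constraint(subject, value):
    """Equivalent of the check made by: case == value"""
    return subject == value


def id_constraint(subject, value):
    """Equivalent of the check made by: case is value"""
    return subject is value


SENTINEL = object()

one_equals_true = eq_constraint(1, True)             # 1 == True holds
one_is_true = id_constraint(1, True)                 # but 1 is not True
other_is_sentinel = id_constraint(object(), SENTINEL)
sentinel_is_sentinel = id_constraint(SENTINEL, SENTINEL)
```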

The expressions to be compared against are largely restricted to either
single tokens (e.g. names, strings, numbers, builtin constants), or else
to expressions that are required to end with a closing delimiter.

The use of the high precedence unary operators is also permitted, as the
risk of perceived ambiguity is low, and being able to specify negative
numbers without parentheses is desirable.

When the same constraint expression occurs multiple times in the same
match statement, the interpreter may cache the first value calculated
and reuse it, rather than repeat the expression evaluation. (As for PEP
634 value patterns, this cache is strictly tied to a given execution of
a given match statement.)

Unlike literal patterns in PEP 634, this PEP requires that complex
literals be parenthesised to be accepted by the parser. See the Deferred
Ideas section for discussion on that point.

If this PEP were to be adopted in preference to PEP 634, then all
literal and value patterns would instead be written more explicitly as
value constraints:

    # Literal patterns
    match number:
        case == 0:
            print("Nothing")
        case == 1:
            print("Just one")
        case == 2:
            print("A couple")
        case == -1:
            print("One less than nothing")
        case == (1-1j):
            print("Good luck with that...")

    # Additional literal patterns
    match value:
        case == True:
            print("True or 1")
        case == False:
            print("False or 0")
        case == None:
            print("None")
        case == "Hello":
            print("Text 'Hello'")
        case == b"World!":
            print("Binary 'World!'")

    # Matching by identity rather than equality
    SENTINEL = object()
    match value:
        case is True:
            print("True, not 1")
        case is False:
            print("False, not 0")
        case is None:
            print("None, following PEP 8 comparison guidelines")
        case is ...:
            print("May be useful when writing __getitem__ methods?")
        case is SENTINEL:
            print("Matches the sentinel by identity, not just value")

    # Matching against variables and attributes
    from enum import Enum
    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    preferred_side = Sides.EGGS
    match entree[-1]:
        case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case == preferred_side:  # Compares entree[-1] == preferred_side
            response = f"Oh, I love {preferred_side}!"
        case as side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note the == preferred_side example: using an explicit prefix marker on
constraint expressions removes the restriction to only working with
attributes or literals for value lookups.

The == (1-1j) example illustrates the use of parentheses to turn any
subexpression into a closed one.

Wildcard Pattern

Surface syntax:

    wildcard_pattern: "__"

Abstract syntax:

    MatchAlways

A wildcard pattern always succeeds. As in PEP 634, it binds no name.

Where PEP 634 chooses the single underscore as its wildcard pattern for
consistency with other languages, this PEP chooses the double underscore
as that has a clearer path towards potentially being made consistent
across the entire language, whereas that path is blocked for "_" by i18n
related use cases.

Example usage:

    match sequence:
        case [__]:                     # any sequence with a single element
            return True
        case [as start, *__, as end]:  # a sequence with at least two elements
            return start == end
        case __:                       # anything
            return False

Group Patterns

Surface syntax:

    group_pattern: '(' open_pattern ')'

For the syntax of open_pattern, see Patterns above.

A parenthesized pattern has no additional syntax and is not represented
in the abstract syntax tree. It allows users to add parentheses around
patterns to emphasize the intended grouping, and to allow nesting of
open patterns when the grammar requires a closed pattern.

Unlike PEP 634, there is no potential ambiguity with sequence patterns,
as this PEP requires that all sequence patterns be written with square
brackets.

Structural constraints

Surface syntax:

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

Note: the separate "structural constraint" subcategory isn't used in the
abstract syntax tree, it's merely used as a convenient grouping node in
the surface syntax definition.

Structural constraints are patterns used to both make assertions about
complex objects and to extract values from them.

These patterns may all bind multiple values, either through the use of
nested AS patterns, or else through the use of pattern_as_clause
elements included in the definition of the pattern.

Sequence constraints

Surface syntax:

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | pattern_as_clause
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    pattern_as_clause: 'as' pattern_capture_target

Abstract syntax:

    MatchSequence(pattern* patterns)

    MatchRestOfSequence(identifier? target)

Sequence constraints allow items within a sequence to be checked and
optionally extracted.

A sequence pattern fails if the subject value is not an instance of
collections.abc.Sequence. It also fails if the subject value is an
instance of str, bytes or bytearray (see Deferred Ideas for a discussion
on potentially removing the need for this special casing).

A sequence pattern may contain at most one star subpattern. The star
subpattern may occur in any position and is represented in the AST using
the MatchRestOfSequence node.

If no star subpattern is present, the sequence pattern is a fixed-length
sequence pattern; otherwise it is a variable-length sequence pattern.

A fixed-length sequence pattern fails if the length of the subject
sequence is not equal to the number of subpatterns.

A variable-length sequence pattern fails if the length of the subject
sequence is less than the number of non-star subpatterns.

The length of the subject sequence is obtained using the builtin len()
function (i.e., via the __len__ protocol). However, the interpreter may
cache this value in a similar manner as described for value constraint
expressions.

A fixed-length sequence pattern matches the subpatterns to corresponding
items of the subject sequence, from left to right. Matching stops (with
a failure) as soon as a subpattern fails. If all subpatterns succeed in
matching their corresponding item, the sequence pattern succeeds.

A variable-length sequence pattern first matches the leading non-star
subpatterns to the corresponding items of the subject sequence, as for a
fixed-length sequence. If this succeeds, the star subpattern matches a
list formed of the remaining subject items, with items removed from the
end corresponding to the non-star subpatterns following the star
subpattern. The remaining non-star subpatterns are then matched to the
corresponding subject items, as for a fixed-length sequence.
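
The type and length checks, together with the way the star subpattern
divides up the subject, can be sketched as follows. The split_sequence
helper is hypothetical and only covers the division of the subject into
leading, starred and trailing items; matching each item against its
subpattern is left out.

```python
from collections.abc import Sequence


def split_sequence(subject, n_before, n_after=None):
    """Return (leading, star_items, trailing), or None if the match fails.

    n_before counts the non-star subpatterns before the star subpattern.
    n_after counts those after it, and is None for a fixed-length pattern.
    """
    if not isinstance(subject, Sequence):
        return None
    if isinstance(subject, (str, bytes, bytearray)):
        return None  # special cased, as described above
    length = len(subject)
    if n_after is None:
        # Fixed-length pattern: the length must match exactly
        if length != n_before:
            return None
        return [subject[i] for i in range(length)], None, []
    # Variable-length pattern: only a minimum length is required
    if length < n_before + n_after:
        return None
    leading = [subject[i] for i in range(n_before)]
    star_items = [subject[i] for i in range(n_before, length - n_after)]
    trailing = [subject[i] for i in range(length - n_after, length)]
    return leading, star_items, trailing
```

For example, a pattern such as [as first, *as middle, as last] would
correspond to split_sequence(subject, 1, 1) in this sketch.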

Subpatterns are mostly required to be closed patterns, but the
parentheses may be omitted for value constraints. Sequence elements may
also be captured unconditionally without parentheses.

Note: where PEP 634 allows all the same syntactic flexibility as
iterable unpacking in assignment statements, this PEP restricts sequence
patterns specifically to the square bracket form. Given that the open
and parenthesised forms are far more popular than square brackets for
iterable unpacking, this helps emphasise that iterable unpacking and
sequence matching are not the same operation. It also avoids the
parenthesised form's ambiguity problem between single element sequence
patterns and group patterns.

Mapping constraints

Surface syntax:

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

(Note that **__ is deliberately disallowed by this syntax, as additional
mapping entries are ignored by default.)

closed_expr is defined above, under value constraints.

Abstract syntax:

    MatchMapping(expr* keys, pattern* patterns)

Mapping constraints allow keys and values within a mapping to be
checked and values to optionally be extracted.

A mapping pattern fails if the subject value is not an instance of
collections.abc.Mapping.

A mapping pattern succeeds if every key given in the mapping pattern is
present in the subject mapping, and the pattern for each key matches the
corresponding item of the subject mapping.

The presence of keys is checked using the two argument form of the get
method and a unique sentinel value, which offers the following benefits:

-   no exceptions need to be created in the lookup process
-   mappings that implement __missing__ (such as
    collections.defaultdict) only match on keys that they already
    contain, they don't implicitly add keys
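
The effect of the sentinel based lookup can be seen with
collections.defaultdict. In this sketch, the _MISSING sentinel and the
lookup_key helper are names chosen for illustration only.

```python
from collections import defaultdict

_MISSING = object()  # unique sentinel; the name is an assumption


def lookup_key(mapping, key):
    """Return (found, value) using the two argument form of get."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return False, None
    return True, value


counts = defaultdict(int, {"host": 1})
found_host = lookup_key(counts, "host")   # key is present
found_port = lookup_key(counts, "port")   # key is absent, and stays absent
keys_after = sorted(counts)               # __missing__ was never triggered
```

By contrast, evaluating counts["port"] would call __missing__ and
insert the key as a side effect.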

A mapping pattern may not contain duplicate key values. If duplicate
keys are detected when checking the mapping pattern, the pattern is
considered invalid, and a ValueError is raised. While it would
theoretically be possible to check for duplicated constant keys at
compile time, no such check is currently defined or implemented.

(Note: This semantic description is derived from the PEP 634 reference
implementation, which differs from the PEP 634 specification text at
time of writing. The implementation seems reasonable, so amending the
PEP text seems like the best way to resolve the discrepancy.)

If a '**' as NAME double star pattern is present, that name is bound to
a dict containing any remaining key-value pairs from the subject mapping
(the dict will be empty if there are no additional key-value pairs).
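
Putting the mapping rules together, a rough sketch of the key checks
and the double star capture might look like this. The match_mapping
helper is hypothetical: it returns the looked up values (ready to be
matched against their subpatterns) rather than performing that matching
itself, and the point at which duplicate keys are reported is an
assumption.

```python
from collections.abc import Mapping

_MISSING = object()  # unique sentinel; the name is an assumption


def match_mapping(subject, keys, capture_rest=False):
    """Return (values, rest) on success, or None if the match fails."""
    if not isinstance(subject, Mapping):
        return None
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate keys in mapping pattern")
    values = []
    for key in keys:
        value = subject.get(key, _MISSING)
        if value is _MISSING:
            return None  # a required key is absent
        values.append(value)
    rest = None
    if capture_rest:
        # Equivalent of a trailing "** as NAME" clause
        rest = {k: v for k, v in subject.items() if k not in keys}
    return values, rest
```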

A mapping pattern may contain at most one double star pattern, and it
must be last.

Value subpatterns are mostly required to be closed patterns, but the
parentheses may be omitted for value constraints (the : key/value
separator is still required to ensure the entry doesn't look like an
ordinary comparison operation).

Mapping values may also be captured unconditionally using the
KEY as NAME form, without either parentheses or the : key/value
separator.

Instance attribute constraints

Surface syntax:

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    attrs_constraint_elements: ','.attr_value_pattern+ ','?
    attr_value_pattern:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

Abstract syntax:

    MatchAttrs(expr cls, identifier* attrs, pattern* patterns)

Instance attribute constraints allow an instance's type to be checked
and attributes to optionally be extracted.

An instance attribute constraint may not repeat the same attribute name
multiple times. Attempting to do so will result in a syntax error.

An instance attribute pattern fails if the subject is not an instance of
name_or_attr. This is tested using isinstance().

If name_or_attr is not an instance of the builtin type, TypeError is
raised.

If no attribute subpatterns are present, the constraint succeeds if the
isinstance() check succeeds. Otherwise:

  -   Each given attribute name is looked up as an attribute on the
      subject.
      -   If this raises an exception other than AttributeError, the
          exception bubbles up.
      -   If this raises AttributeError the constraint fails.
      -   Otherwise, the subpattern associated with the attribute name is
          matched against the attribute value. If no subpattern is
          specified, the wildcard pattern is assumed. If this fails, the
          constraint fails. If it succeeds, the match proceeds to the
          next attribute.
  -   If all attribute subpatterns succeed, the constraint as a whole
      succeeds.
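
The steps above can be sketched as an ordinary function. The
match_attrs helper is hypothetical, and returns the retrieved attribute
values instead of matching them against subpatterns.

```python
from types import SimpleNamespace


def match_attrs(subject, cls, attr_names):
    """Return a dict of attribute values, or None if the constraint fails."""
    if not isinstance(cls, type):
        raise TypeError("instance attribute constraints require a type")
    if not isinstance(subject, cls):
        return None
    values = {}
    for name in attr_names:
        try:
            values[name] = getattr(subject, name)
        except AttributeError:
            return None  # missing attribute: the constraint fails
        # any other exception propagates to the caller
    return values


# Roughly: case object{.host as host, .port as port}:
addr = SimpleNamespace(host="localhost", port=8080)
ducktyped = match_attrs(addr, object, ["host", "port"])
```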

Instance attribute constraints allow ducktyping checks to be implemented
by using object as the required instance type (e.g.
case object{.host as host, .port as port}:).

The syntax being proposed here could potentially also be used as the
basis for a new syntax for retrieving multiple attributes from an object
instance in one assignment statement (e.g.
host, port = addr{.host, .port}). See the Deferred Ideas section for
further discussion of this point.

Class defined constraints

Surface syntax:

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | pattern_as_clause
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'

Abstract syntax:

    MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

Class defined constraints allow a sequence of common attributes to be
specified on a class and checked positionally, rather than needing to
specify the attribute names in every related match pattern.

As for instance attribute patterns:

-   a class defined pattern fails if the subject is not an instance of
    name_or_attr. This is tested using isinstance().
-   if name_or_attr is not an instance of the builtin type, TypeError is
    raised.

Regardless of whether or not any arguments are present, the class is
checked for a __match_args__ attribute using the equivalent of
getattr(cls, "__match_args__", _SENTINEL).

If this raises an exception, the exception bubbles up.

If the returned value is not a list, tuple, or None, the conversion
fails and TypeError is raised at runtime.

This means that only types that actually define __match_args__ will be
usable in class defined patterns. Types that don't define __match_args__
will still be usable in instance attribute patterns.

If __match_args__ is None, then only a single positional subpattern is
permitted. Attempting to specify additional attribute patterns either
positionally or using the double star syntax will cause TypeError to be
raised at runtime.

This positional subpattern is then matched against the entire subject,
allowing a type check to be combined with another match pattern (e.g.
checking both the type and contents of a container, or the type and
value of a number).

If __match_args__ is a list or tuple, then the class defined constraint
is converted to an instance attributes constraint as follows:

-   if only the double star attribute constraints subpattern is present,
    matching proceeds as if for the equivalent instance attributes
    constraint.
-   if there are more positional subpatterns than the length of
    __match_args__ (as obtained using len()), TypeError is raised.
-   Otherwise, positional pattern i is converted to an attribute pattern
    using __match_args__[i] as the attribute name.
-   if any element in __match_args__ is not a string, TypeError is
    raised.
-   once the positional patterns have been converted to attribute
    patterns, then they are combined with any attribute constraints
    given in the double star attribute constraints subpattern, and
    matching proceeds as if for the equivalent instance attributes
    constraint.
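The conversion rules above can be sketched as a small helper. This is an illustrative reading of the text rather than the reference implementation: the function name, the use of opaque placeholder values for subpatterns, the exact order of the checks, and the handling of a name that appears both positionally and in the double star subpattern are assumptions.

```python
_SENTINEL = object()  # hypothetical marker for "no __match_args__ defined"


def positional_to_attr_patterns(cls, positional, extra_attrs=None):
    """Convert positional subpatterns into {attribute name: subpattern}.

    Returns None when ``__match_args__`` is None, meaning the (at most one)
    positional subpattern is to be matched against the whole subject.
    """
    extra_attrs = dict(extra_attrs or {})
    match_args = getattr(cls, "__match_args__", _SENTINEL)
    if match_args is None:
        if len(positional) > 1 or extra_attrs:
            raise TypeError("only one positional subpattern is permitted")
        return None
    if not isinstance(match_args, (list, tuple)):
        # Also covers types that do not define __match_args__ at all.
        raise TypeError("__match_args__ must be a list, tuple, or None")
    if not positional:
        return extra_attrs  # only the double star subpattern is present
    if len(positional) > len(match_args):
        raise TypeError("too many positional subpatterns")
    if not all(isinstance(name, str) for name in match_args):
        raise TypeError("__match_args__ elements must be strings")
    # Positional pattern i uses __match_args__[i] as its attribute name.
    attrs = dict(zip(match_args, positional))
    if attrs.keys() & extra_attrs.keys():
        # Assumption: the text above does not say how this is reported.
        raise TypeError("attribute specified more than once")
    attrs.update(extra_attrs)
    return attrs


class Point:
    """Hypothetical class that opts in to positional matching."""
    __match_args__ = ("x", "y")


print(positional_to_attr_patterns(Point, ["px"], {"y": "py"}))
```

The resulting mapping can then be handled exactly like the equivalent instance attributes constraint.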

Note: the __match_args__ is None handling in this PEP replaces the
special casing of bool, bytearray, bytes, dict, float, frozenset, int,
list, set, str, and tuple in PEP 634. However, the optimised fast path
for those types is retained in the implementation.

Design Discussion

Requiring explicit qualification of simple names in match patterns

The first iteration of this PEP accepted the basic premise of PEP 634
that iterable unpacking syntax would provide a good foundation for
defining a new syntax for pattern matching.

During the review process, however, two major and one minor ambiguity
problems were highlighted that arise directly from that core assumption:

-   most problematically, when binding simple names by default is
    extended to PEP 634's proposed class pattern syntax, the
    ATTR=TARGET_NAME construct binds to the right without using the as
    keyword, and uses the normal assignment-to-the-left sigil (=) to do
    it!
-   when binding simple names by default is extended to PEP 634's
    proposed mapping pattern syntax, the KEY: TARGET_NAME construct
    binds to the right without using the as keyword
-   using a PEP 634 capture pattern together with an AS pattern
    (TARGET_NAME_1 as TARGET_NAME_2) gives an odd "binds to both the
    left and right" behaviour

The third revision of this PEP accounted for these problems by abandoning
the alignment with iterable unpacking syntax, and instead requiring that
all uses of bare simple names for anything other than a variable lookup
be qualified by a preceding sigil or keyword:

-   as NAME: local variable binding
-   .NAME: attribute lookup
-   == NAME: variable lookup
-   is NAME: variable lookup
-   any other usage: variable lookup

The key benefit of this approach is that it makes interpretation of
simple names in patterns a local activity: a leading as indicates a name
binding, a leading . indicates an attribute lookup, and anything else is
a variable lookup (regardless of whether we're reading a subpattern or a
subexpression).

With the syntax now proposed in this PEP, the problematic cases
identified above no longer read poorly:

-   .ATTR as TARGET_NAME is more obviously a binding than
    ATTR=TARGET_NAME
-   KEY as TARGET_NAME is more obviously a binding than KEY: TARGET_NAME
-   (as TARGET_NAME_1) as TARGET_NAME_2 is more obviously two bindings
    than TARGET_NAME_1 as TARGET_NAME_2

Resisting the temptation to guess

PEP 635 looks at the way pattern matching is used in other languages,
and attempts to use that information to make plausible predictions about
the way pattern matching will be used in Python:

-   wanting to extract values to local names will probably be more
    common than wanting to match against values stored in local names
-   wanting comparison by equality will probably be more common than
    wanting comparison by identity
-   users will probably be able to at least remember that bare names
    bind values and attribute references look up values, even if they
    can't figure that out for themselves without reading the
    documentation or having someone tell them

To be clear, I think these predictions actually are plausible. However,
I also don't think we need to guess about this up front: I think we can
start out with a more explicit syntax that requires users to state their
intent using a prefix marker (either as, ==, or is), and then reassess
the situation in a few years based on how pattern matching is actually
being used in Python.

At that point, we'll be able to choose amongst at least the following
options:

-   deciding the explicit syntax is concise enough, and not changing
    anything
-   adding inferred identity constraints for one or more of None, ...,
    True and False
-   adding inferred equality constraints for other literals (potentially
    including complex literals)
-   adding inferred equality constraints for attribute lookups
-   adding either inferred equality constraints or inferred capture
    patterns for bare names

All of those ideas could be considered independently on their own
merits, rather than being a potential barrier to introducing pattern
matching in the first place.

If any of these syntactic shortcuts were to eventually be introduced,
they'd also be straightforward to explain in terms of the underlying
more explicit syntax (the leading as, ==, or is would just be getting
inferred by the parser, without the user needing to provide it
explicitly). At the implementation level, only the parser should need to
be changed, as the existing AST nodes could be reused.

Interaction with caching of attribute lookups in local variables

One of the major changes between this PEP and PEP 634 is to use == EXPR
for equality constraint lookups, rather than only offering NAME.ATTR.
The original motivation for this was to avoid the semantic conflict with
regular assignment targets, where NAME.ATTR is already used in
assignment statements to set attributes, so if NAME.ATTR were the only
syntax for symbolic value matching, then we're pre-emptively ruling out
any future attempts to allow matching against single patterns using the
existing assignment statement syntax. The current motivation is more
about the general desire to avoid guessing about users' intent, and
instead requiring them to state it explicitly in the syntax.

However, even within match statements themselves, the name.attr syntax
for value patterns has an undesirable interaction with local variable
assignment, where routine refactorings that would be semantically
neutral for any other Python statement introduce a major semantic change
when applied to a PEP 634 style match statement.

Consider the following code:

    while value < self.limit:
        ... # Some code that adjusts "value"

The attribute lookup can be safely lifted out of the loop and only
performed once:

    _limit = self.limit
    while value < _limit:
        ... # Some code that adjusts "value"

With the marker prefix based syntax proposal in this PEP, value
constraints would be similarly tolerant of match patterns being
refactored to use a local variable instead of an attribute lookup, with
the following two statements being functionally equivalent:

    match expr:
        case {"key": == self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

    _target = self.target
    match expr:
        case {"key": == _target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case __:
            ... # Handle the non-matching case

By contrast, when using PEP 634's value and capture pattern syntaxes
that omit the marker prefix, the following two statements wouldn't be
equivalent at all:

    # PEP 634's value pattern syntax
    match expr:
        case {"key": self.target}:
            ... # Handle the case where 'expr["key"] == self.target'
        case _:
            ... # Handle the non-matching case

    # PEP 634's capture pattern syntax
    _target = self.target
    match expr:
        case {"key": _target}:
            ... # Matches any mapping with "key", binding its value to _target
        case _:
            ... # Handle the non-matching case

This PEP ensures the original semantics are retained under this style of
simplistic refactoring: use == name to force interpretation of the
result as a value constraint, use as name for a name binding.

PEP 634's proposal to offer only the shorthand syntax, with no
explicitly prefixed form, means that the primary answer on offer is
"Well, don't do that, then, only compare against attributes in
namespaces, don't compare against simple names".

PEP 622's walrus pattern syntax had another odd interaction where it
might not bind the same object as the exact same walrus expression in
the body of the case clause, but PEP 634 fixed that discrepancy by
replacing walrus patterns with AS patterns (where the fact that the
value bound to the name on the RHS might not be the same value as
returned by the LHS is a standard feature common to all uses of the "as"
keyword).

Using existing comparison operators as the value constraint prefix

If the benefit of a dedicated value constraint prefix is accepted, then
the next question is to ask exactly what that prefix should be.

The initially published version of this PEP proposed using the
previously unused ? symbol as the prefix for equality constraints, and
?is as the prefix for identity constraints. When reviewing the PEP,
Steven D'Aprano presented a compelling counterproposal[5] to use the
existing comparison operators (== and is) instead.

There were a few concerns with == as a prefix that kept it from being
chosen as the prefix in the initial iteration of the PEP:

-   for common use cases, it's even more visually noisy than ?, as a lot
    of folks with PEP 8 trained aesthetic sensibilities are going to
    want to put a space between it and the following expression,
    effectively making it a 3 character prefix instead of 1
-   when used in a mapping pattern, there needs to be a space between
    the : key/value separator and the == prefix, or the tokeniser will
    split them up incorrectly (getting := and = instead of : and ==)
-   when used in an OR pattern, there needs to be a space between the |
    pattern separator and the == prefix, or the tokeniser will split
    them up incorrectly (getting |= and = instead of | and ==)
-   if used in a PEP 634 style class pattern, there needs to be a space
    between the = keyword separator and the == prefix, or the tokeniser
    will split them up incorrectly (getting == and = instead of = and
    ==)
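The tokeniser behaviour described in the last three points can be observed with the standard library's tokenize module. The helper name operator_tokens is an assumption for this sketch; it simply filters the token stream down to operator tokens.

```python
import io
import tokenize


def operator_tokens(source):
    """Return the operator tokens the tokeniser produces for ``source``."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


# Without a separating space, the longest operator is matched first.
print(operator_tokens(":=="))   # [':=', '=']
print(operator_tokens("|=="))   # ['|=', '=']
print(operator_tokens("==="))   # ['==', '=']

# With a separating space, the intended tokens are produced.
print(operator_tokens(": =="))  # [':', '==']
```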

Rather than introducing a completely new symbol, Steven's proposed
resolution to this verbosity problem was to retain the ability to omit
the prefix marker in syntactically unambiguous cases.

While the idea of omitting the prefix marker was accepted for the second
revision of the proposal, it was dropped again in the third revision due
to ambiguity concerns. Instead, the following points apply:

-   for class patterns, other syntax changes allow equality constraints
    to be written as .ATTR == EXPR, and identity constraints to be
    written as .ATTR is EXPR, both of which are quite easy to read
-   for mapping patterns, the extra syntactic noise is just tolerated
    (at least for now)
-   for OR patterns, the extra syntactic noise is just tolerated (at
    least for now). However, membership constraints may offer a future
    path to reducing the need to combine OR patterns with equality
    constraints (instead, the values to be checked against would be
    collected as a set, list, or tuple).

Given that perspective, PEP 635's arguments against using ? as part of
the pattern matching syntax held for this proposal as well, and so the
PEP was amended accordingly.

Using __ as the wildcard pattern marker

PEP 635 makes a solid case that introducing ? solely as a wildcard
pattern marker would be a bad idea. With the syntax for value
constraints changed to use existing comparison operations rather than ?
and ?is, that argument holds for this PEP as well.

However, as noted by Thomas Wouters in [6], PEP 634's choice of _ remains
problematic as it would likely mean that match patterns would have a
permanent difference from all other parts of Python - the use of _ in
software internationalisation and at the interactive prompt means that
there isn't really a plausible path towards using it as a general
purpose "skipped binding" marker.

__ is an alternative "this value is not needed" marker drawn from a
Stack Overflow answer[7] (originally posted by the author of this PEP)
on the various meanings of _ in existing Python code.

This PEP also proposes adopting an implementation technique that limits
the scope of the associated special casing of __ to the parser: defining
a new AST node type (MatchAlways) specifically for wildcard markers,
rather than passing it through to the AST as a Name node.

Within the parser, __ still means either a regular name or a wildcard
marker in a match pattern depending on where you were in the parse tree,
but within the rest of the compiler, Name("__") is still a normal
variable name, while MatchAlways() is always a wildcard marker in a
match pattern.

Unlike _, the lack of other use cases for __ means that there would be a
plausible path towards restoring identifier handling consistency with
the rest of the language by making __ mean "skip this name binding"
everywhere in Python:

-   in the interpreter itself, deprecate loading variables with the name
    __. This would make reading from __ emit a deprecation warning,
    while writing to it would initially be unchanged. To avoid slowing
    down all name loads, this could be handled by having the compiler
    emit additional code for the deprecated name, rather than using a
    runtime check in the standard name loading opcodes.
-   after a suitable number of releases, change the parser to emit a new
    SkippedBinding AST node for all uses of __ as an assignment target,
    and update the rest of the compiler accordingly
-   consider making __ a true hard keyword rather than a soft keyword

This deprecation path couldn't be followed for _, as there's no way for
the interpreter to distinguish between attempts to read back _ when
nominally used as a "don't care" marker, and legitimate reads of _ as
either an i18n text translation function or as the last statement result
at the interactive prompt.

Names starting with double-underscores are also already reserved for use
by the language, whether that is for compile time constants (i.e.
__debug__), special methods, or class attribute name mangling, so using
__ here would be consistent with that existing approach.

Representing patterns explicitly in the Abstract Syntax Tree

PEP 634 doesn't explicitly discuss how match statements should be
represented in the Abstract Syntax Tree, instead leaving that detail to
be defined as part of the implementation.

As a result, while the reference implementation of PEP 634 definitely
works (and formed the basis of the reference implementation of this
PEP), it does contain a significant design flaw: despite the notes in
PEP 635 that patterns should be considered as distinct from expressions,
the reference implementation goes ahead and represents them in the AST
as expression nodes.

The result is an AST that isn't very abstract at all: nodes that should
be compiled completely differently (because they're patterns rather than
expressions) are represented the same way, and the type system of the
implementation language (e.g. C for CPython) can't offer any assistance
in keeping track of which subnodes should be ordinary expressions and
which should be subpatterns.

Rather than continuing with that approach, this PEP has instead defined
a new explicit "pattern" node in the AST, which allows the patterns and
their permitted subnodes to be defined explicitly in the AST itself,
making the code implementing the new feature clearer, and allowing the C
compiler to provide more assistance in keeping track of when the code
generator is dealing with patterns or expressions.

This change in implementation approach is actually orthogonal to the
surface syntax changes proposed in this PEP, so it could still be
adopted even if the rest of the PEP were to be rejected.

Changes to sequence patterns

This PEP makes one notable change to sequence patterns relative to PEP
634:

-   only the square bracket form of sequence pattern is supported.
    Neither open (no delimiters) nor tuple style (parentheses as
    delimiters) sequence patterns are supported.

Relative to PEP 634, sequence patterns are also significantly affected
by the change to require explicit qualification of capture patterns and
value constraints, as it means case [a, b, c]: must instead be written
as case [as a, as b, as c]: and case [0, 1]: must instead be written as
case [== 0, == 1]:.

With the syntax for sequence patterns no longer being derived directly
from the syntax for iterable unpacking, it no longer made sense to keep
the syntactic flexibility that had been included in the original syntax
proposal purely for consistency with iterable unpacking.

Allowing open and tuple style sequence patterns didn't increase
expressivity, only ambiguity of intent (especially relative to group
patterns), and encouraged readers down the path of viewing pattern
matching syntax as intrinsically linked to assignment target syntax
(which the PEP 634 authors have stated multiple times is not a desirable
path to have readers take, and a view the author of this PEP now shares,
despite disagreeing with it originally).

Changes to mapping patterns

This PEP makes two notable changes to mapping patterns relative to PEP
634:

-   value capturing is written as KEY as NAME rather than as KEY: NAME
-   a wider range of keys are permitted: any "closed expression", rather
    than only literals and attribute references

As discussed above, the first change is part of ensuring that all
binding operations with the target name to the right of a subexpression
or pattern use the as keyword.

The second change is mostly a matter of simplifying the parser and code
generator code by reusing the existing expression handling machinery.
The restriction to closed expressions is designed to help reduce
ambiguity as to where the key expression ends and the match pattern
begins. This mostly allows a superset of what PEP 634 allows, except
that complex literals must be written in parentheses (at least for now).

Adapting PEP 635's mapping pattern examples to the syntax proposed in
this PEP:

    match json_pet:
        case {"type": == "cat", "name" as name, "pattern" as pattern}:
            return Cat(name, pattern)
        case {"type": == "dog", "name" as name, "breed" as breed}:
            return Dog(name, breed)
        case __:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': (== 'red' | == '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children' as children }:
                for child in children:
                    change_red_to_blue(child)

For reference, the equivalent PEP 634 syntax:

    match json_pet:
        case {"type": "cat", "name": name, "pattern": pattern}:
            return Cat(name, pattern)
        case {"type": "dog", "name": name, "breed": breed}:
            return Dog(name, breed)
        case _:
            raise ValueError("Not a suitable pet")

    def change_red_to_blue(json_obj):
        match json_obj:
            case { 'color': ('red' | '#FF0000') }:
                json_obj['color'] = 'blue'
            case { 'children': children }:
                for child in children:
                    change_red_to_blue(child)

Changes to class patterns

This PEP makes several notable changes to class patterns relative to PEP
634:

-   the syntactic alignment with class instantiation is abandoned as
    being actively misleading and unhelpful. Instead, a new dedicated
    syntax for checking additional attributes is introduced that draws
    inspiration from mapping patterns rather than class instantiation
-   a new dedicated syntax for simple ducktyping that will work for any
    class is introduced
-   the special casing of various builtin and standard library types is
    supplemented by a general check for the existence of a
    __match_args__ attribute with the value of None

As discussed above, the first change has two purposes:

-   it's part of ensuring that all binding operations with the target
    name to the right of a subexpression or pattern use the as keyword.
    Using = to assign to the right is particularly problematic.
-   it's part of ensuring that all uses of simple names in patterns have
    a prefix that indicates their purpose (in this case, a leading . to
    indicate an attribute lookup)

The syntactic alignment with class instantiation was also judged to be
unhelpful in general, as class patterns are about matching patterns
against attributes, while class instantiation is about matching call
arguments to parameters in class constructors, which may not bear much
resemblance to the resulting instance attributes at all.

The second change is intended to make it easier to use pattern matching
for the "ducktyping" style checks that are already common in Python.

The concrete syntax proposal for these patterns then arose from viewing
instances as mappings of attribute names to values, and combining the
attribute lookup syntax (.ATTR), with the mapping pattern syntax
{KEY: PATTERN} to give cls{.ATTR: PATTERN}.

Allowing cls{.ATTR} to mean the same thing as cls{.ATTR: __} was a
matter of considering the leading . sufficient to render the name usage
unambiguous (it's clearly an attribute reference, whereas matching
against a variable key in a mapping pattern would be arguably ambiguous).

The final change just supplements a CPython-internal-only check in the
PEP 634 reference implementation by making it the default behaviour that
classes get if they don't define __match_args__ (the optimised fast path
for the builtin and standard library types named in PEP 634 is
retained).

Adapting the class matching example linked from PEP 635 shows that for
purely positional class matching, the main impact comes from the changes
to value constraints and name binding, not from the class matching
changes:

    match expr:
        case BinaryOp(== '+', as left, as right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp(== '-', as left, as right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp(== '*', as left, as right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp(== '/', as left, as right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp(== '+', as arg):
            return eval_expr(arg)
        case UnaryOp(== '-', as arg):
            return -eval_expr(arg)
        case VarExpr(as name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case __:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

For reference, the equivalent PEP 634 syntax:

    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

The changes to the class pattern syntax itself are more relevant when
checking for named attributes and extracting their values without
relying on __match_args__:

    match expr:
        case object{.host as host, .port as port}:
            pass
        case object{.host as host}:
            pass

Compare this to the PEP 634 equivalent, where it really isn't clear
which names are referring to attributes of the match subject and which
names are referring to local variables:

    match expr:
        case object(host=host, port=port):
            pass
        case object(host=host):
            pass

In this specific case, that ambiguity doesn't matter (since the
attribute and variable names are the same), but in the general case,
knowing which is which will be critical to reasoning correctly about the
code being read.

Deferred Ideas

Inferred value constraints

As discussed above, this PEP doesn't rule out the possibility of adding
inferred equality and identity constraints in the future.

These could be particularly valuable for literals, as it is quite likely
that many "magic" strings and numbers with self-evident meanings will be
written directly into match patterns, rather than being stored in named
variables. (Think constants like None, or obviously special numbers like
0 and 1, or strings where their contents are as descriptive as any
variable name, rather than cryptic checks against opaque numbers like
739452.)

Making some required parentheses optional

The PEP currently errs heavily on the side of requiring parentheses in
the face of potential ambiguity.

However, there are a number of cases where it at least arguably goes too
far, mostly involving AS patterns with an explicit pattern.

In any position that requires a closed pattern, AS patterns may end up
starting with doubled parentheses, as the nested pattern is also
required to be a closed pattern: ((OPEN PTRN) as NAME)

Due to the requirement that the subpattern be closed, it should be
reasonable in many of these cases (e.g. sequence pattern subpatterns) to
accept CLOSED_PTRN as NAME directly.

Further consideration of this point has been deferred, as making
required parentheses optional is a backwards compatible change, and
hence relaxing the restrictions later can be considered on a
case-by-case basis.

Accepting complex literals as closed expressions

PEP 634's reference implementation includes a lot of special casing of
binary operations in both the parser and the rest of the compiler in
order to accept complex literals without accepting arbitrary binary
numeric operations on literal values.

Ideally, this problem would be dealt with at the parser layer, with the
parser directly emitting a Constant AST node prepopulated with a complex
number. If that was the way things worked, then complex literals could
be accepted through a similar mechanism to any other literal.

This isn't how complex literals are handled, however. Instead, they're
passed through to the AST as regular BinOp nodes, and then the constant
folding pass on the AST resolves them down to Constant nodes with a
complex value.
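The BinOp representation can be confirmed with the ast module. This sketch only inspects what ast.parse returns for a complex literal; it does not depend on any match statement support.

```python
import ast

tree = ast.parse("1 + 2j", mode="eval")
node = tree.body

# The "literal" arrives as an addition of two separate constants.
print(type(node).__name__)     # BinOp
print(type(node.op).__name__)  # Add
print(node.left.value)         # 1
print(node.right.value)        # 2j

# Only evaluating the compiled expression yields the single complex value.
value = eval(compile(tree, "<complex-literal>", "eval"))
print(value)                   # (1+2j)
```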

For the parser to resolve complex literals directly, the compiler would
need to be able to tell the tokenizer to generate a distinct token type
for imaginary numbers (e.g. INUMBER), which would then allow the parser
to handle NUMBER + INUMBER and NUMBER - INUMBER separately from other
binary operations.

Alternatively, a new ComplexNumber AST node type could be defined, which
would allow the parser to notify the subsequent compiler stages that a
particular node should specifically be a complex literal, rather than an
arbitrary binary operation. Then the parser could accept NUMBER + NUMBER
and NUMBER - NUMBER for that node, while letting the AST validation for
ComplexNumber take care of ensuring that the real and imaginary parts of
the literal were real and imaginary numbers as expected.

For now, this PEP has postponed dealing with this question, and instead
just requires that complex literals be parenthesised in order to be used
in value constraints and as mapping pattern keys.

Allowing negated constraints in match patterns

With the syntax proposed in this PEP, it isn't permitted to write
!= expr or is not expr as a match pattern.

Both of these forms have clear potential interpretations as a negated
equality constraint (i.e. x != expr) and a negated identity constraint
(i.e. x is not expr).

However, it's far from clear either form would come up often enough to
justify the dedicated syntax, so the possible extension has been
deferred pending further community experience with match statements.

Allowing membership checks in match patterns

The syntax used for equality and identity constraints would be
straightforward to extend to membership checks: in container.

One downside of the proposals in both this PEP and PEP 634 is that
checking for multiple values in the same case doesn't look like any
existing container membership check in Python:

    # PEP 634's literal patterns
    match value:
        case 0 | 1 | 2 | 3:
            ...

    # This PEP's equality constraints
    match value:
        case == 0 | == 1 | == 2 | == 3:
            ...

Allowing inferred equality constraints under this PEP would only make it
look like the PEP 634 example; it still wouldn't look like the
equivalent if statement header (if value in {0, 1, 2, 3}:).

Membership constraints would provide a more explicit, but still concise,
way to check if the match subject was present in a container, and it
would look the same as an ordinary containment check:

    match value:
        case in {0, 1, 2, 3}:
            ...
        case in {one, two, three, four}:
            ...
        case in range(4): # It would accept any container, not just literal sets
            ...

Such a feature would also be readily extensible to allow all kinds of
case clauses without any further syntax updates, simply by defining
__contains__ appropriately on a custom class definition.
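Since membership constraints are not part of any implemented syntax, the following sketch shows the equivalent if statements instead, assuming that case in CONTAINER: would behave like value in CONTAINER. The MultiplesOf class and the classify function are hypothetical names used only for illustration.

```python
class MultiplesOf:
    """Hypothetical container whose membership test is a divisibility check."""

    def __init__(self, divisor):
        self.divisor = divisor

    def __contains__(self, value):
        return isinstance(value, int) and value % self.divisor == 0


def classify(value):
    # Each "if" stands in for a hypothetical "case in CONTAINER:" clause.
    if value in {0, 1, 2, 3}:
        return "small"
    if value in range(4, 10):
        return "medium"
    if value in MultiplesOf(5):
        return "multiple of five"
    return "other"


print(classify(2))   # small
print(classify(7))   # medium
print(classify(25))  # multiple of five
print(classify(11))  # other
```

The custom __contains__ method is what would let a class take part in such clauses without any further syntax.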

However, while this does seem like a useful extension, and a good way to
resolve this PEP's verbosity problem when combining multiple equality
checks in an OR pattern, it isn't essential to making match statements a
valuable addition to the language, so it seems more appropriate to defer
it to a separate proposal, rather than including it here.

Inferring a default type for instance attribute constraints

The dedicated syntax for instance attribute constraints means that
object could be omitted from object{.ATTR} to give {.ATTR} without
introducing any syntactic ambiguity (if no class was given, object would
be implied, just as it is for the base class list in class definitions).

However, it's far from clear saving six characters is worth making it
harder to visually distinguish mapping patterns from instance attribute
patterns, so allowing this has been deferred as a topic for possible
future consideration.

Avoiding special cases in sequence patterns

Sequence patterns in both this PEP and PEP 634 currently special case
str, bytes, and bytearray as specifically never matching a sequence
pattern.

This special casing could potentially be removed if we were to define a
new collections.abc.AtomicSequence abstract base class for types like
these, where they're conceptually a single item, but still implement the
sequence protocol to allow random access to their component parts.
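
As a rough sketch of how such an ABC might behave, the following defines
a stand-in AtomicSequence locally (no such class exists in
collections.abc) and registers the three currently special cased types
with it. The helper accepts_sequence_pattern is hypothetical and only
approximates the check a sequence pattern would perform:

```python
from abc import ABC
from collections.abc import Sequence


class AtomicSequence(ABC):
    """Hypothetical marker ABC; not part of collections.abc."""


# Register the types that sequence patterns currently special case
for atomic_type in (str, bytes, bytearray):
    AtomicSequence.register(atomic_type)


def accepts_sequence_pattern(subject):
    """Assumed rule: sequences match unless they are marked as atomic."""
    return isinstance(subject, Sequence) and not isinstance(subject, AtomicSequence)
```

Under this approach, third party types that are conceptually a single
item could opt out of sequence matching by registering with the ABC,
rather than requiring changes to the interpreter.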

Expression syntax to retrieve multiple attributes from an instance

The instance attribute pattern syntax has been designed such that it
could be used as the basis for a general purpose syntax for retrieving
multiple attributes from an object in a single expression:

    host, port = obj{.host, .port}

Similar to slice syntax only being allowed inside bracket subscripts, the
.attr syntax for naming attributes would only be allowed inside brace
subscripts.

This idea isn't required for pattern matching to be useful, so it isn't
part of this PEP. However, it's mentioned as a possible path towards
making pattern matching feel more integrated into the rest of the
language, rather than existing forever in its own completely separated
world.
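
For reference, the closest existing spelling of this operation uses
operator.attrgetter. In this short example, obj is a hypothetical
stand-in object created only for illustration:

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical object exposing the two attributes of interest
obj = SimpleNamespace(host="localhost", port=8080)

# Existing equivalent of the suggested obj{.host, .port} expression
host, port = attrgetter("host", "port")(obj)
```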

Expression syntax to retrieve multiple items from a container

If the brace subscript syntax were to be accepted for instance attribute
pattern matching, and then subsequently extended to offer general
purpose extraction of multiple attributes, then it could be extended
even further to allow for retrieval of multiple items from containers
based on the syntax used for mapping pattern matching:

    host, port = obj{"host", "port"}
    first, last = obj{0, -1}

Again, this idea isn't required for pattern matching to be useful, so it
isn't part of this PEP. As with retrieving multiple attributes, however,
it is included as an example of the proposed pattern matching syntax
inspiring ideas for making object deconstruction easier in general.
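
The existing way to express the same retrievals is operator.itemgetter.
The config and items values below are hypothetical sample data:

```python
from operator import itemgetter

# Hypothetical sample data
config = {"host": "localhost", "port": 8080, "debug": False}
items = ["a", "b", "c", "d"]

# Existing equivalent of the suggested obj{"host", "port"} expression
host, port = itemgetter("host", "port")(config)

# Existing equivalent of the suggested obj{0, -1} expression
first, last = itemgetter(0, -1)(items)
```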

Rejected Ideas

Restricting permitted expressions in value constraints and mapping pattern keys

While it's entirely technically possible to restrict the kinds of
expressions permitted in value constraints and mapping pattern keys to
just attribute lookups and constant literals (as PEP 634 does), there
isn't any clear runtime value in doing so, so this PEP proposes allowing
any kind of primary expression (primary expressions are an existing node
type in the grammar that includes things like literals, names, attribute
lookups, function calls, container subscripts, parenthesised groups,
etc), as well as high precedence unary operations (+, -, ~) on primary
expressions.

While PEP 635 does emphasise several times that literal patterns and
value patterns are not full expressions, it doesn't ever articulate a
concrete benefit that is obtained from that restriction (just a
theoretical appeal to it being useful to separate static checks from
dynamic checks, which a code style tool could still enforce, even if the
compiler itself is more permissive).

The last time we imposed such a restriction was for decorator
expressions and the primary outcome of that was that users had to put up
with years of awkward syntactic workarounds (like nesting arbitrary
expressions inside function calls that just returned their argument) to
express the behaviour they wanted before the language definition was
finally updated to allow arbitrary expressions and let users make their
own decisions about readability.
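
To illustrate the kind of workaround being described, the sketch below
wraps a subscript expression in a pass-through function so that it fits
the older, restricted decorator grammar. All of the names (identity,
register, decorators, shout) are hypothetical:

```python
def identity(obj):
    """Return the argument unchanged (the workaround helper)."""
    return obj


registered = []


def register(func):
    """Record the decorated function's name and return it unmodified."""
    registered.append(func.__name__)
    return func


decorators = [register]


# The restricted grammar rejected "@decorators[0]" directly, but accepted a
# call on a simple name, so the subscript was smuggled in as an argument.
# Python 3.9 and later accept "@decorators[0]" without the wrapper.
@identity(decorators[0])
def shout(text):
    return text.upper()
```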

The situation in PEP 634 resembles the earlier situation with decorator
expressions: arbitrary expressions are technically supported in value
patterns, but only through awkward workarounds. Either all the values to
match need to be specified in a helper class that is placed before the
match statement:

    # Allowing arbitrary match targets with PEP 634's value pattern syntax
    class mt:
        value = func()
    match expr:
        case (_, mt.value):
            ... # Handle the case where 'expr[1] == func()'

Or else they need to be written as a combination of a capture pattern
and a guard expression:

    # Allowing arbitrary match targets with PEP 634's guard expressions
    match expr:
        case (_, _matched) if _matched == func():
            ... # Handle the case where 'expr[1] == func()'

This PEP proposes not requiring any such workarounds, and instead
supporting arbitrary value constraints from the start:

    match expr:
        case (__, == func()):
            ... # Handle the case where 'expr[1] == func()'

Whether actually writing that kind of code is a good idea would be a
topic for style guides and code linters, not the language compiler.

In particular, if static analysers can't follow certain kinds of dynamic
checks, then they can limit the permitted expressions at analysis time,
rather than the compiler restricting them at compile time.

There are also some kinds of expressions that are almost certain to give
nonsensical results (e.g. yield, yield from, await) due to the pattern
caching rule, where the number of times the constraint expression
actually gets evaluated will be implementation dependent. Even here, the
PEP takes the view of letting users write nonsense if they really want
to.

Aside from the recently updated decorator expressions, another situation
where Python's formal syntax offers full freedom of expression that is
almost never used in practice is in except clauses: the exceptions to
match against almost always take the form of a simple name, a dotted
name, or a tuple of those, but the language grammar permits arbitrary
expressions at that point. This is a good indication that Python's user
base can be trusted to take responsibility for finding readable ways to
use permissive language features, by avoiding writing hard to read
constructs even when they're permitted by the compiler.
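
As a small demonstration of that grammatical freedom, the following
except clause uses a function call as the expression to match against.
The names retryable_errors and classify are hypothetical:

```python
def retryable_errors():
    """Compute the exception types to catch at runtime."""
    return (ConnectionError, TimeoutError)


def classify(exc):
    try:
        raise exc
    except retryable_errors():  # Arbitrary expression, evaluated when needed
        return "retry"
    except Exception:
        return "fail"
```

The construct is legal, yet code in the wild rarely looks like this,
which is the point being made above.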

This permissiveness comes with a real concrete benefit on the
implementation side: dozens of lines of match statement specific code in
the compiler are replaced by simple calls to the existing code for
compiling expressions (including in the AST validation pass, the AST
optimization pass, the symbol table analysis pass, and the code
generation pass). This implementation benefit would accrue not just to
CPython, but to every other Python implementation looking to add match
statement support.

Requiring the use of constraint prefix markers for mapping pattern keys

The initial (unpublished) draft of this proposal suggested requiring
mapping pattern keys be value constraints, just as PEP 634 requires that
they be valid literal or value patterns:

    import constants

    match config:
        case {== "route": route}:
            process_route(route)
        case {== constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

However, the extra characters were syntactically noisy and, unlike in
value constraints (where the prefix distinguishes them from non-pattern
expressions), the prefix doesn't provide any additional information here
that isn't already conveyed by the expression's position as a key within
a mapping pattern.

Accordingly, the proposal was simplified to omit the marker prefix from
mapping pattern keys.

This omission also aligns with the fact that containers may incorporate
both identity and equality checks into their lookup process - they don't
purely rely on equality checks, as would be incorrectly implied by the
use of the equality constraint prefix.
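
The builtin dict shows this behaviour with NaN values, which never
compare equal to themselves. A brief sketch (the variable names are
arbitrary):

```python
nan = float("nan")
lookup = {nan: "found"}

# NaN is not equal to itself, so a pure equality check could never find it
equal_to_itself = nan == nan

# The dict lookup still succeeds because the key is the very same object
same_object_result = lookup[nan]

# A different NaN object is neither identical nor equal to the stored key
other_nan_present = float("nan") in lookup
```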

Allowing the key/value separator to be omitted for mapping value constraints

Instance attribute patterns allow the : separator to be omitted when
writing attribute value constraints like case object{.attr == expr}.

Offering a similar shorthand for mapping value constraints was
considered, but permitting it allows thoroughly baffling constructs like
case {0 == 0}: where the compiler knows this is the key 0 with the value
constraint == 0, but a human reader sees the tautological comparison
operation 0 == 0. With the key/value separator included, the intent is
more obvious to a human reader as well: case {0: == 0}:

Reference Implementation

A draft reference implementation for this PEP[8] has been derived from
Brandt Bucher's reference implementation for PEP 634[9].

Relative to the text of this PEP, the draft reference implementation has
not yet complemented the special casing of several builtin and standard
library types in MATCH_CLASS with the more general check for
__match_args__ being set to None. Class defined patterns also currently
still accept classes that don't define __match_args__.

All other modified patterns have been updated to follow this PEP rather
than PEP 634.

Unparsing for match patterns has not yet been migrated to the updated v3
AST.

The AST validator for match patterns has not yet been implemented.

The AST validator in general has not yet been reviewed to ensure that it
is checking that only expression nodes are being passed in where
expression nodes are expected.

The examples in this PEP have not yet been converted to test cases, so
could plausibly contain typos and other errors.

Several of the old PEP 634 tests are still to be converted to new
SyntaxError tests.

The documentation has not yet been updated.

Acknowledgments

The PEP 622 and PEP 634/PEP 635/PEP 636 authors, as the proposal in this
PEP is merely an attempt to improve the readability of an already
well-constructed idea by proposing that starting with a more explicit
syntax and potentially introducing syntactic shortcuts for particularly
common operations later is a better option than attempting to only
define the shortcut version. For areas of the specification where the
two PEPs are the same (or at least very similar), the text describing
the intended behaviour in this PEP is often derived directly from the
PEP 634 text.

Steven D'Aprano, who made a compelling case that the key goals of this
PEP could be achieved by using existing comparison tokens to provide the
ability to override the compiler when our guesses as to "what most users
will want most of the time" are inevitably incorrect for at least some
users some of the time, and retaining some of PEP 634's syntactic sugar
(with a slightly different semantic definition) to obtain the same level
of brevity as PEP 634 in most situations. (Paul Sokolovsky also
independently suggested using == instead of ? as a more easily
understood prefix for equality constraints).

Thomas Wouters, whose publication of PEP 640 and public review of the
structured pattern matching proposals persuaded the author of this PEP
to continue advocating for a wildcard pattern syntax that a future PEP
could plausibly turn into a hard keyword that always skips binding a
reference in any location a simple name is expected, rather than
continuing indefinitely as the match pattern specific soft keyword that
is proposed here.

Joao Bueno and Jim Jewett for nudging the PEP author to take a closer
look at the proposed syntax for subelement capturing within class
patterns and mapping patterns (particularly the problems with "capturing
to the right"). This review is what prompted the significant changes
between v2 and v3 of the proposal.

References

Appendix A: Full Grammar

Here is the full modified grammar for match_stmt, replacing Appendix A
in PEP 634.

Notation used beyond standard EBNF is as per PEP 634:

-   'KWD' denotes a hard keyword
-   "KWD" denotes a soft keyword
-   SEP.RULE+ is shorthand for RULE (SEP RULE)*
-   !RULE is a negative lookahead assertion

    match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
    subject_expr:
        | star_named_expression ',' [star_named_expressions]
        | named_expression
    case_block: "case" (guarded_pattern | open_pattern) ':' block

    guarded_pattern: closed_pattern 'if' named_expression
    open_pattern: # Pattern may use multiple tokens with no closing delimiter
        | as_pattern
        | or_pattern

    as_pattern: [closed_pattern] pattern_as_clause
    as_pattern_with_inferred_wildcard: pattern_as_clause
    pattern_as_clause: 'as' pattern_capture_target
    pattern_capture_target: !"__" NAME !('.' | '(' | '=')

    or_pattern: '|'.simple_pattern+

    simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
        | closed_pattern
        | value_constraint

    value_constraint:
        | eq_constraint
        | id_constraint

    eq_constraint: '==' closed_expr
    id_constraint: 'is' closed_expr

    closed_expr: # Require a single token or a closing delimiter in expression
        | primary
        | closed_factor

    closed_factor: # "factor" is the main grammar node for these unary ops
        | '+' primary
        | '-' primary
        | '~' primary

    closed_pattern: # Require a single token or a closing delimiter in pattern
        | wildcard_pattern
        | group_pattern
        | structural_constraint

    wildcard_pattern: "__"

    group_pattern: '(' open_pattern ')'

    structural_constraint:
        | sequence_constraint
        | mapping_constraint
        | attrs_constraint
        | class_constraint

    sequence_constraint: '[' [sequence_constraint_elements] ']'
    sequence_constraint_elements: ','.sequence_constraint_element+ ','?
    sequence_constraint_element:
        | star_pattern
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    star_pattern: '*' (pattern_as_clause | wildcard_pattern)

    mapping_constraint: '{' [mapping_constraint_elements] '}'
    mapping_constraint_elements: ','.key_value_constraint+ ','?
    key_value_constraint:
        | closed_expr pattern_as_clause
        | closed_expr ':' simple_pattern
        | double_star_capture
    double_star_capture: '**' pattern_as_clause

    attrs_constraint:
        | name_or_attr '{' [attrs_constraint_elements] '}'
    name_or_attr: attr | NAME
    attr: name_or_attr '.' NAME
    attrs_constraint_elements: ','.attr_value_constraint+ ','?
    attr_value_constraint:
        | '.' NAME pattern_as_clause
        | '.' NAME value_constraint
        | '.' NAME ':' simple_pattern
        | '.' NAME

    class_constraint:
        | name_or_attr '(' ')'
        | name_or_attr '(' positional_patterns ','? ')'
        | name_or_attr '(' class_constraint_attrs ')'
        | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
    positional_patterns: ','.positional_pattern+
    positional_pattern:
        | simple_pattern
        | as_pattern_with_inferred_wildcard
    class_constraint_attrs:
        | '**' '{' [attrs_constraint_elements] '}'

Appendix B: Summary of Abstract Syntax Tree changes

The following new nodes are added to the AST by this PEP:

    stmt = ...
          | ...
          | Match(expr subject, match_case* cases)
          | ...
          ...

    match_case = (pattern pattern, expr? guard, stmt* body)

    pattern = MatchAlways
         | MatchValue(matchop op, expr value)
         | MatchSequence(pattern* patterns)
         | MatchMapping(expr* keys, pattern* patterns)
         | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
         | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)

         | MatchRestOfSequence(identifier? target)
         -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys

         | MatchAs(pattern? pattern, identifier target)
         | MatchOr(pattern* patterns)

          attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)

    matchop = EqCheck | IdCheck

Appendix C: Summary of changes relative to PEP 634

The overall match/case statement syntax and the guard expression syntax
remain the same as they are in PEP 634.

Relative to PEP 634 this PEP makes the following key changes:

-   a new pattern type is defined in the AST, rather than reusing the
    expr type for patterns
-   the new MatchAs and MatchOr AST nodes are moved from the expr type
    to the pattern type
-   the wildcard pattern changes from _ (single underscore) to __
    (double underscore), and gains a dedicated MatchAlways node in the
    AST
-   due to ambiguity of intent, value patterns and literal patterns are
    removed
-   a new expression category is introduced: "closed expressions"
-   closed expressions are either primary expressions, or a primary
    expression preceded by one of the high precedence unary operators
    (+, -, ~)
-   a new pattern type is introduced: "value constraint patterns"
-   value constraints have a dedicated MatchValue AST node rather than
    allowing a combination of Constant (literals), UnaryOp (negative
    numbers), BinOp (complex numbers), and Attribute (attribute lookups)
-   value constraint patterns are either equality constraints or
    identity constraints
-   equality constraints use == as a prefix marker on an otherwise
    arbitrary closed expression: == EXPR
-   identity constraints use is as a prefix marker on an otherwise
    arbitrary closed expression: is EXPR
-   due to ambiguity of intent, capture patterns are removed. All
    capture operations use the as keyword (even in sequence matching)
    and are represented in the AST as either MatchAs or
    MatchRestOfSequence nodes.
-   to reduce verbosity in AS patterns, as NAME is permitted, with the
    same meaning as __ as NAME
-   sequence patterns change to require the use of square brackets,
    rather than offering the same syntactic flexibility as assignment
    targets (assignment statements allow iterable unpacking to be
    indicated by any use of a tuple separated target, with or without
    surrounding parentheses or square brackets)
-   sequence patterns gain a dedicated MatchSequence AST node rather
    than reusing List
-   mapping patterns change to allow arbitrary closed expressions as
    keys
-   mapping patterns gain a dedicated MatchMapping AST node rather than
    reusing Dict
-   to reduce verbosity in mapping patterns, KEY : __ as NAME may be
    shortened to KEY as NAME
-   class patterns no longer use individual keyword argument syntax for
    attribute matching. Instead they use double-star syntax, along with
    a variant on mapping pattern syntax with a dot prefix on the
    attribute names
-   class patterns gain a dedicated MatchClass AST node rather than
    reusing Call
-   to reduce verbosity, class attribute matching allows : to be omitted
    when the pattern to be matched starts with ==, is, or as
-   class patterns treat any class that sets __match_args__ to None as
    accepting a single positional pattern that is matched against the
    entire object (avoiding the special casing required in PEP 634)
-   class patterns raise TypeError when used with a class that does
    not define __match_args__
-   dedicated syntax for ducktyping is added, such that case cls{...}:
    is roughly equivalent to case cls(**{...}):, but skips the check for
    the existence of __match_args__. This pattern also has a dedicated
    AST node, MatchAttrs

Note that postponing literal patterns also makes it possible to postpone
the question of whether we need an "INUMBER" token in the tokeniser for
imaginary literals. Without it, the parser can't distinguish complex
literals from other binary addition and subtraction operations on
constants, so proposals like PEP 634 have to do work in later
compilation steps to check for correct usage.
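
One way to observe this in current CPython is to compare the parse
results for a complex value and an ordinary integer sum. This is only an
illustrative sketch using the standard ast module; the variable names
are arbitrary:

```python
import ast

complex_value = ast.parse("1 + 2j", mode="eval").body
plain_sum = ast.parse("1 + 2", mode="eval").body

# Both expressions parse to the same shape: BinOp(Constant, Add, Constant)
same_shape = all(
    isinstance(node, ast.BinOp)
    and isinstance(node.op, ast.Add)
    and isinstance(node.left, ast.Constant)
    and isinstance(node.right, ast.Constant)
    for node in (complex_value, plain_sum)
)

# Only the type of the right-hand constant distinguishes the two cases
right_types = (type(complex_value.right.value), type(plain_sum.right.value))
```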

Appendix D: History of changes to this proposal

The first published iteration of this proposal mostly followed PEP 634,
but suggested using ?EXPR for equality constraints and ?is EXPR for
identity constraints rather than PEP 634's value patterns and literal
patterns.

The second published iteration mostly adopted a counter-proposal from
Steven D'Aprano that kept the PEP 634 style inferred constraints in many
situations, but also allowed the use of == EXPR for explicit equality
constraints, and is EXPR for explicit identity constraints.

The third published (and current) iteration dropped inferred patterns
entirely, in an attempt to resolve the concerns with the fact that the
patterns case {key: NAME}: and case cls(attr=NAME): would both bind NAME
despite it appearing to the right of another subexpression without using
the as keyword. The revised proposal also eliminates the possibility of
writing case TARGET1 as TARGET2:, which would bind to both of the given
names. Of those constructs, the most concerning was
case cls(attr=TARGET_NAME):, since it involved the use of = with the
binding target on the right, the exact opposite of what happens in
assignment statements, function calls, and function signature
declarations.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] Pre-publication draft of "Precise Semantics for Pattern Matching"
https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst

[2] Post explaining the syntactic novelties in PEP 622
https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/

[3] Declined pull request proposing to list this as a Rejected Idea in
PEP 622 https://github.com/python/peps/pull/1564

[4] Kohn et al., Dynamic Pattern Matching with Python
https://gvanrossum.github.io/docs/PyPatternMatching.pdf

[5] Steven D'Aprano's cogent criticism of the first published iteration
of this PEP
https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/

[6] Thomas Wouters' initial review of the structured pattern matching
proposals
https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/

[7] Stack Overflow answer regarding the use cases for _ as an identifier
https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946

[8] In-progress reference implementation for this PEP
https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns

[9] PEP 634 reference implementation
https://github.com/python/cpython/pull/22917