PEP: 638 Title: Syntactic Macros Author: Mark Shannon <mark@hotpy.org>
Discussions-To:
https://mail.python.org/archives/list/python-dev@python.org/thread/U4C4XHNRC4SHS3TPZWCTY4SN4QU3TT6V/
Status: Draft Type: Standards Track Content-Type: text/x-rst Created:
24-Sep-2020 Post-History: 26-Sep-2020

Abstract

This PEP adds support for syntactic macros to Python. A macro is a
compile-time function that transforms a part of the program to allow
functionality that cannot be expressed cleanly in normal library code.

The term "syntactic" means that this sort of macro operates on the
program's syntax tree. This reduces the chance of mistranslation that
can happen with text-based substitution macros, and allows the
implementation of hygienic macros.

Syntactic macros allow libraries to modify the abstract syntax tree
during compilation, providing the ability to extend the language for
specific domains without adding to complexity to the language as a
whole.

Motivation

New language features can be controversial, disruptive and sometimes
divisive. Python is now sufficiently powerful and complex, that many
proposed additions are a net loss for the language due to the additional
complexity.

Although a language change may make certain patterns easy to express, it
will have a cost. Each new feature makes the language larger, harder to
learn and harder to understand. Python was once described as Python Fits
Your Brain, but that becomes less and less true as more and more
features are added.

Because of the high cost of adding a new feature, it is very difficult
or impossible to add a feature that would benefit only some users,
regardless of how many users, or how beneficial that feature would be to
them.

The use of Python in data science and machine learning has grown very
rapidly over the last few years. However, most of the core developers of
Python do not have a background in data science or machine learning.
This makes it extremely difficult for the core developers to determine
whether a language extension for machine learning is worthwhile.

By allowing language extensions to be modular and distributable, like
libraries, domain-specific extensions can be implemented without
negatively impacting users outside of that domain. A web developer is
likely to want a very different set of extensions from a data scientist.
We need to let the community develop their own extensions.

Without some form of user-defined language extensions, there will be a
constant battle between those wanting to keep the language compact and
fitting their brains, and those wanting a new feature that suits their
domain or programming style.

Improving the expressiveness of libraries for specific domains

Many domains see repeated patterns that are difficult or impossible to
express as a library. Macros can express those patterns in a more
concise and less error-prone way.

Trialing new language features

It is possible to demonstrate potential language extensions using
macros. For example, macros would have enabled the with statement and
yield from expression to have been trialed. Doing so might well have
lead to a higher quality implementation at first release, by allowing
more testing before those features were included in the language.

It is nearly impossible to make sure that a new feature is completely
reliable before it is released; bugs relating to the with and yield from
features were still being fixed many years after they were released.

Long term stability for the bytecode interpreter

Historically, new language features have been implemented by naive
compilation of the AST into new, complex bytecode instructions. Those
bytecodes have often had their own internal flow-control, performing
operations that could, and should, have been done in the compiler.

For example, until recently flow control within the try-finally and with
statements was managed by complicated bytecodes with context-dependent
semantics. The control flow within those statements is now implemented
in the compiler, making the interpreter simpler and faster.

By implementing new features as AST transformations, the existing
compiler can generate the bytecode for a feature without having to
modify the interpreter.

A stable interpreter is necessary if we are to improve the performance
and portability of the CPython VM.

Rationale

Python is both expressive and easy to learn; it is widely recognized as
the easiest-to-learn, widely used programming language. However, it is
not the most flexible. That title belongs to lisp.

Because lisp is homoiconic, meaning that lisp programs are lisp data
structures, lisp programs can be manipulated by lisp programs. Thus much
of the language can be defined in itself.

We would like that ability in Python, without the many parentheses that
characterize lisp. Fortunately, homoiconicity is not needed for a
language to be able to manipulate itself, all that is needed is the
ability to manipulate programs after parsing, but before translation to
an executable form.

Python already has the components needed. The syntax tree of Python is
available through the ast module. All that is needed is a marker to tell
the compiler that a macro is present, and the ability for the compiler
to callback into user code to manipulate the AST.

Specification

Syntax

Lexical analysis

Any sequence of identifier characters followed by an exclamation point
(exclamation mark, UK English) will be tokenized as a MACRO_NAME.

Statement form

    macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as"  NAME ] [ ":" NEWLINE suite ]

Expression form

    macro_expr = MACRO_NAME "(" testlist ")"

Resolving ambiguity

The statement form of a macro takes precedence, so that the code
macro_name!(x) will be parsed as a macro statement, not as an expression
statement containing a macro expression.

Semantics

Compilation

Upon encountering a macro during translation to bytecode, the code
generator will look up the macro processor registered for the macro, and
pass the AST rooted at the macro to the processor function. The returned
AST will then be substituted for the original tree.

For macros with multiple names, several trees will be passed to the
macro processor, but only one will be returned and substituted, shorting
the enclosing block of statements.

This process can be repeated, to enable macros to return AST nodes
including other macros.

The compiler will not look up a macro processor until that macro is
reached, so that inner macros do not need to have processors registered.
For example, in a switch macro, the case and default macros wouldn't
need processors registered as they would be eliminated by the switch
processor.

To enable definition of macros to be imported, the macros import! and
from! are predefined. They support the following syntax:

    "import!" dotted_name "as" name

    "from!" dotted_name "import" name [ "as" name ]

The import! macro performs a compile-time import of dotted_name to find
the macro processor, then registers it under name for the scope
currently being compiled.

The from! macro performs a compile-time import of dotted_name.name to
find the macro processor, then registers it under name (using the name
following "as", if present) for the scope currently being compiled.

Note that, since import! and from! only define the macro for the scope
in which the import is present, all uses of a macro must be preceded by
an explicit import! or from! to improve clarity.

For example, to import the macro "compile" from "my.compiler":

    from! my.compiler import compile

Defining macro processors

A macro processor is defined by a four-tuple, consisting of
(func, kind, version, additional_names):

-   func must be a callable that takes len(additional_names)+1
    arguments, all of which are abstract syntax trees, and returns a
    single abstract syntax tree.
-   kind must be one of the following:
    -   macros.STMT_MACRO: A statement macro where the body of the macro
        is indented. This is the only form allowed to have additional
        names.
    -   macros.SIBLING_MACRO: A statement macro where the body of the
        macro is the next statement in the same block. The following
        statement is moved into the macro as its body.
    -   macros.EXPR_MACRO: An expression macro.
-   version is used to track versions of macros, so that generated
    bytecodes can be correctly cached. It must be an integer.
-   additional_names are the names of the additional parts of the macro,
    and must be a tuple of strings.

    # (func, _ast.STMT_MACRO, VERSION, ())
    stmt_macro!:
        multi_statement_body

    # (func, _ast.SIBLING_MACRO, VERSION, ())
    sibling_macro!
    single_statement_body

    # (func, _ast.EXPR_MACRO, VERSION, ())
    x = expr_macro!(...)

    # (func, _ast.STMT_MACRO, VERSION, ("subsequent_macro_part",))
    multi_part_macro!:
        multi_statement_body
    subsequent_macro_part!:
        multi_statement_body

The compiler will check that the syntax used matches the declared kind.

For convenience, the decorator macro_processor is provided in the macros
module to mark a function as a macro processor:

    def macro_processor(kind, version, *additional_names):
        def deco(func):
            return func, kind, version, additional_names
        return deco

Which can be used to help declare macro processors, for example:

    @macros.macro_processor(macros.STMT_MACRO, 1_08)
    def switch(astnode):
        ...

AST extensions

Two new AST nodes will be needed to express macros, macro_stmt and
macro_expr.

    class macro_stmt(_ast.stmt):
        _fields = "name", "args", "importname", "asname", "body"

    class macro_expr(_ast.expr):
        _fields = "name", "args"

In addition, macro processors will need a means to express control flow
or side-effecting code, that produces a value. A new AST node called
stmt_expr will be added, combining a statement and an expression. This
new ast node will be a subtype of expr, but include a statement to allow
side effects. It will be compiled to bytecode by compiling the
statement, then compiling the value.

    class stmt_expr(_ast.expr):
        _fields = "stmt", "value"

Hygiene and debugging

Macro processors will often need to create new variables. Those
variables need to named in such as way as to avoid contaminating the
original code and other macros. No rules for naming will be enforced,
but to ensure hygiene and help debugging, the following naming scheme is
recommended:

-   All generated variable names should start with a $
-   Purely artificial variable names should start $$mname where mname is
    the name of the macro.
-   Variables derived from real variables should start $vname where
    vname is the name of the variable.
-   All variable names should include the line number and the column
    offset, separated by an underscore.

Examples:

-   Purely generated name: $$macro_17_0
-   Name derived from a variable for an expression macro: $var_12_5

Examples

Compile-time-checked data structures

It is common to encode tables of data in Python as large dictionaries.
However, these can be hard to maintain and error prone. Macros allow
such data to be written in a more readable format. Then, at compile
time, the data can be verified and converted to an efficient format.

For example, suppose we have a two dictionary literals mapping codes to
names, and vice versa. This is error prone, as the dictionaries may have
duplicate keys, or one table may not be the inverse of the other. A
macro could generate the two mappings from a single table and, at the
same time, verify that no duplicates are present.

    color_to_code = {
        "red": 1,
        "blue": 2,
        "green": 3,
    }

    code_to_color = {
        1: "red",
        2: "blue",
        3: "yellow", # error
    }

would become: :

    bijection! color_to_code, code_to_color:
        "red" = 1
        "blue" = 2
        "green" = 3

Domain-specific extensions

Where I see macros having real value is in specific domains, not in
general-purpose language features.

For example, parsers. Here's part of a parser definition for Python,
using macros:

    choice! single_input:
        NEWLINE
        simple_stmt
        sequence!:
            compound_stmt
            NEWLINE

Compilers

Runtime compilers, such as numba have to reconstitute the Python source,
or attempt to analyze the bytecode. It would be simpler and more
reliable for them to get the AST directly:

    from! my.jit.library import jit

    jit!
    def func():
        ...

Matching symbolic expressions

When matching something representing syntax, such a Python ast node, or
a sympy expression, it is convenient to match against the actual syntax,
not the data structure representing it. For example, a calculator could
be implemented using a domain-specific macro for matching syntax:

    from! ast_matcher import match

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        match! node:
            case! a + b:
                return calculate(a) + calculate(b)
            case! a - b:
                return calculate(a) - calculate(b)
            case! a * b:
                return calculate(a) * calculate(b)
            case! a / b:
                return calculate(a) / calculate(b)

Which could be converted to:

    def calculate(node):
        if isinstance(node, Num):
            return node.n
        $$match_4_0 = node
        if isinstance($$match_4_0, _ast.Add):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) + calculate(b)
        elif isinstance($$match_4_0, _ast.Sub):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) - calculate(b)
        elif isinstance($$match_4_0, _ast.Mul):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) * calculate(b)
        elif isinstance($$match_4_0, _ast.Div):
            a, b = $$match_4_0.left, $$match_4_0.right
            return calculate(a) / calculate(b)

Zero-cost markers and annotations

Annotations, either decorators or PEP 3107 function annotations, have a
runtime cost even if they serve only as markers for checkers or as
documentation.

    @do_nothing_marker
    def foo(...):
        ...

can be replaced with the zero-cost macro:

    do_nothing_marker!:
    def foo(...):
        ...

Prototyping language extensions

Although macros would be most valuable for domain-specific extensions,
it is possible to demonstrate possible language extensions using macros.

f-strings:

The f-string f"..." could be implemented as macro as f!("..."). Not
quite as nice to read, but would still be useful for experimenting with.

Try finally statement:

    try_!:
        body
    finally!:
        closing

Would be translated roughly as:

    try:
        body
    except:
        closing
    else:
        closing

Note:

    Care must be taken to handle returns, breaks and continues
    correctly. The above code is merely illustrative.

With statement:

    with! open(filename) as fd:
        return fd.read()

The above would require handling open specially. An alternative that
would be more explicit, would be:

    with! open!(filename) as fd:
        return fd.read()

Macro definition macros

Languages that have syntactic macros usually provide a macro for
defining macros. This PEP intentionally does not do that, as it is not
yet clear what a good design would be, and we want to allow the
community to define their own macros.

One possible form could be:

    macro_def! name:
        input:
            ... # input pattern, defining meta-variables
        output:
            ... # output pattern, using meta-variables

Backwards Compatibility

This PEP is fully backwards compatible.

Performance Implications

For code that doesn't use macros, there will be no effect on
performance.

For code that does use macros and has already been compiled to bytecode,
there will be some slight overhead to check that the version of macros
used to compile the code match the imported macro processors.

For code that has not been compiled, or compiled with different versions
of the macro processors, then there would be the usual overhead of
bytecode compilation, plus any additional overhead of macro processing.

It is worth noting that the speed of source to bytecode compilation is
largely irrelevant for Python performance.

Implementation

In order to allow transformation of the AST at compile time by Python
code, all AST nodes in the compiler will have to be Python objects.

To do that efficiently, will mean making all the nodes in the _ast
module immutable, so as not degrade performance by much. They will need
to be immutable to guarantee that the AST remains a tree to avoid having
to support cyclic GC. Making them immutable means they will not have a
__dict__ attribute, making them compact.

AST nodes in the ast module will remain mutable.

Currently, all AST nodes are allocated using an arena allocator.
Changing to use the standard allocator might slow compilation down a
little, but has advantages in terms of maintenance, as much code can be
deleted.

Reference Implementation

None as yet.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.