PEP: 617
Title: New PEG parser for CPython
Author: Guido van Rossum <guido@python.org>,
        Pablo Galindo <pablogsal@python.org>,
        Lysandros Nikolaou <lisandrosnik@gmail.com>
Discussions-To: python-dev@python.org
Status: Final
Type: Standards Track
Created: 24-Mar-2020
Python-Version: 3.9
Post-History: 02-Apr-2020

Overview

This PEP proposes replacing the current LL(1)-based parser of CPython
with a new PEG-based parser. This new parser would allow the elimination
of multiple "hacks" that exist in the current grammar to circumvent the
LL(1)-limitation. It would substantially reduce the maintenance costs in
some areas related to the compiling pipeline such as the grammar, the
parser and the AST generation. The new PEG parser will also lift the
LL(1) restriction on the current Python grammar.

Background on LL(1) parsers

The current Python grammar is an LL(1)-based grammar. A grammar can be
said to be LL(1) if it can be parsed by an LL(1) parser, which in turn
is defined as a top-down parser that parses the input from left to
right, performing leftmost derivation of the sentence, with just one
token of lookahead. The traditional approach to constructing or
generating an LL(1) parser is to produce a parse table which encodes the
possible transitions between all possible states of the parser. These
tables are normally constructed from the first sets and the follow sets
of the grammar:

-   Given a rule, the first set is the collection of all terminals that
    can occur first in a full derivation of that rule. Intuitively, this
    helps the parser decide among the alternatives in a rule. For
    instance, given the rule:

        rule: A | B

    if only A can start with the terminal a and only B can start with
    the terminal b and the parser sees the token b when parsing this
    rule, it knows that it needs to follow the non-terminal B.

-   An extension to this simple idea is needed when a rule may expand to
    the empty string. Given a rule, the follow set is the collection of
    terminals that can appear immediately to the right of that rule in a
    partial derivation. Intuitively, this solves the problem of the
    empty alternative. For instance, given this rule:

        rule: A 'b'

    if the parser has the token b and the non-terminal A can only start
    with the token a, then the parser can tell that this is an invalid
    program. But if A could expand to the empty string (called an
    ε-production), then the parser would recognise a valid empty A,
    since the next token b is in the follow set of A.
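
The first-set idea above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the dictionary
encoding of the grammar, the EPSILON marker and the function name
first_sets are all hypothetical and are not part of CPython's parser
generator.

```python
# Toy grammar: each rule maps to a list of alternatives, and each
# alternative is a list of symbols. Keys of the dict are non-terminals;
# every other symbol is a terminal. An empty alternative ([]) stands for
# an ε-production.
GRAMMAR = {
    "RULE": [["A"], ["B"]],
    "A": [["a", "x"]],
    "B": [["b", "y"]],
}

EPSILON = "ε"


def first_sets(grammar):
    """Compute the first set of every rule by fixed-point iteration."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[name])
                nullable_prefix = True
                for symbol in alt:
                    if symbol in grammar:  # non-terminal
                        first[name] |= first[symbol] - {EPSILON}
                        if EPSILON not in first[symbol]:
                            nullable_prefix = False
                            break
                    else:  # terminal
                        first[name].add(symbol)
                        nullable_prefix = False
                        break
                if nullable_prefix:
                    # Every symbol can vanish, so the rule itself can be empty.
                    first[name].add(EPSILON)
                if len(first[name]) != before:
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
print(sorted(FIRST["RULE"]))  # ['a', 'b']
```

With this toy grammar, seeing the token b selects the alternative B,
because b belongs only to the first set of B.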

The current Python grammar does not contain ε-productions, so the follow
sets are not needed when creating the parse tables. Currently, in
CPython, a parser generator program reads the grammar and produces a
parsing table representing a set of deterministic finite automata (DFA)
that can be included in a C program, the parser. The parser is a
pushdown automaton that uses this data to produce a Concrete Syntax Tree
(CST), also known simply as a "parse tree". In this process, the
first sets are used indirectly when generating the DFAs.

LL(1) parsers and grammars are usually efficient and simple to implement
and generate. However, it is not possible, under the LL(1) restriction,
to express certain common constructs in a way natural to the language
designer and the reader. This includes some in the Python language.

As LL(1) parsers can only look one token ahead to distinguish
possibilities, some rules in the grammar may be ambiguous. For instance
the rule:

    rule: A | B

is ambiguous if the first sets of both A and B have some elements in
common. When the parser sees a token in the input program that both A
and B can start with, it is impossible for it to deduce which option to
expand, as no further token of the program can be examined to
disambiguate. The rule may be transformed to equivalent LL(1) rules, but
then it may be harder for a human reader to grasp its meaning. Examples
later in this document show that the current LL(1)-based grammar suffers
a lot from this scenario.

Another broad class of rules precluded by LL(1) is left-recursive rules.
A rule is left-recursive if it can derive to a sentential form with
itself as the leftmost symbol. For instance this rule:

    rule: rule 'a'

is left-recursive because the rule can be expanded to an expression that
starts with itself. As will be described later, left-recursion is the
natural way to express certain desired language properties directly in
the grammar.

Background on PEG parsers

A PEG (Parsing Expression Grammar) differs from a context-free grammar
(like the current one) in that the way it is written more closely
reflects how the parser will operate when parsing it. The fundamental
technical difference is that the choice operator is ordered. This means
that when writing:

    rule: A | B | C

a context-free-grammar parser (like an LL(1) parser) will generate
constructions that, given an input string, deduce which alternative (A,
B or C) must be expanded, while a PEG parser will check whether the
first alternative succeeds and, only if it fails, continue with the
second or the third one in the order in which they are written. This
makes the choice operator non-commutative.
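
A minimal sketch of ordered choice, assuming parsing functions that take
a string and a position and return either the new position or None. The
helper names literal and ordered_choice are hypothetical and chosen only
for this illustration.

```python
def literal(text):
    """Return a parsing function that matches `text` at position `pos`."""
    def parse(source, pos):
        if source.startswith(text, pos):
            return pos + len(text)  # success: the new position
        return None                 # failure: nothing is consumed
    return parse


def ordered_choice(*alternatives):
    """Try alternatives left to right and keep the first that succeeds."""
    def parse(source, pos):
        for alternative in alternatives:
            end = alternative(source, pos)
            if end is not None:
                return end
        return None
    return parse


short_first = ordered_choice(literal("a"), literal("ab"))
long_first = ordered_choice(literal("ab"), literal("a"))

print(short_first("ab", 0))  # 1: 'a' succeeds, so 'ab' is never tried
print(long_first("ab", 0))   # 2: 'ab' is tried first and succeeds
```

Swapping the two alternatives changes how much input is consumed, which
is why the order in which alternatives are written matters in a PEG.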

Unlike LL(1) parsers, PEG-based parsers cannot be ambiguous: if a string
parses, it has exactly one valid parse tree. This means that a PEG-based
parser cannot suffer from the ambiguity problems described in the
previous section.

PEG parsers are usually constructed as a recursive descent parser in
which every rule in the grammar corresponds to a function in the program
implementing the parser and the parsing expression (the "expansion" or
"definition" of the rule) represents the "code" in said function. Each
parsing function conceptually takes an input string as its argument, and
yields one of the following results:

-   A "success" result. This result indicates that the expression can be
    parsed by that rule and the function may optionally move forward or
    consume one or more characters of the input string supplied to it.
-   A "failure" result, in which case no input is consumed.

Notice that a "failure" result does not imply that the program is
incorrect, nor that parsing as a whole has failed: because the choice
operator is ordered, a "failure" result merely indicates "try the
following option". A direct implementation of a PEG parser as a
recursive descent parser will exhibit exponential time performance in
the worst case, as compared with LL(1) parsers, because PEG parsers have
infinite lookahead (this means that they can consider an arbitrary
number of tokens before deciding on a rule). Usually, PEG parsers avoid
this exponential time complexity with a technique called "packrat
parsing"[1], which not only loads the entire program in memory before
parsing it but also allows the parser to backtrack arbitrarily. This is
made efficient by memoizing the rules already matched for each position.
The cost of the memoization cache is that the parser will naturally use
more memory than a simple LL(1) parser, which is normally table-based.
We will explain later in this document why we consider this cost
acceptable.
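
The memoization idea behind packrat parsing can be sketched as follows.
This is a simplified illustration, not the generated CPython parser: the
Parser class, the memoize decorator, the calls counter and the toy rule
statement: atom '=' atom | atom are all assumptions made for the
example.

```python
import functools


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.cache = {}  # (rule name, start position) -> (result, end position)
        self.calls = 0   # how many times a memoized rule body actually ran


def memoize(rule):
    """Cache the result of `rule` for each starting position."""
    @functools.wraps(rule)
    def wrapper(parser):
        start = parser.pos
        key = (rule.__name__, start)
        if key in parser.cache:
            result, end = parser.cache[key]
            parser.pos = end
            return result
        parser.calls += 1
        result = rule(parser)
        if result is None:
            parser.pos = start  # a failure consumes no input
        parser.cache[key] = (result, parser.pos)
        return result
    return wrapper


def expect(parser, text):
    if parser.pos < len(parser.tokens) and parser.tokens[parser.pos] == text:
        parser.pos += 1
        return text
    return None


@memoize
def atom(parser):
    # atom: NAME
    if parser.pos < len(parser.tokens) and parser.tokens[parser.pos].isidentifier():
        parser.pos += 1
        return parser.tokens[parser.pos - 1]
    return None


def statement(parser):
    # statement: atom '=' atom | atom
    start = parser.pos
    target = atom(parser)
    if target is not None and expect(parser, "=") is not None:
        value = atom(parser)
        if value is not None:
            return ("assign", target, value)
    parser.pos = start     # backtrack and try the second alternative
    name = atom(parser)    # answered from the cache, not parsed again
    if name is not None:
        return ("name", name)
    parser.pos = start
    return None


parser = Parser(["x"])
print(statement(parser), parser.calls)  # ('name', 'x') 1
```

The first alternative fails after atom has already matched, but the
second alternative reuses the cached result, so the body of atom runs
only once for that position.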

Rationale

In this section, we describe a list of problems that are present in the
current parser machinery in CPython and that motivate the need for a new
parser.

Some rules are not actually LL(1)

Although the Python grammar is technically an LL(1) grammar (because it
is parsed by an LL(1) parser) several rules are not LL(1) and several
workarounds are implemented in the grammar and in other parts of CPython
to deal with this. For example, consider the rule for assignment
expressions:

    namedexpr_test: [NAME ':='] test

This simple rule is not compatible with the Python grammar as NAME is
among the elements of the first set of the rule test. To work around
this limitation the actual rule that appears in the current grammar is:

    namedexpr_test: test [':=' test]

This is a much broader rule than the previous one, allowing constructs
like [x for x in y] := [1,2,3]. The rule is limited to its desired form
by disallowing these unwanted constructions when transforming the parse
tree to the abstract syntax tree. This is not
only inelegant but a considerable maintenance burden as it forces the
AST creation routines and the compiler into a situation in which they
need to know how to separate valid programs from invalid programs, which
should be a responsibility solely of the parser. This also leads to the
actual grammar file not reflecting correctly what the actual grammar is
(that is, the collection of all valid Python programs).

Similar workarounds appear in multiple other rules of the current
grammar. Sometimes this problem is unsolvable. For instance, bpo-12782
("Multiple context expressions do not support parentheses for
continuation across lines") shows how making an LL(1) rule that supports
writing:

    with (
        open("a_really_long_foo") as foo,
        open("a_really_long_baz") as baz,
        open("a_really_long_bar") as bar
    ):
        ...

is not possible since the first sets of the grammar items that can
appear as context managers include the open parenthesis, making the rule
ambiguous. This rule is not only consistent with other parts of the
language (like the rule for multiple imports), but is also very useful
to auto-formatting tools, as parenthesized groups are normally used to
group elements to be formatted together (in the same way the tools
operate on the contents of lists, sets...).

Complicated AST parsing

Another problem of the current parser is that there is a huge coupling
between the AST generation routines and the particular shape of the
produced parse trees. This makes the code for generating the AST
especially complicated as many actions and choices are implicit. For
instance, the AST generation code knows what alternatives of a certain
rule are produced based on the number of child nodes present in a given
parse node. This makes the code difficult to follow as this property is
not directly related to the grammar file and is influenced by
implementation details. As a result of this, a considerable amount of
the AST generation code needs to deal with inspecting and reasoning
about the particular shape of the parse trees that it receives.

Lack of left recursion

As described previously, a limitation of LL(1) grammars is that they
cannot allow left-recursion. This makes writing some rules very
unnatural and far from how programmers normally think about the program.
For instance this construct (a simpler variation of several rules
present in the current grammar):

    expr: expr '+' term | term

cannot be parsed by an LL(1) parser. The traditional remedy is to
rewrite the grammar to circumvent the problem:

    expr: term ('+' term)*

The problem that appears with this form is that the parse tree is forced
to have a very unnatural shape. This is because with this rule, for the
input program a + b + c the parse tree will be flattened
(['a', '+', 'b', '+', 'c']) and must be post-processed to construct a
left-recursive parse tree ([['a', '+', 'b'], '+', 'c']). Being forced to
write the second rule not only leads to the parse tree not correctly
reflecting the desired associativity, but also imposes further pressure
on later compilation stages to detect and post-process these cases.
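
The post-processing step mentioned above can be illustrated with a short
sketch. The function name fold_left is hypothetical, and the list-based
trees simply mirror the notation used in the paragraph.

```python
def fold_left(flat):
    """Rebuild a left-nested tree from a flat operand/operator list."""
    tree = flat[0]
    # Walk the remaining (operator, operand) pairs from left to right,
    # nesting the tree built so far as the left operand each time.
    for operator, operand in zip(flat[1::2], flat[2::2]):
        tree = [tree, operator, operand]
    return tree


print(fold_left(["a", "+", "b", "+", "c"]))
# [['a', '+', 'b'], '+', 'c']
```

With a left-recursive rule the parser could produce the nested shape
directly, so a pass like this would not be needed.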

Intermediate parse tree

The last problem present in the current parser is the intermediate
creation of a parse tree or Concrete Syntax Tree that is later
transformed to an Abstract Syntax Tree. Although the construction of a
CST is very common in parser and compiler pipelines, in CPython this
intermediate CST is not used by anything else (it is only indirectly
exposed by the parser module and a surprisingly small part of the code
in the CST production is reused in the module). What is worse, the
whole tree is kept in memory, keeping many branches that consist of
chains of nodes with a single child. This has been shown to consume a
considerable amount of memory (for instance in bpo-26415, "Excessive
peak memory consumption by the Python parser").

Having to produce an intermediate result between the grammar and the AST
is not only undesirable but also makes the AST generation step much more
complicated, raising considerably the maintenance burden.

The new proposed PEG parser

The new proposed PEG parser contains the following pieces:

-   A parser generator that can read a grammar file and produce a PEG
    parser written in Python or C that can parse said grammar.
-   A PEG meta-grammar that automatically generates a Python parser that
    is used for the parser generator itself (this means that there are
    no manually-written parsers).
-   A generated parser (using the parser generator) that can directly
    produce C and Python AST objects.

Left recursion

PEG parsers normally do not support left recursion but we have
implemented a technique similar to the one described in Medeiros et
al.[2] but using the memoization cache instead of static variables. This
approach is closer to the one described in Warth et al.[3]. This allows
us to write not only simple left-recursive rules but also more
complicated rules that involve indirect left-recursion like:

    rule1: rule2 | 'a'
    rule2: rule3 | 'b'
    rule3: rule1 | 'c'

and "hidden left-recursion" like:

    rule: 'optional'? rule '@' some_other_rule
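
As a rough sketch of how a memoization cache can support direct left
recursion, the following toy parser handles expr: expr '+' term | term
by priming the cache with a failure and then repeatedly re-parsing to
"grow" the result. It loosely follows the idea in Warth et al.; the
Parser class and its method names are assumptions made for this
illustration and do not reproduce the actual CPython implementation.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.cache = {}  # ("expr", start position) -> (tree, end position)

    def expect(self, text):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def term(self):
        # term: NAME
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def expr(self):
        start = self.pos
        key = ("expr", start)
        if key in self.cache:
            tree, end = self.cache[key]
            if tree is not None:
                self.pos = end
            return tree
        # Prime the cache with a failure so that the recursive call made
        # by the rule body terminates instead of recursing forever.
        self.cache[key] = (None, start)
        best_tree, best_end = None, start
        while True:
            self.pos = start
            tree = self._expr_body()
            if tree is None or self.pos <= best_end:
                break  # this attempt did not consume more input
            best_tree, best_end = tree, self.pos
            self.cache[key] = (best_tree, best_end)
        self.pos = best_end
        return best_tree

    def _expr_body(self):
        # expr: expr '+' term | term
        start = self.pos
        left = self.expr()
        if left is not None and self.expect("+") is not None:
            right = self.term()
            if right is not None:
                return [left, "+", right]
        self.pos = start
        return self.term()


parser = Parser(["a", "+", "b", "+", "c"])
print(parser.expr())  # [['a', '+', 'b'], '+', 'c']
```

Each pass through the loop reuses the previous, shorter result as the
left operand, so the tree comes out left-nested without any rewriting of
the rule.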

Syntax

The grammar consists of a sequence of rules of the form:

    rule_name: expression

Optionally, a type can be included right after the rule name, which
specifies the return type of the C or Python function corresponding to
the rule:

    rule_name[return_type]: expression

If the return type is omitted, then a void * is returned in C and an Any
in Python.

Grammar Expressions

# comment

Python-style comments.

e1 e2

Match e1, then match e2.

    rule_name: first_rule second_rule

e1 | e2

Match e1 or e2.

The first alternative can also appear on the line after the rule name
for formatting purposes. In that case, a | must be used before the first
alternative, like so:

    rule_name[return_type]:
        | first_alt
        | second_alt

( e )

Match e.

    rule_name: (e)

A slightly more complex and useful example includes using the grouping
operator together with the repeat operators:

    rule_name: (e1 e2)*

[ e ] or e?

Optionally match e.

    rule_name: [e]

A more useful example includes defining that a trailing comma is
optional:

    rule_name: e (',' e)* [',']

e*

Match zero or more occurrences of e.

    rule_name: (e1 e2)*

e+

Match one or more occurrences of e.

    rule_name: (e1 e2)+

s.e+

Match one or more occurrences of e, separated by s. The generated parse
tree does not include the separator. This is otherwise identical to
(e (s e)*).

    rule_name: ','.e+
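
A sketch of how the s.e+ form might behave, assuming parsing functions
that take a token list and a position and return a (value, new position)
pair or None. The helpers match, name and gather are hypothetical names
used only here.

```python
def match(expected):
    """Return a parsing function for one token equal to `expected`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return expected, pos + 1
        return None
    return parse


def name(tokens, pos):
    """Parsing function for a single identifier token."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return tokens[pos], pos + 1
    return None


def gather(separator, element):
    """Sketch of s.e+: one or more `element`, separated by `separator`."""
    def parse(tokens, pos):
        first = element(tokens, pos)
        if first is None:
            return None
        value, pos = first
        values = [value]
        while True:
            sep = separator(tokens, pos)
            if sep is None:
                break
            following = element(tokens, sep[1])
            if following is None:
                break  # a dangling separator is left unconsumed
            value, pos = following
            values.append(value)  # only elements are kept, not separators
        return values, pos
    return parse


names = gather(match(","), name)
print(names(["a", ",", "b", ",", "c"], 0))  # (['a', 'b', 'c'], 5)
```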

&e

Succeed if e can be parsed, without consuming any input.

!e

Fail if e can be parsed, without consuming any input.

An example taken from the proposed Python grammar specifies that a
primary consists of an atom, which is not followed by a . or a ( or a [:

    primary: atom !'.' !'(' !'['
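
The two lookahead operators can be sketched as wrappers that run a
parsing function and then report the original position. The names token,
positive_lookahead, negative_lookahead, and the simplified atom that
accepts only a single identifier, are assumptions made for this
illustration.

```python
def token(expected):
    """Return a parsing function for one token equal to `expected`."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return parse


def positive_lookahead(parse):
    """&e: succeed if e matches here, but keep the position unchanged."""
    def wrapper(tokens, pos):
        return pos if parse(tokens, pos) is not None else None
    return wrapper


def negative_lookahead(parse):
    """!e: succeed only if e does not match here; consume nothing."""
    def wrapper(tokens, pos):
        return pos if parse(tokens, pos) is None else None
    return wrapper


def atom(tokens, pos):
    # Simplified atom: a single identifier token.
    if pos < len(tokens) and tokens[pos].isidentifier():
        return pos + 1
    return None


def primary(tokens, pos):
    # primary: atom !'.' !'(' !'['
    pos = atom(tokens, pos)
    for forbidden in (".", "(", "["):
        if pos is None:
            return None
        pos = negative_lookahead(token(forbidden))(tokens, pos)
    return pos


print(primary(["x"], 0))       # 1
print(primary(["x", "("], 0))  # None
```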

~

Commit to the current alternative, even if it fails to parse.

    rule_name: '(' ~ some_rule ')' | some_alt

In this example, if a left parenthesis is parsed, then the other
alternative won't be considered, even if some_rule or ')' fails to be
parsed.
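
The effect of the cut can be sketched with a hand-written parsing
function for the rule above. The bodies of some_rule and some_alt are
hypothetical placeholders (an identifier and "any single token"), chosen
so that the difference made by the cut is observable.

```python
def expect(tokens, pos, text):
    return pos + 1 if pos < len(tokens) and tokens[pos] == text else None


def some_rule(tokens, pos):
    # Placeholder rule: a single identifier token.
    return pos + 1 if pos < len(tokens) and tokens[pos].isidentifier() else None


def some_alt(tokens, pos):
    # Placeholder alternative: any single token at all.
    return pos + 1 if pos < len(tokens) else None


def rule_name(tokens, pos):
    # rule_name: '(' ~ some_rule ')' | some_alt
    cut = False
    after_paren = expect(tokens, pos, "(")
    if after_paren is not None:
        cut = True  # '~' reached: commit to this alternative
        inner = some_rule(tokens, after_paren)
        if inner is not None:
            end = expect(tokens, inner, ")")
            if end is not None:
                return end
    if cut:
        return None  # committed, so some_alt is not tried
    return some_alt(tokens, pos)


print(rule_name(["(", "x", ")"], 0))  # 3
print(rule_name(["(", "1"], 0))       # None, although some_alt would match '('
print(rule_name(["x"], 0))            # 1
```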

Variables in the Grammar

A subexpression can be named by preceding it with an identifier and an =
sign. The name can then be used in the action (see below), like this:

    rule_name[return_type]: '(' a=some_other_rule ')' { a }

Grammar actions

To avoid the intermediate steps that obscure the relationship between
the grammar and the AST generation, the proposed PEG parser allows
directly generating AST nodes for a rule via grammar actions. Grammar
actions are language-specific expressions that are evaluated when a
grammar rule is successfully parsed. These expressions can be written in
Python or C depending on the desired output of the parser generator.
This means that generating one parser in Python and another in C
requires two grammar files, each with its own set of actions and
otherwise identical to the other. As an example of a grammar with Python
actions, the piece of the parser generator that parses grammar files is
bootstrapped from a meta-grammar file with Python actions that generate
the grammar tree as a result of the parsing.

In the specific case of the new proposed PEG grammar for Python, having
actions allows directly describing how the AST is composed in the
grammar itself, making it more clear and maintainable. This AST
generation process is supported by the use of some helper functions that
factor out common AST object manipulations and some other required
operations that are not directly related to the grammar.

To indicate these actions each alternative can be followed by the action
code inside curly-braces, which specifies the return value of the
alternative:

    rule_name[return_type]:
        | first_alt1 first_alt2 { first_alt1 }
        | second_alt1 second_alt2 { second_alt1 }

If the action is omitted and C code is being generated, then there are
two different possibilities:

1.  If there's a single name in the alternative, this gets returned.
2.  If not, a dummy name object gets returned (this case should be
    avoided).

If the action is omitted and Python code is being generated, then a list
with all the parsed expressions gets returned (this is meant for
debugging).

The full meta-grammar for the grammars supported by the PEG generator
is:

    start[Grammar]: grammar ENDMARKER { grammar }

    grammar[Grammar]:
        | metas rules { Grammar(rules, metas) }
        | rules { Grammar(rules, []) }

    metas[MetaList]:
        | meta metas { [meta] + metas }
        | meta { [meta] }

    meta[MetaTuple]:
        | "@" NAME NEWLINE { (name.string, None) }
        | "@" a=NAME b=NAME NEWLINE { (a.string, b.string) }
        | "@" NAME STRING NEWLINE { (name.string, literal_eval(string.string)) }

    rules[RuleList]:
        | rule rules { [rule] + rules }
        | rule { [rule] }

    rule[Rule]:
        | rulename ":" alts NEWLINE INDENT more_alts DEDENT {
              Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts)) }
        | rulename ":" NEWLINE INDENT more_alts DEDENT { Rule(rulename[0], rulename[1], more_alts) }
        | rulename ":" alts NEWLINE { Rule(rulename[0], rulename[1], alts) }

    rulename[RuleName]:
        | NAME '[' type=NAME '*' ']' {(name.string, type.string+"*")}
        | NAME '[' type=NAME ']' {(name.string, type.string)}
        | NAME {(name.string, None)}

    alts[Rhs]:
        | alt "|" alts { Rhs([alt] + alts.alts)}
        | alt { Rhs([alt]) }

    more_alts[Rhs]:
        | "|" alts NEWLINE more_alts { Rhs(alts.alts + more_alts.alts) }
        | "|" alts NEWLINE { Rhs(alts.alts) }

    alt[Alt]:
        | items '$' action { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=action) }
        | items '$' { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=None) }
        | items action { Alt(items, action=action) }
        | items { Alt(items, action=None) }

    items[NamedItemList]:
        | named_item items { [named_item] + items }
        | named_item { [named_item] }

    named_item[NamedItem]:
        | NAME '=' ~ item {NamedItem(name.string, item)}
        | item {NamedItem(None, item)}
        | it=lookahead {NamedItem(None, it)}

    lookahead[LookaheadOrCut]:
        | '&' ~ atom {PositiveLookahead(atom)}
        | '!' ~ atom {NegativeLookahead(atom)}
        | '~' {Cut()}

    item[Item]:
        | '[' ~ alts ']' {Opt(alts)}
        |  atom '?' {Opt(atom)}
        |  atom '*' {Repeat0(atom)}
        |  atom '+' {Repeat1(atom)}
        |  sep=atom '.' node=atom '+' {Gather(sep, node)}
        |  atom {atom}

    atom[Plain]:
        | '(' ~ alts ')' {Group(alts)}
        | NAME {NameLeaf(name.string) }
        | STRING {StringLeaf(string.string)}

    # Mini-grammar for the actions

    action[str]: "{" ~ target_atoms "}" { target_atoms }

    target_atoms[str]:
        | target_atom target_atoms { target_atom + " " + target_atoms }
        | target_atom { target_atom }

    target_atom[str]:
        | "{" ~ target_atoms "}" { "{" + target_atoms + "}" }
        | NAME { name.string }
        | NUMBER { number.string }
        | STRING { string.string }
        | "?" { "?" }
        | ":" { ":" }

As an illustrative example this simple grammar file allows directly
generating a full parser that can parse simple arithmetic expressions
and that returns a valid C-based Python AST:

    start[mod_ty]: a=expr_stmt* $ { Module(a, NULL, p->arena) }
    expr_stmt[stmt_ty]: a=expr NEWLINE { _Py_Expr(a, EXTRA) }
    expr[expr_ty]:
        | l=expr '+' r=term { _Py_BinOp(l, Add, r, EXTRA) }
        | l=expr '-' r=term { _Py_BinOp(l, Sub, r, EXTRA) }
        | t=term { t }

    term[expr_ty]:
        | l=term '*' r=factor { _Py_BinOp(l, Mult, r, EXTRA) }
        | l=term '/' r=factor { _Py_BinOp(l, Div, r, EXTRA) }
        | f=factor { f }

    factor[expr_ty]:
        | '(' e=expr ')' { e }
        | a=atom { a }

    atom[expr_ty]:
        | n=NAME { n }
        | n=NUMBER { n }
        | s=STRING { s }

Here EXTRA is a macro that expands to
start_lineno, start_col_offset, end_lineno, end_col_offset, p->arena,
those being variables automatically injected by the parser; p points to
an object that holds on to all state for the parser.

A similar grammar written to target Python AST objects:

    start: expr NEWLINE? ENDMARKER { ast.Expression(expr) }
    expr: 
        | expr '+' term { ast.BinOp(expr, ast.Add(), term) }
        | expr '-' term { ast.BinOp(expr, ast.Sub(), term) }
        | term { term }

    term:
        | l=term '*' r=factor { ast.BinOp(l, ast.Mult(), r) }
        | term '/' factor { ast.BinOp(term, ast.Div(), factor) }
        | factor { factor }

    factor:
        | '(' expr ')' { expr }
        | atom { atom }

    atom: 
        | NAME { ast.Name(id=name.string, ctx=ast.Load()) }
        | NUMBER { ast.Constant(value=ast.literal_eval(number.string)) }
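
To make the Python actions more concrete, the following sketch builds by
hand the tree those actions would be expected to produce for the input
a + b * 2, and then evaluates it. It uses only the standard ast module;
the variable values and the "<sketch>" filename are arbitrary choices
for the example.

```python
import ast

# Tree for "a + b * 2": the multiplication binds tighter, so it becomes
# the right operand of the addition.
tree = ast.Expression(
    ast.BinOp(
        ast.Name(id="a", ctx=ast.Load()),
        ast.Add(),
        ast.BinOp(
            ast.Name(id="b", ctx=ast.Load()),
            ast.Mult(),
            ast.Constant(value=ast.literal_eval("2")),
        ),
    )
)

# The actions shown above do not record line or column information, so
# it is filled in here before compiling.
ast.fix_missing_locations(tree)

code = compile(tree, "<sketch>", "eval")
result = eval(code, {"a": 1, "b": 3})
print(result)  # 7
```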

Migration plan

This section describes the migration plan when porting to the new
PEG-based parser if this PEP is accepted. The migration will be executed
in a series of steps that initially allow falling back to the previous
parser if needed:

1.  Starting with Python 3.9 alpha 6, include the new PEG-based parser
    machinery in CPython with a command-line flag and environment
    variable that allows switching between the new and the old parsers
    together with explicit APIs that allow invoking the new and the old
    parsers independently. At this step, all Python APIs like ast.parse
    and compile will use the parser set by the flags or the environment
    variable and the default parser will be the new PEG-based parser.
2.  Between Python 3.9 and Python 3.10, the old parser and related code
    (like the "parser" module) will be kept until a new Python release
    happens (Python 3.10). In the meantime, and until the old parser is
    removed, no new Python grammar additions will be made that require
    the PEG parser. This means that the grammar will be kept LL(1) until
    the old parser is removed.
3.  In Python 3.10, remove the old parser, the command-line flag, the
    environment variable and the "parser" module and related code.

Performance and validation

We have done extensive timing and validation of the new parser, and this
gives us confidence that the new parser is of high enough quality to
replace the current parser.

Validation

To start with validation, we regularly compile the entire Python 3.8
stdlib and compare every aspect of the resulting AST with that produced
by the standard compiler. (In the process we found a few bugs in the
standard parser's treatment of line and column numbers, all of which we
have fixed upstream via a series of issues and PRs.)

We have also occasionally compiled a much larger codebase (the approx.
3800 most popular packages on PyPI) and this has helped us find a (very)
few additional bugs in the new parser.

(One area we have not explored extensively is rejection of all wrong
programs. We have unit tests that check for a certain number of explicit
rejections, but more work could be done, e.g. by using a fuzzer that
inserts random subtle bugs into existing code. We're open to help in
this area.)

Performance

We have tuned the performance of the new parser to come within 10% of
the current parser both in speed and memory consumption. While the
PEG/packrat parsing algorithm inherently consumes more memory than the
current LL(1) parser, we have an advantage because we don't construct an
intermediate CST.

Below are some benchmarks. These are focused on compiling source code to
bytecode, because this is the most realistic situation. Returning an AST
to Python code is not as representative, because the process to convert
the internal AST (only accessible to C code) to an external AST (an
instance of ast.AST) takes more time than the parser itself.

All measurements reported here are done on a recent MacBook Pro, taking
the median of three runs. No particular care was taken to stop other
applications running on the same machine.

The first timings are for our canonical test file, which has 100,000
lines endlessly repeating the following three lines:

    1 + 2 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + ((((((11 * 12 * 13 * 14 * 15 + 16 * 17 + 18 * 19 * 20))))))
    2*3 + 4*5*6
    12 + (2 * 3 * 4 * 5 + 6 + 7 * 8)

-   Just parsing and throwing away the internal AST takes 1.16 seconds
    with a max RSS of 681 MiB.
-   Parsing and converting to ast.AST takes 6.34 seconds, max RSS 1029
    MiB.
-   Parsing and compiling to bytecode takes 1.28 seconds, max RSS 681
    MiB.
-   With the current parser, parsing and compiling takes 1.44 seconds,
    max RSS 836 MiB.

For this particular test file, the new parser is faster and uses less
memory than the current parser (compare the last two bullets).
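
A measurement of this kind can be approximated with a short script. This
is only a sketch: it times the built-in compile on a much smaller input,
does not measure max RSS, and will not reproduce the figures above. The
function names make_source and time_compile are hypothetical.

```python
import time

LINES = [
    "1 + 2 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + "
    "((((((11 * 12 * 13 * 14 * 15 + 16 * 17 + 18 * 19 * 20))))))",
    "2*3 + 4*5*6",
    "12 + (2 * 3 * 4 * 5 + 6 + 7 * 8)",
]


def make_source(total_lines):
    """Repeat the three lines until `total_lines` lines are produced."""
    return "".join(LINES[i % len(LINES)] + "\n" for i in range(total_lines))


def time_compile(source):
    """Return the seconds taken to parse and compile `source`."""
    started = time.perf_counter()
    compile(source, "<benchmark>", "exec")
    return time.perf_counter() - started


# 3,000 lines keeps the sketch quick; the test file above has 100,000.
source = make_source(3_000)
elapsed = time_compile(source)
print(f"{len(source.splitlines())} lines compiled in {elapsed:.3f} s")
```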

We also did timings with a more realistic payload, the entire Python 3.8
stdlib. This payload consists of 1,641 files, 749,570 lines, 27,622,497
bytes. (Though 11 files can't be compiled by any Python 3 parser due to
encoding issues, sometimes intentional.)

-   Compiling and throwing away the internal AST took 2.141 seconds.
    That's 350,040 lines/sec, or 12,899,367 bytes/sec. The max RSS was
    74 MiB (the largest file in the stdlib is much smaller than our
    canonical test file).
-   Compiling to bytecode took 3.290 seconds. That's 227,861 lines/sec,
    or 8,396,942 bytes/sec. Max RSS 77 MiB.
-   Compiling to bytecode using the current parser took 3.367 seconds.
    That's 222,620 lines/sec, or 8,203,780 bytes/sec. Max RSS 70 MiB.

Comparing the last two bullets we find that the new parser is slightly
faster but uses slightly (about 10%) more memory. We believe this is
acceptable. (Also, there are probably some more tweaks we can make to
reduce memory usage.)

Rejected Alternatives

We did not seriously consider alternative ways to implement the new
parser, but here's a brief discussion of LALR(1).

Thirty years ago the first author decided to go his own way with
Python's parser rather than using LALR(1), which was the industry
standard at the time (e.g. Bison and Yacc). The reasons were primarily
emotional (gut feelings, intuition), based on past experience using Yacc
in other projects, where grammar development took more effort than
anticipated (in part due to shift-reduce conflicts). A specific
criticism of Bison and Yacc that still holds is that their meta-grammar
(the notation used to feed the grammar into the parser generator) does
not support EBNF conveniences like [optional_clause] or
(repeated_clause)*. Using a custom parser generator, a syntax tree
matching the structure of the grammar could be generated automatically,
and with EBNF that tree could match the "human-friendly" structure of
the grammar.

Other variants of LR were not considered, nor was LL (e.g. ANTLR). PEG
was selected because it was easy to understand given a basic
understanding of recursive-descent parsing.

References

[4] Guido's series on PEG parsing
https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60

Copyright

This document has been placed in the public domain.

[1] Ford, Bryan https://pdos.csail.mit.edu/~baford/packrat/thesis/

[2] Medeiros et al. https://arxiv.org/pdf/1207.0443.pdf

[3] Warth et al. https://web.cs.ucla.edu/~todd/research/pepm08.pdf