PEP: 269 Title: Pgen Module for Python Author: Jonathan Riehl
<jriehl@spaceship.com> Status: Deferred Type: Standards Track
Content-Type: text/x-rst Created: 24-Aug-2001 Python-Version: 2.2
Post-History:

Abstract

Much like the parser module exposes the Python parser, this PEP proposes
that the parser generator used to create the Python parser, pgen, be
exposed as a module in Python.

Rationale

Through the course of Pythonic history, there have been numerous
discussions about the creation of a Python compiler[1]. These have
resulted in several implementations of Python parsers, most notably the
parser module currently provided in the Python standard library[2] and
Jeremy Hylton's compiler module[3]. However, while multiple language
changes have been proposed [4][5], experimentation with the Python
syntax has lacked the benefit of a Python binding to the actual parser
generator used to build Python.

By providing a Python wrapper analogous to Fred Drake Jr.'s parser
wrapper, but targeted at the pgen library, the following assertions are
made:

1.  Reference implementations of syntax changes will be easier to
    develop. Currently, a reference implementation of a syntax change
    would require the developer to use the pgen tool from the command
    line. The resulting parser data structure would then either have to
    be reworked to interface with a custom CPython implementation, or
    wrapped as a C extension module.
2.  Reference implementations of syntax changes will be easier to
    distribute. Since the parser generator will be available in Python,
    it should follow that the resulting parser will accessible from
    Python. Therefore, reference implementations should be available as
    pure Python code, versus using custom versions of the existing
    CPython distribution, or as compilable extension modules.
3.  Reference implementations of syntax changes will be easier to
    discuss with a larger audience. This somewhat falls out of the
    second assertion, since the community of Python users is most likely
    larger than the community of CPython developers.
4.  Development of small languages in Python will be further enhanced,
    since the additional module will be a fully functional LL(1) parser
    generator.

Specification

The proposed module will be called pgen. The pgen module will contain
the following functions:

parseGrammarFile (fileName) -> AST

The parseGrammarFile() function will read the file pointed to by
fileName and create an AST object. The AST nodes will contain the
nonterminal, numeric values of the parser generator meta-grammar. The
output AST will be an instance of the AST extension class as provided by
the parser module. Syntax errors in the input file will cause the
SyntaxError exception to be raised.

parseGrammarString (text) -> AST

The parseGrammarString() function will follow the semantics of the
parseGrammarFile(), but accept the grammar text as a string for input,
as opposed to the file name.

buildParser (grammarAst) -> DFA

The buildParser() function will accept an AST object for input and
return a DFA (deterministic finite automaton) data structure. The DFA
data structure will be a C extension class, much like the AST structure
is provided in the parser module. If the input AST does not conform to
the nonterminal codes defined for the pgen meta-grammar, buildParser()
will throw a ValueError exception.

parseFile (fileName, dfa, start) -> AST

The parseFile() function will essentially be a wrapper for the
PyParser_ParseFile() C API function. The wrapper code will accept the
DFA C extension class, and the file name. An AST instance that conforms
to the lexical values in the token module and the nonterminal values
contained in the DFA will be output.

parseString (text, dfa, start) -> AST

The parseString() function will operate in a similar fashion to the
parseFile() function, but accept the parse text as an argument. Much
like parseFile() will wrap the PyParser_ParseFile() C API function,
parseString() will wrap the PyParser_ParseString() function.

symbolToStringMap (dfa) -> dict

The symbolToStringMap() function will accept a DFA instance and return a
dictionary object that maps from the DFA's numeric values for its
nonterminals to the string names of the nonterminals as found in the
original grammar specification for the DFA.

stringToSymbolMap (dfa) -> dict

The stringToSymbolMap() function output a dictionary mapping the
nonterminal names of the input DFA to their corresponding numeric
values.

Extra credit will be awarded if the map generation functions and parsing
functions are also methods of the DFA extension class.

Implementation Plan

A cunning plan has been devised to accomplish this enhancement:

1.  Rename the pgen functions to conform to the CPython naming
    standards. This action may involve adding some header files to the
    Include subdirectory.
2.  Move the pgen C modules in the Makefile.pre.in from unique pgen
    elements to the Python C library.
3.  Make any needed changes to the parser module so the AST extension
    class understands that there are AST types it may not understand.
    Cursory examination of the AST extension class shows that it keeps
    track of whether the tree is a suite or an expression.
4.  Code an additional C module in the Modules directory. The C
    extension module will implement the DFA extension class and the
    functions outlined in the previous section.
5.  Add the new module to the build process. Black magic, indeed.

Limitations

Under this proposal, would be designers of Python 3000 will still be
constrained to Python's lexical conventions. The addition, subtraction
or modification of the Python lexer is outside the scope of this PEP.

Reference Implementation

No reference implementation is currently provided. A patch was provided
at some point in
http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470
but that patch is no longer maintained.

References

Copyright

This document has been placed in the public domain.

[1] The (defunct) Python Compiler-SIG
http://www.python.org/sigs/compiler-sig/

[2] Parser Module Documentation
http://docs.python.org/library/parser.html

[3] Hylton, Jeremy. http://docs.python.org/library/compiler.html

[4] Pelletier, Michel. "Python Interface Syntax", PEP 245

[5] The Python Types-SIG http://www.python.org/sigs/types-sig/