PEP: 511 Title: API for code transformers Version: $Revision$
Last-Modified: $Date$ Author: Victor Stinner <vstinner@python.org>
Status: Rejected Type: Standards Track Content-Type: text/x-rst Created:
04-Jan-2016 Python-Version: 3.6

Rejection Notice

This PEP was rejected by its author.

This PEP was seen as blessing new Python-like programming languages
which are close but incompatible with the regular Python language. It
was decided to not promote syntaxes incompatible with Python.

This PEP was also seen as a nice tool to experiment new Python features,
but it is already possible to experiment them without the PEP, only with
importlib hooks. If a feature becomes useful, it should be directly part
of Python, instead of depending on an third party Python module.

Finally, this PEP was driven was the FAT Python optimization project
which was abandoned in 2016, since it was not possible to show any
significant speedup, but also because of the lack of time to implement
the most advanced and complex optimizations.

Abstract

Propose an API to register bytecode and AST transformers. Add also
-o OPTIM_TAG command line option to change .pyc filenames, -o noopt
disables the peephole optimizer. Raise an ImportError exception on
import if the .pyc file is missing and the code transformers required to
transform the code are missing. code transformers are not needed code
transformed ahead of time (loaded from .pyc files).

Rationale

Python does not provide a standard way to transform the code. Projects
transforming the code use various hooks. The MacroPy project uses an
import hook: it adds its own module finder in sys.meta_path to hook its
AST transformer. Another option is to monkey-patch the builtin compile()
function. There are even more options to hook a code transformer.

Python 3.4 added a compile_source() method to
importlib.abc.SourceLoader. But code transformation is wider than just
importing modules, see described use cases below.

Writing an optimizer or a preprocessor is out of the scope of this PEP.

Usage 1: AST optimizer

Transforming an Abstract Syntax Tree (AST) is a convenient way to
implement an optimizer. It's easier to work on the AST than working on
the bytecode, AST contains more information and is more high level.

Since the optimization can done ahead of time, complex but slow
optimizations can be implemented.

Example of optimizations which can be implemented with an AST optimizer:

-   Copy propagation: replace x=1; y=x with x=1; y=1
-   Constant folding: replace 1+1 with 2
-   Dead code elimination

Using guards (see PEP 510), it is possible to implement a much wider
choice of optimizations. Examples:

-   Simplify iterable: replace range(3) with (0, 1, 2) when used as
    iterable
-   Loop unrolling
-   Call pure builtins: replace len("abc") with 3
-   Copy used builtin symbols to constants
-   See also optimizations implemented in fatoptimizer, a static
    optimizer for Python 3.6.

The following issues can be implemented with an AST optimizer:

-   Issue #1346238: A constant folding optimization pass for the AST
-   Issue #2181: optimize out local variables at end of function
-   Issue #2499: Fold unary + and not on constants
-   Issue #4264: Patch: optimize code to use LIST_APPEND instead of
    calling list.append
-   Issue #7682: Optimisation of if with constant expression
-   Issue #10399: AST Optimization: inlining of function calls
-   Issue #11549: Build-out an AST optimizer, moving some functionality
    out of the peephole optimizer
-   Issue #17068: peephole optimization for constant strings
-   Issue #17430: missed peephole optimization

Usage 2: Preprocessor

A preprocessor can be easily implemented with an AST transformer. A
preprocessor has various and different usages.

Some examples:

-   Remove debug code like assertions and logs to make the code faster
    to run it for production.
-   Tail-call Optimization
-   Add profiling code
-   Lazy evaluation: see lazy_python (bytecode transformer) and lazy
    macro of MacroPy (AST transformer)
-   Change dictionary literals into collection.OrderedDict instances
-   Declare constants: see @asconstants of codetransformer
-   Domain Specific Language (DSL) like SQL queries. The Python language
    itself doesn't need to be modified. Previous attempts to implement
    DSL for SQL like PEP 335 - Overloadable Boolean
    Operators <335> was rejected.
-   Pattern Matching of functional languages
-   String Interpolation, but PEP 498 was merged into Python 3.6.

MacroPy has a long list of examples and use cases.

This PEP does not add any new code transformer. Using a code transformer
will require an external module and to register it manually.

See also PyXfuscator: Python obfuscator, deobfuscator, and user-assisted
decompiler.

Usage 3: Disable all optimization

Ned Batchelder asked to add an option to disable the peephole optimizer
because it makes code coverage more difficult to implement. See the
discussion on the python-ideas mailing list: Disable all peephole
optimizations.

This PEP adds a new -o noopt command line option to disable the peephole
optimizer. In Python, it's as easy as:

    sys.set_code_transformers([])

It will fix the Issue #2506: Add mechanism to disable optimizations.

Usage 4: Write new bytecode optimizers in Python

Python 3.6 optimizes the code using a peephole optimizer. By definition,
a peephole optimizer has a narrow view of the code and so can only
implement basic optimizations. The optimizer rewrites the bytecode. It
is difficult to enhance it, because it written in C.

With this PEP, it becomes possible to implement a new bytecode optimizer
in pure Python and experiment new optimizations.

Some optimizations are easier to implement on the AST like constant
folding, but optimizations on the bytecode are still useful. For
example, when the AST is compiled to bytecode, useless jumps can be
emitted because the compiler is naive and does not try to optimize
anything.

Use Cases

This section give examples of use cases explaining when and how code
transformers will be used.

Interactive interpreter

It will be possible to use code transformers with the interactive
interpreter which is popular in Python and commonly used to demonstrate
Python.

The code is transformed at runtime and so the interpreter can be slower
when expensive code transformers are used.

Build a transformed package

It will be possible to build a package of the transformed code.

A transformer can have a configuration. The configuration is not stored
in the package.

All .pyc files of the package must be transformed with the same code
transformers and the same transformers configuration.

It is possible to build different .pyc files using different optimizer
tags. Example: fat for the default configuration and fat_inline for a
different configuration with function inlining enabled.

A package can contain .pyc files with different optimizer tags.

Install a package containing transformed .pyc files

It will be possible to install a package which contains transformed .pyc
files.

All .pyc files with any optimizer tag contained in the package are
installed, not only for the current optimizer tag.

Build .pyc files when installing a package

If a package does not contain any .pyc files of the current optimizer
tag (or some .pyc files are missing), the .pyc are created during the
installation.

Code transformers of the optimizer tag are required. Otherwise, the
installation fails with an error.

Execute transformed code

It will be possible to execute transformed code.

Raise an ImportError exception on import if the .pyc file of the current
optimizer tag is missing and the code transformers required to transform
the code are missing.

The interesting point here is that code transformers are not needed to
execute the transformed code if all required .pyc files are already
available.

Code transformer API

A code transformer is a class with ast_transformer() and/or
code_transformer() methods (API described below) and a name attribute.

For efficiency, do not define a code_transformer() or ast_transformer()
method if it does nothing.

The name attribute (str) must be a short string used to identify an
optimizer. It is used to build a .pyc filename. The name must not
contain dots ('.'), dashes ('-') or directory separators: dots are used
to separated fields in a .pyc filename and dashes areused to join code
transformer names to build the optimizer tag.

Note

It would be nice to pass the fully qualified name of a module in the
context when an AST transformer is used to transform a module on import,
but it looks like the information is not available in
PyParser_ASTFromStringObject().

code_transformer() method

Prototype:

    def code_transformer(self, code, context):
        ...
        new_code = ...
        ...
        return new_code

Parameters:

-   code: code object
-   context: an object with an optimize attribute (int), the
    optimization level (0, 1 or 2). The value of the optimize attribute
    comes from the optimize parameter of the compile() function, it is
    equal to sys.flags.optimize by default.

Each implementation of Python can add extra attributes to context. For
example, on CPython, context will also have the following attribute:

-   interactive (bool): true if in interactive mode

XXX add more flags?

XXX replace flags int with a sub-namespace, or with specific attributes?

The method must return a code object.

The code transformer is run after the compilation to bytecode

ast_transformer() method

Prototype:

    def ast_transformer(self, tree, context):
        ...
        return tree

Parameters:

-   tree: an AST tree
-   context: an object with a filename attribute (str)

It must return an AST tree. It can modify the AST tree in place, or
create a new AST tree.

The AST transformer is called after the creation of the AST by the
parser and before the compilation to bytecode. New attributes may be
added to context in the future.

Changes

In short, add:

-   -o OPTIM_TAG command line option
-   sys.implementation.optim_tag
-   sys.get_code_transformers()
-   sys.set_code_transformers(transformers)
-   ast.PyCF_TRANSFORMED_AST

API to get/set code transformers

Add new functions to register code transformers:

-   sys.set_code_transformers(transformers): set the list of code
    transformers and update sys.implementation.optim_tag
-   sys.get_code_transformers(): get the list of code transformers.

The order of code transformers matter. Running transformer A and then
transformer B can give a different output than running transformer B an
then transformer A.

Example to prepend a new code transformer:

    transformers = sys.get_code_transformers()
    transformers.insert(0, new_cool_transformer)
    sys.set_code_transformers(transformers)

All AST transformers are run sequentially (ex: the second transformer
gets the input of the first transformer), and then all bytecode
transformers are run sequentially.

Optimizer tag

Changes:

-   Add sys.implementation.optim_tag (str): optimization tag. The
    default optimization tag is 'opt'.
-   Add a new -o OPTIM_TAG command line option to set
    sys.implementation.optim_tag.

Changes on importlib:

-   importlib uses sys.implementation.optim_tag to build the .pyc
    filename to importing modules, instead of always using opt. Remove
    also the special case for the optimizer level 0 with the default
    optimizer tag 'opt' to simplify the code.
-   When loading a module, if the .pyc file is missing but the .py is
    available, the .py is only used if code optimizers have the same
    optimizer tag than the current tag, otherwise an ImportError
    exception is raised.

Pseudo-code of a use_py() function to decide if a .py file can be
compiled to import a module:

    def transformers_tag():
        transformers = sys.get_code_transformers()
        if not transformers:
            return 'noopt'
        return '-'.join(transformer.name
                        for transformer in transformers)

    def use_py():
        return (transformers_tag() == sys.implementation.optim_tag)

The order of sys.get_code_transformers() matter. For example, the fat
transformer followed by the pythran transformer gives the optimizer tag
fat-pythran.

The behaviour of the importlib module is unchanged with the default
optimizer tag ('opt').

Peephole optimizer

By default, sys.implementation.optim_tag is opt and
sys.get_code_transformers() returns a list of one code transformer: the
peephole optimizer (optimize the bytecode).

Use -o noopt to disable the peephole optimizer. In this case, the
optimizer tag is noopt and no code transformer is registered.

Using the -o opt option has not effect.

AST enhancements

Enhancements to simplify the implementation of AST transformers:

-   Add a new compiler flag PyCF_TRANSFORMED_AST to get the transformed
    AST. PyCF_ONLY_AST returns the AST before the transformers.

Examples

.pyc filenames

Example of .pyc filenames of the os module.

With the default optimizer tag 'opt':

+-------------------------+--------------------+
| .pyc filename           | Optimization level |
+=========================+====================+
| os.cpython-36.opt-0.pyc |   0                |
+-------------------------+--------------------+
| os.cpython-36.opt-1.pyc |   1                |
+-------------------------+--------------------+
| os.cpython-36.opt-2.pyc |   2                |
+-------------------------+--------------------+

With the 'fat' optimizer tag:

+-------------------------+--------------------+
| .pyc filename           | Optimization level |
+=========================+====================+
| os.cpython-36.fat-0.pyc |   0                |
+-------------------------+--------------------+
| os.cpython-36.fat-1.pyc |   1                |
+-------------------------+--------------------+
| os.cpython-36.fat-2.pyc |   2                |
+-------------------------+--------------------+

Bytecode transformer

Scary bytecode transformer replacing all strings with "Ni! Ni! Ni!":

    import sys
    import types

    class BytecodeTransformer:
        name = "knights_who_say_ni"

        def code_transformer(self, code, context):
            consts = ['Ni! Ni! Ni!' if isinstance(const, str) else const
                      for const in code.co_consts]
            return types.CodeType(code.co_argcount,
                                  code.co_kwonlyargcount,
                                  code.co_nlocals,
                                  code.co_stacksize,
                                  code.co_flags,
                                  code.co_code,
                                  tuple(consts),
                                  code.co_names,
                                  code.co_varnames,
                                  code.co_filename,
                                  code.co_name,
                                  code.co_firstlineno,
                                  code.co_lnotab,
                                  code.co_freevars,
                                  code.co_cellvars)

    # replace existing code transformers with the new bytecode transformer
    sys.set_code_transformers([BytecodeTransformer()])

    # execute code which will be transformed by code_transformer()
    exec("print('Hello World!')")

Output:

    Ni! Ni! Ni!

AST transformer

Similarly to the bytecode transformer example, the AST transformer also
replaces all strings with "Ni! Ni! Ni!":

    import ast
    import sys

    class KnightsWhoSayNi(ast.NodeTransformer):
        def visit_Str(self, node):
            node.s = 'Ni! Ni! Ni!'
            return node

    class ASTTransformer:
        name = "knights_who_say_ni"

        def __init__(self):
            self.transformer = KnightsWhoSayNi()

        def ast_transformer(self, tree, context):
            self.transformer.visit(tree)
            return tree

    # replace existing code transformers with the new AST transformer
    sys.set_code_transformers([ASTTransformer()])

    # execute code which will be transformed by ast_transformer()
    exec("print('Hello World!')")

Output:

    Ni! Ni! Ni!

Other Python implementations

The PEP 511 should be implemented by all Python implementation, but the
bytecode and the AST are not standardized.

By the way, even between minor version of CPython, there are changes on
the AST API. There are differences, but only minor differences. It is
quite easy to write an AST transformer which works on Python 2.7 and
Python 3.5 for example.

Discussion

-   [Python-ideas] PEP 511: API for code transformers (January 2016)
-   [Python-Dev] AST optimizer implemented in Python (August 2012)

Prior Art

AST optimizers

The Issue #17515 "Add sys.setasthook() to allow to use a custom AST"
optimizer was a first attempt of API for code transformers, but specific
to AST.

In 2015, Victor Stinner wrote the fatoptimizer project, an AST optimizer
specializing functions using guards.

In 2014, Kevin Conway created the PyCC optimizer.

In 2012, Victor Stinner wrote the astoptimizer project, an AST optimizer
implementing various optimizations. Most interesting optimizations break
the Python semantics since no guard is used to disable optimization if
something changes.

In 2011, Eugene Toder proposed to rewrite some peephole optimizations in
a new AST optimizer: issue #11549, Build-out an AST optimizer, moving
some functionality out of the peephole optimizer. The patch adds ast.Lit
(it was proposed to rename it to ast.Literal).

Python Preprocessors

-   MacroPy: MacroPy is an implementation of Syntactic Macros in the
    Python Programming Language. MacroPy provides a mechanism for
    user-defined functions (macros) to perform transformations on the
    abstract syntax tree (AST) of a Python program at import time.
-   pypreprocessor: C-style preprocessor directives in Python, like
    #define and #ifdef

Bytecode transformers

-   codetransformer: Bytecode transformers for CPython inspired by the
    ast module’s NodeTransformer.
-   byteplay: Byteplay lets you convert Python code objects into
    equivalent objects which are easy to play with, and lets you convert
    those objects back into living Python code objects. It's useful for
    applying crazy transformations on Python functions, and is also
    useful in learning Python byte code intricacies. See byteplay
    documentation.

See also:

-   BytecodeAssembler

Copyright

This document has been placed in the public domain.