PEP: 436 Title: The Argument Clinic DSL Version: $Revision$
Last-Modified: $Date$ Author: Larry Hastings <larry@hastings.org>
Discussions-To: python-dev@python.org Status: Final Type: Standards
Track Content-Type: text/x-rst Created: 22-Feb-2013 Python-Version: 3.4

Abstract

This document proposes "Argument Clinic", a DSL to facilitate argument
processing for built-in functions in the implementation of CPython.

Rationale and Goals

The primary implementation of Python, "CPython", is written in a mixture
of Python and C. One implementation detail of CPython is what are called
"built-in" functions -- functions available to Python programs but
written in C. When a Python program calls a built-in function and passes
in arguments, those arguments must be translated from Python values into
C values. This process is called "parsing arguments".

As of CPython 3.3, builtin functions nearly always parse their arguments
with one of two functions: the original PyArg_ParseTuple(),[1] and the
more modern PyArg_ParseTupleAndKeywords().[2] The former only handles
positional parameters; the latter also accommodates keyword and
keyword-only parameters, and is preferred for new code.

With either function, the caller specifies the translation for parsing
arguments in a "format string":[3] each parameter corresponds to a
"format unit", a short character sequence telling the parsing function
what Python types to accept and how to translate them into the
appropriate C value for that parameter.

PyArg_ParseTuple() was reasonable when it was first conceived. There
were only a dozen or so of these "format units"; each one was distinct,
and easy to understand and remember. But over the years the PyArg_Parse
interface has been extended in numerous ways. The modern API is complex,
to the point that it is somewhat painful to use. Consider:

-   There are now forty different "format units"; a few are even three
    characters long. This makes it difficult for the programmer to
    understand what the format string says--or even perhaps to parse
    it--without constantly cross-indexing it with the documentation.
-   There are also six meta-format units that may be buried in the
    format string. (They are: "()|$:;".)
-   The more format units are added, the less likely it is the
    implementer can pick an easy-to-use mnemonic for the format unit,
    because the character of choice is probably already in use. In other
    words, the more format units we have, the more obtuse the format
    units become.
-   Several format units are nearly identical to others, having only
    subtle differences. This makes understanding the exact semantics of
    the format string even harder, and can make it difficult to figure
    out exactly which format unit you want.
-   The docstring is specified as a static C string, making it mildly
    bothersome to read and edit since it must obey C string quoting
    rules.
-   When adding a new parameter to a function using
    PyArg_ParseTupleAndKeywords(), it's necessary to touch six different
    places in the code:[4]
    -   Declaring the variable to store the argument.
    -   Passing in a pointer to that variable in the correct spot in
        PyArg_ParseTupleAndKeywords(), also passing in any "length" or
        "converter" arguments in the correct order.
    -   Adding the name of the argument in the correct spot of the
        "keywords" array passed in to PyArg_ParseTupleAndKeywords().
    -   Adding the format unit to the correct spot in the format string.
    -   Adding the parameter to the prototype in the docstring.
    -   Documenting the parameter in the docstring.
-   There is currently no mechanism for builtin functions to provide
    their "signature" information (see inspect.getfullargspec and
    inspect.Signature). Adding this information using a mechanism
    similar to the existing PyArg_Parse functions would require
    repeating ourselves yet again.

The goal of Argument Clinic is to replace this API with a mechanism
inheriting none of these downsides:

-   You need specify each parameter only once.
-   All information about a parameter is kept together in one place.
-   For each parameter, you specify a conversion function; Argument
    Clinic handles the translation from Python value into C value for
    you.
-   Argument Clinic also allows for fine-tuning of argument processing
    behavior with parameterized conversion functions.
-   Docstrings are written in plain text. Function docstrings are
    required; per-parameter docstrings are encouraged.
-   From this, Argument Clinic generates for you all the mundane,
    repetitious code and data structures CPython needs internally. Once
    you've specified the interface, the next step is simply to write
    your implementation using native C types. Every detail of argument
    parsing is handled for you.

Argument Clinic is implemented as a preprocessor. It draws inspiration
for its workflow directly from [Cog] by Ned Batchelder. To use Clinic,
add a block comment to your C source code beginning and ending with
special text strings, then run Clinic on the file. Clinic will find the
block comment, process the contents, and write the output back into your
C source file directly after the comment. The intent is that Clinic's
output becomes part of your source code; it's checked in to revision
control, and distributed with source packages. This means that Python
will still ship ready-to-build. It does complicate development slightly;
in order to add a new function, or modify the arguments or documentation
of an existing function using Clinic, you'll need a working Python 3
interpreter.

Future goals of Argument Clinic include:

-   providing signature information for builtins,
-   enabling alternative implementations of Python to create automated
    library compatibility tests, and
-   speeding up argument parsing with improvements to the generated
    code.

DSL Syntax Summary

The Argument Clinic DSL is specified as a comment embedded in a C file,
as follows. The "Example" column on the right shows you sample input to
the Argument Clinic DSL, and the "Section" column on the left specifies
what each line represents in turn.

Argument Clinic's DSL syntax mirrors the Python def statement, lending
it some familiarity to Python core developers.

    +-----------------------+-----------------------------------------------------------------+
    | Section               | Example                                                         |
    +-----------------------+-----------------------------------------------------------------+
    | Clinic DSL start      | /*[clinic]                                                      |
    | Module declaration    | module module_name                                              |
    | Class declaration     | class module_name.class_name                                    |
    | Function declaration  | module_name.function_name  -> return_annotation                 |
    | Parameter declaration |       name : converter(param=value)                             |
    | Parameter docstring   |           Lorem ipsum dolor sit amet, consectetur               |
    |                       |           adipisicing elit, sed do eiusmod tempor               |
    | Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing             |
    |                       | elit, sed do eiusmod tempor incididunt ut labore et             |
    | Clinic DSL end        | [clinic]*/                                                      |
    | Clinic output         | ...                                                             |
    | Clinic output end     | /*[clinic end output:<checksum>]*/                              |
    +-----------------------+-----------------------------------------------------------------+

To give some flavor of the proposed DSL syntax, here are some sample
Clinic code blocks. This first block reflects the normally preferred
style, including blank lines between parameters and per-argument
docstrings. It also includes a user-defined converter (path_t) created
locally:

    /*[clinic]
    os.stat as os_stat_fn -> stat result

       path: path_t(allow_fd=1)
           Path to be examined; can be string, bytes, or open-file-descriptor int.

       *

       dir_fd: OS_STAT_DIR_FD_CONVERTER = DEFAULT_DIR_FD
           If not None, it should be a file descriptor open to a directory,
           and path should be a relative string; path will then be relative to
           that directory.

       follow_symlinks: bool = True
           If False, and the last element of the path is a symbolic link,
           stat will examine the symbolic link itself instead of the file
           the link points to.

    Perform a stat system call on the given path.

    {parameters}

    dir_fd and follow_symlinks may not be implemented
      on your platform.  If they are unavailable, using them will raise a
      NotImplementedError.

    It's an error to use dir_fd or follow_symlinks when specifying path as
      an open file descriptor.

    [clinic]*/

This second example shows a minimal Clinic code block, omitting all
parameter docstrings and non-significant blank lines:

    /*[clinic]
    os.access
       path: path
       mode: int
       *
       dir_fd: OS_ACCESS_DIR_FD_CONVERTER = 1
       effective_ids: bool = False
       follow_symlinks: bool = True
    Use the real uid/gid to test for access to a path.
    Returns True if granted, False otherwise.

    {parameters}

    dir_fd, effective_ids, and follow_symlinks may not be implemented
      on your platform.  If they are unavailable, using them will raise a
      NotImplementedError.

    Note that most operations will use the effective uid/gid, therefore this
      routine can be used in a suid/sgid environment to test if the invoking user
      has the specified access to the path.

    [clinic]*/

This final example shows a Clinic code block handling groups of optional
parameters, including parameters on the left:

    /*[clinic]
    curses.window.addch

       [
       y: int
         Y-coordinate.

       x: int
         X-coordinate.
       ]

       ch: char
         Character to add.

       [
       attr: long
         Attributes for the character.
       ]

       /

    Paint character ch at (y, x) with attributes attr,
    overwriting any character previously painter at that location.
    By default, the character position and attributes are the
    current settings for the window object.
    [clinic]*/

General Behavior Of the Argument Clinic DSL

All lines support # as a line comment delimiter except docstrings. Blank
lines are always ignored.

Like Python itself, leading whitespace is significant in the Argument
Clinic DSL. The first line of the "function" section is the function
declaration. Indented lines below the function declaration declare
parameters, one per line; lines below those that are indented even
further are per-parameter docstrings. Finally, the first line dedented
back to column 0 end parameter declarations and start the function
docstring.

Parameter docstrings are optional; function docstrings are not.
Functions that specify no arguments may simply specify the function
declaration followed by the docstring.

Module and Class Declarations

When a C file implements a module or class, this should be declared to
Clinic. The syntax is simple:

    module module_name

or :

    class module_name.class_name

(Note that these are not actually special syntax; they are implemented
as Directives.)

The module name or class name should always be the full dotted path from
the top-level module. Nested modules and classes are supported.

Function Declaration

The full form of the function declaration is as follows:

    dotted.name [ as legal_c_id ] [ -> return_annotation ]

The dotted name should be the full name of the function, starting with
the highest-level package (e.g. "os.stat" or "curses.window.addch").

The "as legal_c_id" syntax is optional. Argument Clinic uses the name of
the function to create the names of the generated C functions. In some
circumstances, the generated name may collide with other global names in
the C program's namespace. The "as legal_c_id" syntax allows you to
override the generated name with your own; substitute "legal_c_id" with
any legal C identifier. If skipped, the "as" keyword must also be
omitted.

The return annotation is also optional. If skipped, the arrow ("->")
must also be omitted. If specified, the value for the return annotation
must be compatible with ast.literal_eval, and it is interpreted as a
return converter.

Parameter Declaration

The full form of the parameter declaration line as follows:

    name: converter [ (parameter=value [, parameter2=value2]) ] [ = default]

The "name" must be a legal C identifier. Whitespace is permitted between
the name and the colon (though this is not the preferred style).
Whitespace is permitted (and encouraged) between the colon and the
converter.

The "converter" is the name of one of the "converter functions"
registered with Argument Clinic. Clinic will ship with a number of
built-in converters; new converters can also be added dynamically. In
choosing a converter, you are automatically constraining what Python
types are permitted on the input, and specifying what type the output
variable (or variables) will be. Although many of the converters will
resemble the names of C types or perhaps Python types, the name of a
converter may be any legal Python identifier.

If the converter is followed by parentheses, these parentheses enclose
parameter to the conversion function. The syntax mirrors providing
arguments a Python function call: the parameter must always be named, as
if they were "keyword-only parameters", and the values provided for the
parameters will syntactically resemble Python literal values. These
parameters are always optional, permitting all conversion functions to
be called without any parameters. In this case, you may also omit the
parentheses entirely; this is always equivalent to specifying empty
parentheses. The values supplied for these parameters must be compatible
with ast.literal_eval.

The "default" is a Python literal value. Default values are optional; if
not specified you must omit the equals sign too. Parameters which don't
have a default are implicitly required. The default value is dynamically
assigned, "live" in the generated C code, and although it's specified as
a Python value, it's translated into a native C value in the generated C
code. Few default values are permitted, owing to this manual translation
step.

If this were a Python function declaration, a parameter declaration
would be delimited by either a trailing comma or an ending parenthesis.
However, Argument Clinic uses neither; parameter declarations are
delimited by a newline. A trailing comma or right parenthesis is not
permitted.

The first parameter declaration establishes the indent for all parameter
declarations in a particular Clinic code block. All subsequent
parameters must be indented to the same level.

Legacy Converters

For convenience's sake in converting existing code to Argument Clinic,
Clinic provides a set of legacy converters that match PyArg_ParseTuple
format units. They are specified as a C string containing the format
unit. For example, to specify a parameter "foo" as taking a Python "int"
and emitting a C int, you could specify:

    foo : "i"

(To more closely resemble a C string, these must always use double
quotes.)

Although these resemble PyArg_ParseTuple format units, no guarantee is
made that the implementation will call a PyArg_Parse function for
parsing.

This syntax does not support parameters. Therefore, it doesn't support
any of the format units that require input parameters
("O!", "O&", "es", "es#", "et", "et#"). Parameters requiring one of
these conversions cannot use the legacy syntax. (You may still, however,
supply a default value.)

Parameter Docstrings

All lines that appear below and are indented further than a parameter
declaration are the docstring for that parameter. All such lines are
"dedented" until the first line is flush left.

Special Syntax For Parameter Lines

There are four special symbols that may be used in the parameter
section. Each of these must appear on a line by itself, indented to the
same level as parameter declarations. The four symbols are:

*

    Establishes that all subsequent parameters are keyword-only.

[

    Establishes the start of an optional "group" of parameters. Note
    that "groups" may nest inside other "groups". See Functions With
    Positional-Only Parameters below. Note that currently [ is only
    legal for use in functions where all parameters are marked
    positional-only, see / below.

]

    Ends an optional "group" of parameters.

/

    Establishes that all the proceeding arguments are positional-only.
    For now, Argument Clinic does not support functions with both
    positional-only and non-positional-only arguments. Therefore: if /
    is specified for a function, it must currently always be after the
    last parameter. Also, Argument Clinic does not currently support
    default values for positional-only parameters.

(The semantics of / follow a syntax for positional-only parameters in
Python once proposed by Guido.[5] )

Function Docstring

The first line with no leading whitespace after the function declaration
is the first line of the function docstring. All subsequent lines of the
Clinic block are considered part of the docstring, and their leading
whitespace is preserved.

If the string {parameters} appears on a line by itself inside the
function docstring, Argument Clinic will insert a list of all parameters
that have docstrings, each such parameter followed by its docstring. The
name of the parameter is on a line by itself; the docstring starts on a
subsequent line, and all lines of the docstring are indented by two
spaces. (Parameters with no per-parameter docstring are suppressed.) The
entire list is indented by the leading whitespace that appeared before
the {parameters} token.

If the string {parameters} doesn't appear in the docstring, Argument
Clinic will append one to the end of the docstring, inserting a blank
line above it if the docstring does not end with a blank line, and with
the parameter list at column 0.

Converters

Argument Clinic contains a pre-initialized registry of converter
functions. Example converter functions:

int

    Accepts a Python object implementing __int__; emits a C int.

byte

    Accepts a Python int; emits an unsigned char. The integer must be in
    the range [0, 256).

str

    Accepts a Python str object; emits a C char *. Automatically encodes
    the string using the ascii codec.

PyObject

    Accepts any object; emits a C PyObject * without any conversion.

All converters accept the following parameters:

doc_default

    The Python value to use in place of the parameter's actual default
    in Python contexts. In other words: when specified, this value will
    be used for the parameter's default in the docstring, and in the
    Signature. (TBD alternative semantics: If the string is a valid
    Python expression which can be rendered into a Python value using
    eval(), then the result of eval() on it will be used as the default
    in the Signature.) Ignored if there is no default.

required

    Normally any parameter that has a default value is automatically
    optional. A parameter that has "required" set will be considered
    required (non-optional) even if it has a default value. The
    generated documentation will also not show any default value.

Additionally, converters may accept one or more of these optional
parameters, on an individual basis:

annotation

    Explicitly specifies the per-parameter annotation for this
    parameter. Normally it's the responsibility of the conversion
    function to generate the annotation (if any).

bitwise

    For converters that accept unsigned integers. If the Python integer
    passed in is signed, copy the bits directly even if it is negative.

encoding

    For converters that accept str. Encoding to use when encoding a
    Unicode string to a char *.

immutable

    Only accept immutable values.

length

    For converters that accept iterable types. Requests that the
    converter also emit the length of the iterable, passed in to the
    _impl function in a Py_ssize_t variable; its name will be this
    parameter's name appended with "_length".

nullable

    This converter normally does not accept None, but in this case it
    should. If None is supplied on the Python side, the equivalent C
    argument will be NULL. (The _impl argument emitted by this converter
    will presumably be a pointer type.)

types

    A list of strings representing acceptable Python types for this
    object. There are also four strings which represent Python
    protocols:

    -   "buffer"
    -   "mapping"
    -   "number"
    -   "sequence"

zeroes

    For converters that accept string types. The converted value should
    be allowed to have embedded zeroes.

Return Converters

A return converter conceptually performs the inverse operation of a
converter: it converts a native C value into its equivalent Python
value.

Directives

Argument Clinic also permits "directives" in Clinic code blocks.
Directives are similar to pragmas in C; they are statements that modify
Argument Clinic's behavior.

The format of a directive is as follows:

    directive_name [argument [second_argument [ ... ]]]

Directives only take positional arguments.

A Clinic code block must contain either one or more directives, or a
function declaration. It may contain both, in which case all directives
must come before the function declaration.

Internally directives map directly to Python callables. The directive's
arguments are passed directly to the callable as positional arguments of
type str().

Example possible directives include the production, suppression, or
redirection of Clinic output. Also, the "module" and "class" keywords
are implemented as directives in the prototype.

Python Code

Argument Clinic also permits embedding Python code inside C files, which
is executed in-place when Argument Clinic processes the file. Embedded
code looks like this:

    /*[python]

    # this is python code!
    print("/" + "* Hello world! *" + "/")

    [python]*/
    /* Hello world! */
    /*[python end:da39a3ee5e6b4b0d3255bfef95601890afd80709]*/

The "/* Hello world! */" line above was generated by running the Python
code in the preceding comment.

Any Python code is valid. Python code sections in Argument Clinic can
also be used to directly interact with Clinic; see Argument Clinic
Programmatic Interfaces.

Output

Argument Clinic writes its output inline in the C file, immediately
after the section of Clinic code. For "python" sections, the output is
everything printed using builtins.print. For "clinic" sections, the
output is valid C code, including:

-   a #define providing the correct methoddef structure for the function
-   a prototype for the "impl" function -- this is what you'll write to
    implement this function
-   a function that handles all argument processing, which calls your
    "impl" function
-   the definition line of the "impl" function
-   and a comment indicating the end of output.

The intention is that you write the body of your impl function
immediately after the output -- as in, you write a left-curly-brace
immediately after the end-of-output comment and implement builtin in the
body there. (It's a bit strange at first, but oddly convenient.)

Argument Clinic will define the parameters of the impl function for you.
The function will take the "self" parameter passed in originally, all
the parameters you define, and possibly some extra generated parameters
("length" parameters; also "group" parameters, see next section).

Argument Clinic also writes a checksum for the output section. This is a
valuable safety feature: if you modify the output by hand, Clinic will
notice that the checksum doesn't match, and will refuse to overwrite the
file. (You can force Clinic to overwrite with the "-f" command-line
argument; Clinic will also ignore the checksums when using the "-o"
command-line argument.)

Finally, Argument Clinic can also emit the boilerplate definition of the
PyMethodDef array for the defined classes and modules.

Functions With Positional-Only Parameters

A significant fraction of Python builtins implemented in C use the older
positional-only API for processing arguments (PyArg_ParseTuple()). In
some instances, these builtins parse their arguments differently based
on how many arguments were passed in. This can provide some bewildering
flexibility: there may be groups of optional parameters, which must
either all be specified or none specified. And occasionally these groups
are on the left! (A representative example: curses.window.addch().)

Argument Clinic supports these legacy use-cases by allowing you to
specify parameters in groups. Each optional group of parameters is
marked with square brackets. Note that these groups are permitted on the
right or left of any required parameters!

The impl function generated by Clinic will add an extra parameter for
every group, "int group_{left|right}_<x>", where x is a monotonically
increasing number assigned to each group as it builds away from the
required arguments. This argument will be nonzero if the group was
specified on this call, and zero if it was not.

Note that when operating in this mode, you cannot specify default
arguments.

Also, note that it's possible to specify a set of groups to a function
such that there are several valid mappings from the number of arguments
to a valid set of groups. If this happens, Clinic will abort with an
error message. This should not be a problem, as positional-only
operation is only intended for legacy use cases, and all the legacy
functions using this quirky behavior have unambiguous mappings.

Current Status

As of this writing, there is a working prototype implementation of
Argument Clinic available online (though the syntax may be out of date
as you read this).[6] The prototype generates code using the existing
PyArg_Parse APIs. It supports translating to all current format units
except the mysterious "w*". Sample functions using Argument Clinic
exercise all major features, including positional-only argument parsing.

Argument Clinic Programmatic Interfaces

The prototype also currently provides an experimental extension
mechanism, allowing adding support for new types on-the-fly. See
Modules/posixmodule.c in the prototype for an example of its use.

In the future, Argument Clinic is expected to be automatable enough to
allow querying, modification, or outright new construction of function
declarations through Python code. It may even permit dynamically adding
your own custom DSL!

Notes / TBD

-   The API for supplying inspect.Signature metadata for builtins is
    currently under discussion. Argument Clinic will add support for the
    prototype when it becomes viable.

-   Alyssa Coghlan suggests that we a) only support at most one
    left-optional group per function, and b) in the face of ambiguity,
    prefer the left group over the right group. This would solve all our
    existing use cases including range().

-   Optimally we'd want Argument Clinic run automatically as part of the
    normal Python build process. But this presents a bootstrapping
    problem; if you don't have a system Python 3, you need a Python 3
    executable to build Python 3. I'm sure this is a solvable problem,
    but I don't know what the best solution might be. (Supporting this
    will also require a parallel solution for Windows.)

-   On a related note: inspect.Signature has no way of representing
    blocks of arguments, like the left-optional block of y and x for
    curses.window.addch. How far are we going to go in supporting this
    admittedly aberrant parameter paradigm?

-   During the PyCon US 2013 Language Summit, there was discussion of
    having Argument Clinic also generate the actual documentation (in
    ReST, processed by Sphinx) for the function. The logistics of this
    are TBD, but it would require that the docstrings be written in
    ReST, and require that Python ship a ReST -> ascii converter. It
    would be best to come to a decision about this before we begin any
    large-scale conversion of the CPython source tree to using Clinic.

-   Guido proposed having the "function docstring" be hand-written
    inline, in the middle of the output, something like this:

        /*[clinic]
          ... prototype and parameters (including parameter docstrings) go here
        [clinic]*/
        ... some output ...
        /*[clinic docstring start]*/
        ... hand-edited function docstring goes here   <-- you edit this by hand!
        /*[clinic docstring end]*/
        ... more output
        /*[clinic output end]*/

    I tried it this way and don't like it -- I think it's clumsy. I
    prefer that everything you write goes in one place, rather than
    having an island of hand-edited stuff in the middle of the DSL
    output.

-   Argument Clinic does not support automatic tuple unpacking (the
    "(OOO)" style format string for PyArg_ParseTuple().)

-   Argument Clinic removes some dynamism / flexibility. With
    PyArg_ParseTuple() one could theoretically pass in different
    encodings at runtime for the "es"/"et" format units. AFAICT CPython
    doesn't do this itself, however it's possible external users might
    do this. (Trivia: there are no uses of "es" exercised by regrtest,
    and all the uses of "et" exercised are in socketmodule.c, except for
    one in _ssl.c. They're all static, specifying the encoding "idna".)

Acknowledgements

The PEP author wishes to thank Ned Batchelder for permission to
shamelessly rip off his clever design for Cog--"my favorite tool that
I've never gotten to use". Thanks also to everyone who provided feedback
on the [bugtracker issue] and on python-dev. Special thanks to Alyssa
(Nick) Coghlan and Guido van Rossum for a rousing two-hour in-person
deep dive on the topic at PyCon US 2013.

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

Cog

    Cog: http://nedbatchelder.com/code/cog/

[1] PyArg_ParseTuple():
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTuple

[2] PyArg_ParseTupleAndKeywords():
http://docs.python.org/3/c-api/arg.html#PyArg_ParseTupleAndKeywords

[3] PyArg_ format units:
http://docs.python.org/3/c-api/arg.html#strings-and-buffers

[4] Keyword parameters for extension functions:
http://docs.python.org/3/extending/extending.html#keyword-parameters-for-extension-functions

[5] Guido van Rossum, posting to python-ideas, March 2012:
https://mail.python.org/pipermail/python-ideas/2012-March/014364.html
and
https://mail.python.org/pipermail/python-ideas/2012-March/014378.html
and
https://mail.python.org/pipermail/python-ideas/2012-March/014417.html

[6] Argument Clinic prototype:
https://bitbucket.org/larry/python-clinic/