PEP: 727 Title: Documentation in Annotated Metadata Author: Sebastián
Ramírez <tiangolo@gmail.com> Sponsor: Jelle Zijlstra
<jelle.zijlstra@gmail.com> Discussions-To:
https://discuss.python.org/t/32566 Status: Draft Type: Standards Track
Topic: Typing Content-Type: text/x-rst Created: 28-Aug-2023
Python-Version: 3.13 Post-History: 30-Aug-2023

Abstract

This PEP proposes a standardized way to provide documentation strings
for Python symbols defined with ~typing.Annotated using a new class
typing.Doc.

Motivation

There's already a well-defined way to provide documentation for classes,
functions, class methods, and modules: using docstrings.

Currently there is no formalized standard to provide documentation
strings for other types of symbols: parameters, return values,
class-scoped variables (class variables and instance variables), local
variables, and type aliases.

Nevertheless, to allow documenting most of these additional symbols,
several conventions have been created as microsyntaxes inside of
docstrings, and are currently commonly used: Sphinx, numpydoc, Google,
Keras, etc.

There are two scenarios in which these conventions would be supported by
tools: for authors, while editing the contents of the documentation
strings and for users, while rendering that content in some way (in
documentation sites, tooltips in editors, etc).

Because each of these conventions uses a microsyntax inside a string,
when editing those docstrings, editors can't easily provide support for
autocompletion, inline errors for broken syntax, etc. Any type of
editing support for these conventions would be on top of the support for
editing standard Python syntax.

When documenting parameters with current conventions, because the
docstring is in a different place in the code than the actual parameters
and it requires duplication of information (the parameter name) the
information about a parameter is easily in a place in the code quite far
away from the declaration of the actual parameter and it is disconnected
from it. This means it's easy to refactor a function, remove a
parameter, and forget to remove its docs. The same happens when adding a
new parameter: it's easy to forget to add the docstring for it.

And because of this same duplication of information (the parameter name)
editors and other tools need complex custom logic to check or ensure the
consistency of the parameters in the signature and in their docstring,
or they simply don't fully support that.

As these existing conventions are different types of microsyntaxes
inside of strings, robustly parsing them for rendering requires complex
logic that needs to be implemented by the tools supporting them.
Additionally, libraries and tools don't have a straightforward way to
obtain the documentation for each individual parameter or variable at
runtime, without depending on a specific docstring convention parser.
Accessing the parameter documentation strings at runtime would be
useful, for example, for testing the contents of each parameter's
documentation, to ensure consistency across several similar functions,
or to extract and expose that same parameter documentation in some other
way (e.g. an API with FastAPI, a CLI with Typer, etc).

Some of these previous formats tried to account for the lack of type
annotations in older Python versions by including typing information in
the docstrings (e.g. Sphinx, numpydoc) but now that information doesn't
need to be in docstrings as there is now an official
syntax for type annotations <484>.

Rationale

This proposal intends to address these shortcomings by extending and
complementing the information in docstrings, keeping backwards
compatibility with existing docstrings (it doesn't deprecate them), and
doing it in a way that leverages the Python language and structure, via
type annotations with ~typing.Annotated, and a new class Doc in typing.

The reason why this would belong in the standard Python library instead
of an external package is because although the implementation would be
quite trivial, the actual power and benefit from it would come from
being a standard, to facilitate its usage from library authors and to
provide a default way to document Python symbols using
~typing.Annotated. Some tool providers (at least VS Code and PyCharm)
have shown they would consider implementing support for this only if it
was a standard.

This doesn't deprecate current usage of docstrings, docstrings should be
considered the preferred documentation method when available (not
available in type aliases, parameters, etc). And docstrings would be
complemented by this proposal for documentation specific to the symbols
that can be declared with ~typing.Annotated (currently only covered by
the several available microsyntax conventions).

This should be relatively transparent to common developers (library
users) unless they manually open the source files from libraries
adopting it.

It should be considered opt-in for library authors who would like to
adopt it and they should be free to decide to use it or not.

It would be only useful for libraries that are willing to use optional
type hints.

Summary

Here's a short summary of the features of this proposal in contrast to
current conventions:

-   Editing would be already fully supported by default by any editor
    (current or future) supporting Python syntax, including syntax
    errors, syntax highlighting, etc.
-   Rendering would be relatively straightforward to implement by static
    tools (tools that don't need runtime execution), as the information
    can be extracted from the AST they normally already create.
-   Deduplication of information: the name of a parameter would be
    defined in a single place, not duplicated inside of a docstring.
-   Elimination of the possibility of having inconsistencies when
    removing a parameter or class variable and forgetting to remove its
    documentation.
-   Minimization of the probability of adding a new parameter or class
    variable and forgetting to add its documentation.
-   Elimination of the possibility of having inconsistencies between the
    name of a parameter in the signature and the name in the docstring
    when it is renamed.
-   Access to the documentation string for each symbol at runtime,
    including existing (older) Python versions.
-   A more formalized way to document other symbols, like type aliases,
    that could use ~typing.Annotated.
-   No microsyntax to learn for newcomers, it's just Python syntax.
-   Parameter documentation inheritance for functions captured by
    ~typing.ParamSpec.

Specification

The main proposal is to introduce a new class, typing.Doc. This class
should only be used within ~typing.Annotated annotations. It takes a
single positional-only string argument. It should be used to document
the intended meaning and use of the symbol declared using
~typing.Annotated.

For example:

    from typing import Annotated, Doc

    class User:
        name: Annotated[str, Doc("The user's name")]
        age: Annotated[int, Doc("The user's age")]

        ...

~typing.Annotated is normally used as a type annotation, in those cases,
any typing.Doc inside of it would document the symbol being annotated.

When ~typing.Annotated is used to declare a type alias, typing.Doc would
then document the type alias symbol.

For example:

    from typing import Annotated, Doc, TypeAlias

    from external_library import UserResolver

    CurrentUser: TypeAlias = Annotated[str, Doc("The current system user"), UserResolver()]

    def create_user(name: Annotated[str, Doc("The user's name")]): ...

    def delete_user(name: Annotated[str, Doc("The user to delete")]): ...

In this case, if a user imported CurrentUser, tools like editors could
provide a tooltip with the documentation string when a user hovers over
that symbol, or documentation tools could include the type alias with
its documentation in their generated output.

For tools extracting the information at runtime, they would normally use
~typing.get_type_hints with the parameter include_extras=True, and as
~typing.Annotated is normalized (even with type aliases), this would
mean they should use the last typing.Doc available, if more than one is
used, as that is the last one used.

At runtime, typing.Doc instances have an attribute documentation with
the string passed to it.

When a function's signature is captured by a ~typing.ParamSpec, any
documentation strings associated with the parameters should be retained.

Any tool processing typing.Doc objects should interpret the string as a
docstring, and therefore should normalize whitespace as if
inspect.cleandoc() were used.

The string passed to typing.Doc should be of the form that would be a
valid docstring. This means that f-strings and string operations should
not be used. As this cannot be enforced by the Python runtime, tools
should not rely on this behavior.

When tools providing rendering show the raw signature, they could allow
configuring if the whole raw ~typing.Annotated code should be displayed,
but they should default to not include ~typing.Annotated and its
internal code metadata, only the type of the symbols annotated. When
those tools support typing.Doc and rendering in other ways than just a
raw signature, they should show the string value passed to typing.Doc in
a convenient way that shows the relation between the documented symbol
and the documentation string.

Tools providing rendering could allow ways to configure where to show
the parameter documentation and the prose docstring in different ways.
Otherwise, they could simply show the prose docstring first and then the
parameter documentation second.

Examples

Class attributes may be documented:

    from typing import Annotated, Doc

    class User:
        name: Annotated[str, Doc("The user's name")]
        age: Annotated[int, Doc("The user's age")]

        ...

As can function or method parameters and return values:

    from typing import Annotated, Doc

    def create_user(
        name: Annotated[str, Doc("The user's name")],
        age: Annotated[int, Doc("The user's age")],
        cursor: DatabaseConnection | None = None,
    ) -> Annotated[User, Doc("The created user after saving in the database")]:
        """Create a new user in the system.

        It needs the database connection to be already initialized.
        """
        pass

Backwards Compatibility

This proposal is fully backwards compatible with existing code and it
doesn't deprecate existing usage of docstring conventions.

For developers that wish to adopt it before it is available in the
standard library, or to support older versions of Python, they can use
typing_extensions and import and use Doc from there.

For example:

    from typing import Annotated
    from typing_extensions import Doc

    class User:
        name: Annotated[str, Doc("The user's name")]
        age: Annotated[int, Doc("The user's age")]

        ...

Security Implications

There are no known security implications.

How to Teach This

The main mechanism of documentation should continue to be standard
docstrings for prose information, this applies to modules, classes,
functions and methods.

For authors that want to adopt this proposal to add more granularity,
they can use typing.Doc inside of ~typing.Annotated annotations for the
symbols that support it.

Library authors that wish to adopt this proposal while keeping backwards
compatibility with older versions of Python should use
typing_extensions.Doc instead of typing.Doc.

Reference Implementation

typing.Doc is implemented equivalently to:

    class Doc:
        def __init__(self, documentation: str, /):
            self.documentation = documentation

It has been implemented in the typing_extensions package.

Survey of Other languages

Here's a short survey of how other languages document their symbols.

Java

Java functions and their parameters are documented with Javadoc, a
special format for comments put on top of the function definition. This
would be similar to Python current docstring microsyntax conventions
(but only one).

For example:

    /**
    * Returns an Image object that can then be painted on the screen. 
    * The url argument must specify an absolute <a href="#{@link}">{@link URL}</a>. The name
    * argument is a specifier that is relative to the url argument. 
    * <p>
    * This method always returns immediately, whether or not the 
    * image exists. When this applet attempts to draw the image on
    * the screen, the data will be loaded. The graphics primitives 
    * that draw the image will incrementally paint on the screen. 
    *
    * @param  url  an absolute URL giving the base location of the image
    * @param  name the location of the image, relative to the url argument
    * @return      the image at the specified URL
    * @see         Image
    */
    public Image getImage(URL url, String name) {
      try {
        return getImage(new URL(url, name));
      } catch (MalformedURLException e) {
        return null;
      }
    }

JavaScript

Both JavaScript and TypeScript use a similar system to Javadoc.

JavaScript uses JSDoc.

For example:

    /**
    * Represents a book.
    * @constructor
    * @param {string} title - The title of the book.
    * @param {string} author - The author of the book.
    */
    function Book(title, author) {
    }

TypeScript

TypeScript has its own JSDoc reference with some variations.

For example:

    // Parameters may be declared in a variety of syntactic forms
    /**
    * @param {string}  p1 - A string param.
    * @param {string=} p2 - An optional param (Google Closure syntax)
    * @param {string} [p3] - Another optional param (JSDoc syntax).
    * @param {string} [p4="test"] - An optional param with a default value
    * @returns {string} This is the result
    */
    function stringsStringStrings(p1, p2, p3, p4) {
        // TODO
    }

Rust

Rust uses another similar variation of a microsyntax in Doc comments.

But it doesn't have a particular well defined microsyntax structure to
denote what documentation refers to what symbol/parameter other than
what can be inferred from the pure Markdown.

For example:

    #![crate_name = "doc"]

    /// A human being is represented here
    pub struct Person {
       /// A person must have a name, no matter how much Juliet may hate it
       name: String,
    }

    impl Person {
       /// Returns a person with the name given them
       ///
       /// # Arguments
       ///
       /// * `name` - A string slice that holds the name of the person
       ///
       /// # Examples
       ///
       /// ```
       /// // You can have rust code between fences inside the comments
       /// // If you pass --test to `rustdoc`, it will even test it for you!
       /// use doc::Person;
       /// let person = Person::new("name");
       /// ```
       pub fn new(name: &str) -> Person {
          Person {
                name: name.to_string(),
          }
       }

       /// Gives a friendly hello!
       ///
       /// Says "Hello, [name](Person::name)" to the `Person` it is called on.
       pub fn hello(& self) {
          println!("Hello, {}!", self.name);
       }
    }

    fn main() {
       let john = Person::new("John");

       john.hello();
    }

Go Lang

Go also uses a form of Doc Comments.

It doesn't have a well defined microsyntax structure to denote what
documentation refers to which symbol/parameter, but parameters can be
referenced by name without any special syntax or marker, this also means
that ordinary words that could appear in the documentation text should
be avoided as parameter names.

    package strconv

    // Quote returns a double-quoted Go string literal representing s.
    // The returned string uses Go escape sequences (\t, \n, \xFF, \u0100)
    // for control characters and non-printable characters as defined by IsPrint.
    func Quote(s string) string {
       ...
    }

Rejected Ideas

Standardize Current Docstrings

A possible alternative would be to support and try to push as a standard
one of the existing docstring formats. But that would only solve the
standardization.

It wouldn't solve any of the other problems derived from using a
microsyntax inside of a docstring instead of pure Python syntax, the
same as described above in the Rationale - Summary.

Extra Metadata and Decorator

Some ideas before this proposal included having a function doc() instead
of the single class Doc with several parameters to indicate whether an
object is discouraged from use, what exceptions it may raise, etc. To
allow also deprecating functions and classes, it was also expected that
doc() could be used as a decorator. But this functionality is covered by
typing.deprecated() in PEP 702, so it was dropped from this proposal.

A way to declare additional information could still be useful in the
future, but taking early feedback on this idea, all that was postponed
to future proposals.

This also shifted the focus from an all-encompassing function doc() with
multiple parameters to a single Doc class to be used in
~typing.Annotated in a way that could be composed with other future
proposals.

This design change also allows better interoperability with other
proposals like typing.deprecated(), as in the future it could be
considered to allow having typing.deprecated() also in ~typing.Annotated
to deprecate individual parameters, coexisting with Doc.

String Under Definition

A proposed alternative in the discussion is declaring a string under the
definition of a symbol and providing runtime access to those values:

    class User:
        name: str
        "The user's name"
        age: int
        "The user's age"

        ...

This was already proposed and rejected in PEP 224, mainly due to the
ambiguity of how is the string connected with the symbol it's
documenting.

Additionally, there would be no way to provide runtime access to this
value in previous versions of Python.

Plain String in Annotated

In the discussion, it was also suggested to use a plain string inside of
~typing.Annotated:

    from typing import Annotated

    class User:
        name: Annotated[str, "The user's name"]
        age: Annotated[int, "The user's age"]

        ...

But this would create a predefined meaning for any plain string inside
of ~typing.Annotated, and any tool that was using plain strings in them
for any other purpose, which is currently allowed, would now be invalid.

Having an explicit typing.Doc makes it compatible with current valid
uses of ~typing.Annotated.

Another Annotated-Like Type

In the discussion it was suggested to define a new type similar to
~typing.Annotated, it would take the type and a parameter with the
documentation string:

    from typing import Doc

    class User:
        name: Doc[str, "The user's name"]
        age: Doc[int, "The user's age"]

        ...

This idea was rejected as it would only support that use case and would
make it more difficult to combine it with ~typing.Annotated for other
purposes ( e.g. with FastAPI metadata, Pydantic fields, etc.) or adding
additional metadata apart from the documentation string (e.g.
deprecation).

Transferring Documentation from Type aliases

A previous version of this proposal specified that when type aliases
declared with ~typing.Annotated were used, and these type aliases were
used in annotations, the documentation string would be transferred to
the annotated symbol.

For example:

    from typing import Annotated, Doc, TypeAlias


    UserName: TypeAlias = Annotated[str, Doc("The user's name")]


    def create_user(name: UserName): ...

    def delete_user(name: UserName): ...

This was rejected after receiving feedback from the maintainer of one of
the main components used to provide editor support.

Shorthand with Slices

In the discussion, it was suggested to use a shorthand with slices:

    is_approved: Annotated[str: "The status of a PEP."]

Although this is a very clever idea and would remove the need for a new
Doc class, runtime executing of current versions of Python don't allow
it.

At runtime, ~typing.Annotated requires at least two arguments, and it
requires the first argument to be type, it crashes if it is a slice.

Open Issues

Verbosity

The main argument against this would be the increased verbosity.

If the signature was not viewed independently of the documentation and
the body of the function with the docstring was also measured, the total
verbosity would be somewhat similar, as what this proposal does is to
move some of the contents from the docstring in the body to the
signature.

Considering the signature alone, without the body, they could be much
longer than they currently are, they could end up being more than one
page long. In exchange, the equivalent docstrings that currently are
more than one page long would be much shorter.

When comparing the total verbosity, including the signature and the
docstring, the main additional verbosity added by this would be from
using ~typing.Annotated and typing.Doc. If ~typing.Annotated had more
usage, it could make sense to have an improved shorter syntax for it and
for the type of metadata it would carry. But that would only make sense
once ~typing.Annotated is more widely used.

On the other hand, this verbosity would not affect end users as they
would not see the internal code using typing.Doc. The majority of users
would interact with libraries through editors without looking at the
internals, and if anything, they would have tooltips from editors
supporting this proposal.

The cost of dealing with the additional verbosity would mainly be
carried by those library maintainers that use this feature.

This argument could be analogous to the argument against type
annotations in general, as they do indeed increase verbosity, in
exchange for their features. But again, as with type annotations, this
would be optional and only to be used by those that are willing to take
the extra verbosity in exchange for the benefits.

Of course, more advanced users might want to look at the source code of
the libraries and if the authors of those libraries adopted this, those
advanced users would end up having to look at that code with additional
signature verbosity instead of docstring verbosity.

Any authors that decide not to adopt it should be free to continue using
docstrings with any particular format they decide, no docstrings at all,
etc.

Still, there's a high chance that library authors could receive pressure
to adopt this if it became the blessed solution.

Documentation is not Typing

It could also be argued that documentation is not really part of typing,
or that it should live in a different module. Or that this information
should not be part of the signature but live in another place (like the
docstring).

Nevertheless, type annotations in Python could already be considered, by
default, additional metadata: they carry additional information about
variables, parameters, return types, and by default they don't have any
runtime behavior. And this proposal would add one more type of metadata
to them.

It could be argued that this proposal extends the type of information
that type annotations carry, the same way as PEP 702 extends them to
include deprecation information.

~typing.Annotated was added to the standard library precisely to support
adding additional metadata to the annotations, and as the new proposed
Doc class is tightly coupled to ~typing.Annotated, it makes sense for it
to live in the same module. If ~typing.Annotated was moved to another
module, it would make sense to move Doc with it.

Multiple Standards

Another argument against this would be that it would create another
standard, and that there are already several conventions for docstrings.
It could seem better to formalize one of the currently existing
standards.

Nevertheless, as stated above, none of those conventions cover the
general drawbacks of a doctsring-based approach that this proposal
solves naturally.

To see a list of the drawbacks of a docstring-based approach, see the
section above in the Rationale - Summary.

In the same way, it can be seen that, in many cases, a new standard that
takes advantage of new features and solves several problems from
previous methods can be worth having. As is the case with the new
pyproject.toml, dataclass_transform, the new typing pipe/union (|)
operator, and other cases.

Adoption

As this is a new standard proposal, it would only make sense if it had
interest from the community.

Fortunately there's already interest from several mainstream libraries
from several developers and teams, including FastAPI, Typer, SQLModel,
Asyncer (from the author of this proposal), Pydantic, Strawberry
(GraphQL), and others.

There's also interest and support from documentation tools, like
mkdocstrings, which added support even for an earlier version of this
proposal.

All the CPython core developers contacted for early feedback (at least
4) have shown interest and support for this proposal.

Editor developers (VS Code and PyCharm) have shown some interest, while
showing concerns about the signature verbosity of the proposal, although
not about the implementation (which is what would affect them the most).
And they have shown they would consider adding support for this if it
were to become an official standard. In that case, they would only need
to add support for rendering, as support for editing, which is normally
non-existing for other standards, is already there, as they already
support editing standard Python syntax.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.