Following system colour scheme Selected dark colour scheme Selected light colour scheme

Python Enhancement Proposals

PEP 727 – Documentation Metadata in Typing

Author:
Sebastián Ramírez <tiangolo at gmail.com>
Sponsor:
Jelle Zijlstra <jelle.zijlstra at gmail.com>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Topic:
Typing
Created:
28-Aug-2023
Python-Version:
3.13
Post-History:
30-Aug-2023

Table of Contents

Abstract

This document proposes a way to complement docstrings to add additional documentation to Python symbols using type annotations with Annotated (in class attributes, function and method parameters, return values, and variables).

Motivation

The current standard method of documenting code APIs in Python is using docstrings. But there’s no standard way to document parameters in docstrings.

There are several pseudo-standards for the format in these docstrings, and new pseudo-standards can appear easily: numpy, Google, Keras, reST, etc.

All these formats are some specific syntax inside a string. Because of this, when editing those docstrings, editors can’t easily provide support for autocompletion, inline errors for broken syntax, etc.

Editors don’t have a way to support all the possible micro languages in the docstrings and show nice user interfaces when developers use libraries with those different formats. They could even add support for some of the syntaxes, but probably not all, or not completely.

Because the docstring is in a different place in the code than the actual parameters and it requires duplication of information (the parameter name) the information about a parameter is easily in a place in the code quite far away from the declaration of the actual parameter. This means it’s easy to refactor a function, remove a parameter, and forget to remove its docs. The same happens when adding a new parameter: it’s easy to forget to add the docstring for it.

And because of this same duplication of information (the parameter name) editors and other tools would need to have complex custom logic to check or ensure the consistency of the parameters in the signature and in their docstring, or they would simply not be able to support that.

Additionally, it would be difficult to robustly parse varying existing docstring conventions to programatically get the documentation for each individual parameter or variable at runtime. This would be useful, for example, for testing the contents of each parameter’s documentation, to ensure consistency across several similar functions, or to extract and expose that same parameter documentation in some other way (e.g. an API, a CLI, etc).

Some of these previous formats tried to account for the lack of type annotations in older Python versions by including typing information in the docstrings, but now that information doesn’t need to be in docstrings as there is now an official syntax for type annotations.

Rationale

This proposal intends to address these shortcomings by extending and complementing the information in docstrings, keeping backwards compatibility with existing docstrings, and doing it in a way that leverages the Python language and structure, via type annotations with Annotated, and a new function in typing.

The reason why this would belong in the standard Python library instead of an external package is because although the implementation would be quite trivial, the actual power and benefit from it would come from being a standard, so that editors and other tools could implement support for it.

This doesn’t deprecate current usage of docstrings, it’s transparent to common developers (library users), and it’s only opt-in for library authors that would like to adopt it.

Specification

The main proposal is to introduce a new function, typing.doc(), to be used when documenting Python objects. This function MUST only be used within Annotated annotations. The function takes a single string argument, documentation, and returns an instance of typing.DocInfo, which stores the input string unchanged.

Any tool processing typing.DocInfo objects SHOULD interpret the string as a docstring, and therefore SHOULD normalize whitespace as if inspect.cleandoc() were used.

The string passed to typing.doc() SHOULD be of the form that would be a valid docstring. This means that f-strings and string operations SHOULD NOT be used. As this cannot be enforced by the Python runtime, tools SHOULD NOT rely on this behaviour, and SHOULD exit with an error if such a prohibited string is encountered.

Examples

Class attributes may be documented:

from typing import Annotated, doc

class User:
    first_name: Annotated[str, doc("The user's first name")]
    last_name: Annotated[str, doc("The user's last name")]

    ...

As can function or method parameters:

from typing import Annotated, doc

def create_user(
    first_name: Annotated[str, doc("The user's first name")],
    last_name: Annotated[str, doc("The user's last name")],
    cursor: DatabaseConnection | None = None,
) -> Annotated[User, doc("The created user after saving in the database")]:
    """Create a new user in the system.

    It needs the database connection to be already initialized.
    """
    pass

Additional Scenarios

The main scenarios that this proposal intends to cover are described above, and for implementers to be conformant to this specification, they only need to support those scenarios described above.

Here are some additional edge case scenarios with their respective considerations, but implementers are not required to support them.

Type Alias

When creating a type alias, like:

Username = Annotated[str, doc("The name of a user in the system")]

The documentation would be considered to be carried by the parameter annotated with Username.

So, in a function like:

def hi(to: Username) -> None: ...

It would be equivalent to:

def hi(to: Annotated[str, doc("The name of a user in the system")]) -> None: ...

Nevertheless, implementers would not be required to support type aliases outside of the final type annotation to be conformant with this specification, as it could require more complex dereferencing logic.

Annotating Type Parameters

When annotating type parameters, as in:

def hi(
    to: list[Annotated[str, doc("The name of a user in a list")]],
) -> None: ...

The documentation in doc() would refer to what it is annotating, in this case, each item in the list, not the list itself.

There are currently no practical use cases for documenting type parameters, so implementers are not required to support this scenario to be considered conformant, but it’s included for completeness.

Annotating Unions

If used in one of the parameters of a union, as in:

def hi(
    to: str | Annotated[list[str], doc("List of user names")],
) -> None: ...

Again, the documentation in doc() would refer to what it is annotating, in this case, this documents the list itself, not its items.

In particular, the documentation would not refer to a single string passed as a parameter, only to a list.

There are currently no practical use cases for documenting unions, so implementers are not required to support this scenario to be considered conformant, but it’s included for completeness.

Nested Annotated

Continuing with the same idea above, if Annotated was used nested and used multiple times in the same parameter, doc() would refer to the type it is annotating.

So, in an example like:

def hi(
    to: Annotated[
        Annotated[str, doc("A user name")]
        | Annotated[list, doc("A list of user names")],
        doc("Who to say hi to"),
    ],
) -> None: ...

The documentation for the whole parameter to would be considered to be “Who to say hi to”.

The documentation for the case where that parameter to is specifically a str would be considered to be “A user name”.

The documentation for the case where that parameter to is specifically a list would be considered to be “A list of user names”.

Implementers would only be required to support the top level use case, where the documentation for to is considered to be “Who to say hi to”. They could optionally support having conditional documentation for when the type of the parameter passed is of one type or another, but they are not required to do so.

Duplication

If doc() is used multiple times in a single Annotated, it would be considered invalid usage from the developer, for example:

def hi(
    to: Annotated[str, doc("A user name"), doc("The current user name")],
) -> None: ...

Implementers can consider this invalid and are not required to support this to be considered conformant.

Nevertheless, as it might be difficult to enforce it on developers, implementers can opt to support one of the doc() declarations.

In that case, the suggestion would be to support the last one, just because this would support overriding, for example, in:

User = Annotated[str, doc("A user name")]

CurrentUser = Annotated[User, doc("The current user name")]

Internally, in Python, CurrentUser here is equivalent to:

CurrentUser = Annotated[str,
                        doc("A user name"),
                        doc("The current user name")]

For an implementation that supports the last doc() appearance, the above example would be equivalent to:

def hi(to: Annotated[str, doc("The current user name")]) -> None: ...

Reference Implementation

typing.doc and typing.DocInfo are implemented as follows:

def doc(documentation: str, /) -> DocInfo:
    return DocInfo(documentation)

class DocInfo:
    def __init__(self, documentation: str, /):
        self.documentation = documentation

These have been implemented in the typing_extensions package.

Rejected Ideas

Standardize Current Docstrings

A possible alternative would be to support and try to push as a standard one of the existing docstring formats. But that would only solve the standardization.

It wouldn’t solve any of the other problems, like getting editor support (syntax checks) for library authors, the distance and duplication of information between a parameter definition and its documentation in the docstring, etc.

Extra Metadata and Decorator

An earlier version of this proposal included several parameters to indicate whether an object is discouraged from use, what exceptions it may raise, etc. To allow also deprecating functions and classes, it was also expected that doc() could be used as a decorator. But this functionality is covered by typing.deprecated() in PEP 702, so it was dropped from this proposal.

A way to declare additional information could still be useful in the future, but taking early feedback on this document, all that was postponed to future proposals.

This also shifts the focus from an all-encompassing function doc() with multiple parameters to multiple composable functions, having doc() handle one single use case: additional documentation in Annotated.

This design change also allows better interoperability with other proposals like typing.deprecated(), as in the future it could be considered to allow having typing.deprecated() also in Annotated to deprecate individual parameters, coexisting with doc().

Open Issues

Verbosity

The main argument against this would be the increased verbosity.

Nevertheless, this verbosity would not affect end users as they would not see the internal code using typing.doc().

And the cost of dealing with the additional verbosity would only be carried by those library maintainers that decide to opt-in into this feature.

Any authors that decide not to adopt it, are free to continue using docstrings with any particular format they decide, no docstrings at all, etc.

This argument could be analogous to the argument against type annotations in general, as they do indeed increase verbosity, in exchange for their features. But again, as with type annotations, this would be optional and only to be used by those that are willing to take the extra verbosity in exchange for the benefits.

Documentation is not Typing

It could also be argued that documentation is not really part of typing, or that it should live in a different module. Or that this information should not be part of the signature but live in another place (like the docstring).

Nevertheless, type annotations in Python could already be considered, by default, mainly documentation: they carry additional information about variables, parameters, return types, and by default they don’t have any runtime behavior.

It could be argued that this proposal extends the type of information that type annotations carry, the same way as PEP 702 extends them to include deprecation information.

And as described above, including this in typing_extensions to support older versions of Python would have a very simple and practical benefit.

Multiple Standards

Another argument against this would be that it would create another standard, and that there are already several pseudo-standards for docstrings. It could seem better to formalize one of the currently existing standards.

Nevertheless, as stated above, none of those standards cover the general drawbacks of a doctsring-based approach that this proposal solves naturally.

None of the editors have full docstring editing support (even when they have rendering support). Again, this is solved by this proposal just by using standard Python syntax and structures instead of a docstring microsyntax.

The effort required to implement support for this proposal by tools would be minimal compared to that required for alternative docstring-based pseudo-standards, as for this proposal, editors would only need to access an already existing value in their ASTs, instead of writing a parser for a new string microsyntax.

In the same way, it can be seen that, in many cases, a new standard that takes advantage of new features and solves several problems from previous methods can be worth having. As is the case with the new pyproject.toml, dataclass_transform, the new typing pipe/union (|) operator, and other cases.

Adoption

As this is a new standard proposal, it would only make sense if it had interest from the community.

Fortunately there’s already interest from several mainstream libraries from several developers and teams, including FastAPI, Typer, SQLModel, Asyncer (from the author of this proposal), Pydantic, Strawberry, and others, from other teams.

There’s also interest and support from documentation tools, like mkdocstrings, which added support even for an earlier version of this proposal.

All the CPython core developers contacted for early feedback (at least 4) have shown interest and support for this proposal.

Editor developers (VS Code and PyCharm) have shown some interest, while showing concerns about the verbosity of the proposal, although not about the implementation (which is what would affect them the most). And they have shown they would consider adding support for this if it were to become an official standard. In that case, they would only need to add support for rendering, as support for editing, which is normally non-existing for other standards, is already there, as they already support editing standard Python syntax.

Bike Shedding

I think doc() is a good name for the main function. But it might make sense to consider changing the names for the other parts.

The returned class containing info currently named DocInfo could instead be named just Doc. Although it could make verbal conversations more confusing as it’s the same word as the name of the function.

The parameter received by doc() currently named documentation could instead be named also doc, but it would make it more ambiguous in discussions to distinguish when talking about the function and the parameter, although it would simplify the amount of terms, but as these terms refer to different things closely related, it could make sense to have different names.

The parameter received by doc() currently named documentation could instead be named value, but the word “documentation” might convey the meaning better.

The parameter received by doc() currently named documentation could be a position-only parameter, in which case the name wouldn’t matter much. But then there wouldn’t be a way to make it match with the DocInfo attribute.

The DocInfo class has a single attribute documentation, this name matches the parameter passed to doc(). It could be named something different, like doc, but this would mean a mismatch between the doc() parameter documentation and the equivalent attribute doc, and it would mean that in one case (in the function), the term doc refers to a function, and in the other case (the resulting class) the term doc refers to a string value.

This shows the logic to select the current terms, but it could all be discussed further.


Source: https://github.com/python/peps/blob/main/peps/pep-0727.rst

Last modified: 2023-09-09 17:39:29 GMT