PEP: 224 Title: Attribute Docstrings Author: Marc-André Lemburg
<mal@lemburg.com> Status: Rejected Type: Standards Track Created:
23-Aug-2000 Python-Version: 2.1 Post-History:

See 224-rejection for more information.

Introduction

This PEP describes the "attribute docstring" proposal for Python 2.0.
This PEP tracks the status and ownership of this feature. It contains a
description of the feature and outlines changes necessary to support the
feature. The CVS revision history of this file contains the definitive
historical record.

Rationale

This PEP proposes a small addition to the way Python currently handles
docstrings embedded in Python code.

Python currently only handles the case of docstrings which appear
directly after a class definition, a function definition or as first
string literal in a module. The string literals are added to the objects
in question under the __doc__ attribute and are from then on available
for introspection tools which can extract the contained information for
help, debugging and documentation purposes.

Docstrings appearing in locations other than the ones mentioned are
simply ignored and don't result in any code generation.

Here is an example:

    class C:
        "class C doc-string"

        a = 1
        "attribute C.a doc-string (1)"

        b = 2
        "attribute C.b doc-string (2)"

The docstrings (1) and (2) are currently being ignored by the Python
byte code compiler, but could obviously be put to good use for
documenting the named assignments that precede them.

This PEP proposes to also make use of these cases by proposing semantics
for adding their content to the objects in which they appear under new
generated attribute names.

The original idea behind this approach which also inspired the above
example was to enable inline documentation of class attributes, which
can currently only be documented in the class's docstring or using
comments which are not available for introspection.

Implementation

Docstrings are handled by the byte code compiler as expressions. The
current implementation special cases the few locations mentioned above
to make use of these expressions, but otherwise ignores the strings
completely.

To enable use of these docstrings for documenting named assignments
(which is the natural way of defining e.g. class attributes), the
compiler will have to keep track of the last assigned name and then use
this name to assign the content of the docstring to an attribute of the
containing object by means of storing it in as a constant which is then
added to the object's namespace during object construction time.

In order to preserve features like inheritance and hiding of Python's
special attributes (ones with leading and trailing double underscores),
a special name mangling has to be applied which uniquely identifies the
docstring as belonging to the name assignment and allows finding the
docstring later on by inspecting the namespace.

The following name mangling scheme achieves all of the above:

    __doc_<attributename>__

To keep track of the last assigned name, the byte code compiler stores
this name in a variable of the compiling structure. This variable
defaults to NULL. When it sees a docstring, it then checks the variable
and uses the name as basis for the above name mangling to produce an
implicit assignment of the docstring to the mangled name. It then resets
the variable to NULL to avoid duplicate assignments.

If the variable does not point to a name (i.e. is NULL), no assignments
are made. These will continue to be ignored like before. All classical
docstrings fall under this case, so no duplicate assignments are done.

In the above example this would result in the following new class
attributes to be created:

    C.__doc_a__ == "attribute C.a doc-string (1)"
    C.__doc_b__ == "attribute C.b doc-string (2)"

A patch to the current CVS version of Python 2.0 which implements the
above is available on SourceForge at[1].

Caveats of the Implementation

Since the implementation does not reset the compiling structure variable
when processing a non-expression, e.g. a function definition, the last
assigned name remains active until either the next assignment or the
next occurrence of a docstring.

This can lead to cases where the docstring and assignment may be
separated by other expressions:

    class C:
        "C doc string"

        b = 2

        def x(self):
            "C.x doc string"
             y = 3
             return 1

        "b's doc string"

Since the definition of method "x" currently does not reset the used
assignment name variable, it is still valid when the compiler reaches
the docstring "b's doc string" and thus assigns the string to __doc_b__.

A possible solution to this problem would be resetting the name variable
for all non-expression nodes in the compiler.

Possible Problems

Even though highly unlikely, attribute docstrings could get accidentally
concatenated to the attribute's value:

    class C:
        x = "text" \
            "x's docstring"

The trailing slash would cause the Python compiler to concatenate the
attribute value and the docstring.

A modern syntax highlighting editor would easily make this accident
visible, though, and by simply inserting empty lines between the
attribute definition and the docstring you can avoid the possible
concatenation completely, so the problem is negligible.

Another possible problem is that of using triple quoted strings as a way
to uncomment parts of your code.

If there happens to be an assignment just before the start of the
comment string, then the compiler will treat the comment as docstring
attribute and apply the above logic to it.

Besides generating a docstring for an otherwise undocumented attribute
there is no breakage.

Comments from our BDFL

Early comments on the PEP from Guido:

  I "kinda" like the idea of having attribute docstrings (meaning it's
  not of great importance to me) but there are two things I don't like
  in your current proposal:

  1.  The syntax you propose is too ambiguous: as you say, stand-alone
      string literal are used for other purposes and could suddenly
      become attribute docstrings.
  2.  I don't like the access method either (__doc_<attrname>__).

The author's reply:

  > 1. The syntax you propose is too ambiguous: as you say, stand-alone
      >    string literal are used for other purposes and could suddenly
      >    become attribute docstrings.

  This can be fixed by introducing some extra checks in the compiler to
  reset the "doc attribute" flag in the compiler struct.

      > 2. I don't like the access method either (``__doc_<attrname>__``).

  Any other name will do. It will only have to match these criteria:

  -   must start with two underscores (to match __doc__)
  -   must be extractable using some form of inspection (e.g. by using a
      naming convention which includes some fixed name part)
  -   must be compatible with class inheritance (i.e. should be stored
      as attribute)

Later on in March, Guido pronounced on this PEP in March 2001 (on
python-dev). Here are his reasons for rejection mentioned in private
mail to the author of this PEP:

  ...

  It might be useful, but I really hate the proposed syntax.

      a = 1
      "foo bar"
      b = 1

  I really have no way to know whether "foo bar" is a docstring for a or
  for b.

  ...

  You can use this convention:

      a = 1
      __doc_a__ = "doc string for a"

  This makes it available at runtime.

      > Are you completely opposed to adding attribute documentation
      > to Python or is it just the way the implementation works ? I
      > find the syntax proposed in the PEP very intuitive and many
      > other users on c.l.p and in private emails have supported it
      > at the time I wrote the PEP.

  It's not the implementation, it's the syntax. It doesn't convey a
  clear enough coupling between the variable and the doc string.

Copyright

This document has been placed in the Public Domain.

References

[1] http://sourceforge.net/patch/?func=detailpatch&patch_id=101264&group_id=5470