PEP: 350 Title: Codetags Version: $Revision$ Last-Modified: $Date$
Author: Micah Elliott <mde at tracos.org> Status: Rejected Type:
Informational Content-Type: text/x-rst Created: 27-Jun-2005
Post-History: 10-Aug-2005, 26-Sep-2005

Rejection Notice

This PEP has been rejected. While the community may be interested, there
is no desire to make the standard library conform to this standard.

Abstract

This informational PEP aims to provide guidelines for consistent use of
codetags, which would enable the construction of standard utilities to
take advantage of the codetag information, as well as making Python code
more uniform across projects. Codetags also represent a very lightweight
programming micro-paradigm and become useful for project management,
documentation, change tracking, and project health monitoring. This is
submitted as a PEP because its ideas are thought to be Pythonic,
although the concepts are not unique to Python programming. Herein are
the definition of codetags, the philosophy behind them, a motivation for
standardized conventions, some examples, a specification, a toolset
description, and possible objections to the Codetag project/paradigm.

This PEP is also living as a wiki for people to add comments.

What Are Codetags?

Programmers widely use ad-hoc code comment markup conventions to serve
as reminders of sections of code that need closer inspection or review.
Examples of markup include FIXME, TODO, XXX, BUG, but there many more in
wide use in existing software. Such markup will henceforth be referred
to as codetags. These codetags may show up in application code, unit
tests, scripts, general documentation, or wherever suitable.

Codetags have been under discussion and in use (hundreds of codetags in
the Python 2.4 sources) in many places (e.g., c2) for many years. See
References for further historic and current information.

Philosophy

If you subscribe to most of these values, then codetags will likely be
useful for you.

1.  As much information as possible should be contained inside the
    source code (application code or unit tests). This along with use of
    codetags impedes duplication. Most documentation can be generated
    from that source code; e.g., by using help2man, man2html, docutils,
    epydoc/pydoc, ctdoc, etc.
2.  Information should be almost never duplicated -- it should be
    recorded in a single original format and all other locations should
    be automatically generated from the original, or simply be
    referenced. This is famously known as the Single Point Of Truth
    (SPOT) or Don't Repeat Yourself (DRY) rule.
3.  Documentation that gets into customers' hands should be
    auto-generated from single sources into all other output formats.
    People want documentation in many forms. It is thus important to
    have a documentation system that can generate all of these.
4.  The developers are the documentation team. They write the code and
    should know the code the best. There should not be a dedicated,
    disjoint documentation team for any non-huge project.
5.  Plain text (with non-invasive markup) is the best format for writing
    anything. All other formats are to be generated from the plain text.

Codetag design was influenced by the following goals:

A.  Comments should be short whenever possible.
B.  Codetag fields should be optional and of minimal length. Default
    values and custom fields can be set by individual code shops.
C.  Codetags should be minimalistic. The quicker it is to jot something
    down, the more likely it is to get jotted.
D.  The most common use of codetags will only have zero to two fields
    specified, and these should be the easiest to type and read.

Motivation

-   Various productivity tools can be built around codetags.

    See Tools.

-   Encourages consistency.

    Historically, a subset of these codetags has been used informally in
    the majority of source code in existence, whether in Python or in
    other languages. Tags have been used in an inconsistent manner with
    different spellings, semantics, format, and placement. For example,
    some programmers might include datestamps and/or user identifiers,
    limit to a single line or not, spell the codetag differently than
    others, etc.

-   Encourages adherence to SPOT/DRY principle.

    E.g., generating a roadmap dynamically from codetags instead of
    keeping TODOs in sync with separate roadmap document.

-   Easy to remember.

    All codetags must be concise, intuitive, and semantically
    non-overlapping with others. The format must also be simple.

-   Use not required/imposed.

    If you don't use codetags already, there's no obligation to start,
    and no risk of affecting code (but see Objections). A small subset
    can be adopted and the Tools will still be useful (a few codetags
    have probably already been adopted on an ad-hoc basis anyway). Also
    it is very easy to identify and remove (and possibly record) a
    codetag that is no longer deemed useful.

-   Gives a global view of code.

    Tools can be used to generate documentation and reports.

-   A logical location for capturing CRCs/Stories/Requirements.

    The XP community often does not electronically capture Stories, but
    codetags seem like a good place to locate them.

-   Extremely lightweight process.

    Creating tickets in a tracking system for every thought degrades
    development velocity. Even if a ticketing system is employed,
    codetags are useful for simply containing links to those tickets.

Examples

This shows a simple codetag as commonly found in sources everywhere
(with the addition of a trailing <>):

    # FIXME: Seems like this loop should be finite. <>
    while True: ...

The following contrived example demonstrates a typical use of codetags.
It uses some of the available fields to specify the assignees (a pair of
programmers with initials MDE and CLE), the Date of expected completion
(Week 14), and the Priority of the item (2):

    # FIXME: Seems like this loop should be finite. <MDE,CLE d:14w p:2>
    while True: ...

This codetag shows a bug with fields describing author, discovery
(origination) date, due date, and priority:

    # BUG: Crashes if run on Sundays.
    # <MDE 2005-09-04 d:14w p:2>
    if day == 'Sunday': ...

Here is a demonstration of how not to use codetags. This has many
problems: 1) Codetags cannot share a line with code; 2) Missing colon
after mnemonic; 3) A codetag referring to codetags is usually useless,
and worse, it is not completable; 4) No need to have a bunch of fields
for a trivial codetag; 5) Fields with unknown values (t:XXX) should not
be used:

    i = i + 1   # TODO Add some more codetags.
    # <JRNewbie 2005-04-03 d:2005-09-03 t:XXX d:14w p:0 s:inprogress>

Specification

This describes the format: syntax, mnemonic names, fields, and
semantics, and also the separate DONE File.

General Syntax

Each codetag should be inside a comment, and can be any number of lines.
It should not share a line with code. It should match the indentation of
surrounding code. The end of the codetag is marked by a pair of angle
brackets <> containing optional fields, which must not be split onto
multiple lines. It is preferred to have a codetag in # comments instead
of string comments. There can be multiple fields per codetag, all of
which are optional.

In short, a codetag consists of a mnemonic, a colon, commentary text, an
opening angle bracket, an optional list of fields, and a closing angle
bracket. E.g., :

    # MNEMONIC: Some (maybe multi-line) commentary. <field field ...>

Mnemonics

The codetags of interest are listed below, using the following format:

recommended mnemonic (& synonym list)
    *canonical name*: semantics

TODO (MILESTONE, MLSTN, DONE, YAGNI, TBD, TOBEDONE)

    To do: Informal tasks/features that are pending completion.

FIXME (XXX, DEBUG, BROKEN, REFACTOR, REFACT, RFCTR, OOPS, SMELL, NEEDSWORK, INSPECT)

    Fix me: Areas of problematic or ugly code needing refactoring or
    cleanup.

BUG (BUGFIX)

    Bugs: Reported defects tracked in bug database.

NOBUG (NOFIX, WONTFIX, DONTFIX, NEVERFIX, UNFIXABLE, CANTFIX)

    Will Not Be Fixed: Problems that are well-known but will never be
    addressed due to design problems or domain limitations.

REQ (REQUIREMENT, STORY)

    Requirements: Satisfactions of specific, formal requirements.

RFE (FEETCH, NYI, FR, FTRQ, FTR)

    Requests For Enhancement: Roadmap items not yet implemented.

IDEA

    Ideas: Possible RFE candidates, but less formal than RFE.

??? (QUESTION, QUEST, QSTN, WTF)

    Questions: Misunderstood details.

!!! (ALERT)

    Alerts: In need of immediate attention.

HACK (CLEVER, MAGIC)

    Hacks: Temporary code to force inflexible functionality, or simply a
    test change, or workaround a known problem.

PORT (PORTABILITY, WKRD)

    Portability: Workarounds specific to OS, Python version, etc.

CAVEAT (CAV, CAVT, WARNING, CAUTION)

    Caveats: Implementation details/gotchas that stand out as
    non-intuitive.

NOTE (HELP)

    Notes: Sections where a code reviewer found something that needs
    discussion or further investigation.

FAQ

    Frequently Asked Questions: Interesting areas that require external
    explanation.

GLOSS (GLOSSARY)

    Glossary: Definitions for project glossary.

SEE (REF, REFERENCE)

    See: Pointers to other code, web link, etc.

TODOC (DOCDO, DODOC, NEEDSDOC, EXPLAIN, DOCUMENT)

    Needs Documentation: Areas of code that still need to be documented.

CRED (CREDIT, THANKS)

    Credits: Accreditations for external provision of enlightenment.

STAT (STATUS)

    Status: File-level statistical indicator of maturity of this file.

RVD (REVIEWED, REVIEW)

    Reviewed: File-level indicator that review was conducted.

File-level codetags might be better suited as properties in the revision
control system, but might still be appropriately specified in a codetag.

Some of these are temporary (e.g., FIXME) while others are persistent
(e.g., REQ). A mnemonic was chosen over a synonym using three criteria:
descriptiveness, length (shorter is better), commonly used.

Choosing between FIXME and XXX is difficult. XXX seems to be more
common, but much less descriptive. Furthermore, XXX is a useful
placeholder in a piece of code having a value that is unknown. Thus
FIXME is the preferred spelling. Sun says that XXX and FIXME are
slightly different, giving XXX higher severity. However, with decades of
chaos on this topic, and too many millions of developers who won't be
influenced by Sun, it is easy to rightly call them synonyms.

DONE is always a completed TODO item, but this should probably be
indicated through the revision control system and/or a completion
recording mechanism (see DONE File).

It may be a useful metric to count NOTE tags: a high count may indicate
a design (or other) problem. But of course the majority of codetags
indicate areas of code needing some attention.

An FAQ is probably more appropriately documented in a wiki where users
can more easily view and contribute.

Fields

All fields are optional. The proposed standard fields are described in
this section. Note that upper case field characters are intended to be
replaced.

The Originator/Assignee and Origination Date/Week fields are the most
common and don't usually require a prefix.

This lengthy list of fields is liable to scare people (the intended
minimalists) away from adopting codetags, but keep in mind that these
only exist to support programmers who either 1) like to keep BUG or RFE
codetags in a complete form, or 2) are using codetags as their complete
and only tracking system. In other words, many of these fields will be
used very rarely. They are gathered largely from industry-wide
conventions, and example sources include GCC Bugzilla and Python's
SourceForge tracking systems.

AAA[,BBB]...

    List of Originator or Assignee initials (the context determines
    which unless both should exist). It is also okay to use usernames
    such as MicahE instead of initials. Initials (in upper case) are the
    preferred form.

a:AAA[,BBB]...

    List of Assignee initials. This is necessary only in (rare) cases
    where a codetag has both an assignee and an originator, and they are
    different. Otherwise the a: prefix is omitted, and context
    determines the intent. E.g., FIXME usually has an Assignee, and NOTE
    usually has an Originator, but if a FIXME was originated (and
    initialed) by a reviewer, then the assignee's initials would need a
    a: prefix.

YYYY[-MM[-DD]] or WW[.D]w

    The Origination Date indicating when the comment was added, in ISO
    8601 format (digits and hyphens only). Or Origination Week, an
    alternative form for specifying an Origination Date. A day of the
    week can be optionally specified. The w suffix is necessary for
    distinguishing from a date.

d:YYYY[-MM[-DD]] or d:WW[.D]w

    Due Date (d) target completion (estimate). Or Due Week (d), an
    alternative to specifying a Due Date.

p:N

    Priority (p) level. Range (N) is from 0..3 with 3 being the highest.
    0..3 are analogous to low, medium, high, and showstopper/critical.
    The Severity field could be factored into this single number, and
    doing so is recommended since having both is subject to varying
    interpretation. The range and order should be customizable. The
    existence of this field is important for any tool that itemizes
    codetags. Thus a (customizable) default value should be supported.

t:NNNN

    Tracker (t) number corresponding to associated Ticket ID in separate
    tracking system.

The following fields are also available but expected to be less common.

c:AAAA

    Category (c) indicating some specific area affected by this item.

s:AAAA

    Status (s) indicating state of item. Examples are "unexplored",
    "understood", "inprogress", "fixed", "done", "closed". Note that
    when an item is completed it is probably better to remove the
    codetag and record it in a DONE File.

i:N

    Development cycle Iteration (i). Useful for grouping codetags into
    completion target groups.

r:N

    Development cycle Release (r). Useful for grouping codetags into
    completion target groups.

To summarize, the non-prefixed fields are initials and origination date,
and the prefixed fields are: assignee (a), due (d), priority (p),
tracker (t), category (c), status (s), iteration (i), and release (r).

It should be possible for groups to define or add their own fields, and
these should have upper case prefixes to distinguish them from the
standard set. Examples of custom fields are Operating System (O),
Severity (S), Affected Version (A), Customer (C), etc.

DONE File

Some codetags have an ability to be completed (e.g., FIXME, TODO, BUG).
It is often important to retain completed items by recording them with a
completion date stamp. Such completed items are best stored in a single
location, global to a project (or maybe a package). The proposed format
is most easily described by an example, say ~/src/fooproj/DONE:

    # TODO: Recurse into subdirs only on blue
    # moons. <MDE 2003-09-26>
    [2005-09-26 Oops, I underestimated this one a bit.  Should have
    used Warsaw's First Law!]

    # FIXME: ...
    ...

You can see that the codetag is copied verbatim from the original source
file. The date stamp is then entered on the following line with an
optional post-mortem commentary. The entry is terminated by a blank line
(\n\n).

It may sound burdensome to have to delete codetag lines every time one
gets completed. But in practice it is quite easy to setup a Vim or Emacs
mapping to auto-record a codetag deletion in this format (sans the
commentary).

Tools

Currently, programmers (and sometimes analysts) typically use grep to
generate a list of items corresponding to a single codetag. However,
various hypothetical productivity tools could take advantage of a
consistent codetag format. Some example tools follow.

Document Generator

    Possible docs: glossary, roadmap, manpages

Codetag History

    Track (with revision control system interface) when a BUG tag (or
    any codetag) originated/resolved in a code section

Code Statistics

    A project Health-O-Meter

Codetag Lint

    Notify of invalid use of codetags, and aid in porting to codetags

Story Manager/Browser

    An electronic means to replace XP notecards. In MVC terms, the
    codetag is the Model, and the Story Manager could be a graphical
    Viewer/Controller to do visual rearrangement, prioritization, and
    assignment, milestone management.

Any Text Editor

    Used for changing, removing, adding, rearranging, recording
    codetags.

There are some tools already in existence that take advantage of a
smaller set of pseudo-codetags (see References). There is also an
example codetags implementation under way, known as the Codetag Project.

Objections

Objection

    Extreme Programming argues that such codetags should not ever exist
    in code since the code is the documentation.

Defense

    Maybe you should put the codetags in the unit test files instead.
    Besides, it's tough to generate documentation from uncommented
    source code.

------------------------------------------------------------------------

Objection

    Too much existing code has not followed proposed guidelines.

Defense

    [Simple] utilities (ctlint) could convert existing codes.

------------------------------------------------------------------------

Objection

    Causes duplication with tracking system.

Defense

    Not really, unless fields are abused. If an item exists in the
    tracker, a simple ticket number in the codetag tracker field is
    sufficient. Maybe a duplicated title would be acceptable.
    Furthermore, it's too burdensome to have a ticket filed for every
    item that pops into a developer's mind on-the-go. Additionally, the
    tracking system could possibly be obviated for simple or small
    projects that can reasonably fit the relevant data into a codetag.

------------------------------------------------------------------------

Objection

    Codetags are ugly and clutter code.

Defense

    That is a good point. But I'd still rather have such info in a
    single place (the source code) than various other documents, likely
    getting duplicated or forgotten about. The completed codetags can be
    sent off to the DONE File, or to the bit bucket.

------------------------------------------------------------------------

Objection

    Codetags (and all comments) get out of date.

Defense

    Not so much if other sources (externally visible documentation)
    depend on their being accurate.

------------------------------------------------------------------------

Objection

    Codetags tend to only rarely have estimated completion dates of any
    sort. OK, the fields are optional, but you want to suggest fields
    that actually will be widely used.

Defense

    If an item is inestimable don't bother with specifying a date field.
    Using tools to display items with order and/or color by due date
    and/or priority, it is easier to make estimates. Having your roadmap
    be a dynamic reflection of your codetags makes you much more likely
    to keep the codetags accurate.

------------------------------------------------------------------------

Objection

    Named variables for the field parameters in the <> should be used
    instead of cryptic one-character prefixes. I.e., <MDE p:3> should
    rather be <author=MDE, priority=3>.

Defense

    It is just too much typing/verbosity to spell out fields. I argue
    that p:3 i:2 is as readable as priority=3, iteration=2 and is much
    more likely to by typed and remembered (see bullet C in Philosophy).
    In this case practicality beats purity. There are not many fields to
    keep track of so one letter prefixes are suitable.

------------------------------------------------------------------------

Objection

    Synonyms should be deprecated since it is better to have a single
    way to spell something.

Defense

    Many programmers prefer short mnemonic names, especially in
    comments. This is why short mnemonics were chosen as the primary
    names. However, others feel that an explicit spelling is less
    confusing and less prone to error. There will always be two camps on
    this subject. Thus synonyms (and complete, full spellings) should
    remain supported.

------------------------------------------------------------------------

Objection

    It is cruel to use [for mnemonics] opaque acronyms and abbreviations
    which drop vowels; it's hard to figure these things out. On that
    basis I hate: MLSTN RFCTR RFE FEETCH, NYI, FR, FTRQ, FTR WKRD RVDBY

Defense

    Mnemonics are preferred since they are pretty easy to remember and
    take up less space. If programmers didn't like dropping vowels we
    would be able to fit very little code on a line. The space is
    important for those who write comments that often fit on a single
    line. But when using a canon everywhere it is much less likely to
    get something to fit on a line.

------------------------------------------------------------------------

Objection

    It takes too long to type the fields.

Defense

    Then don't use (most or any of) them, especially if you're the only
    programmer. Terminating a codetag with <> is a small chore, and in
    doing so you enable the use of the proposed tools. Editor
    auto-completion of codetags is also useful: You can program your
    editor to stamp a template (e.g. # FIXME . <MDE {date}>) with just a
    keystroke or two.

------------------------------------------------------------------------

Objection

    WorkWeek is an obscure and uncommon time unit.

Defense

    That's true but it is a highly suitable unit of granularity for
    estimation/targeting purposes, and it is very compact. The ISO 8601
    is widely understood but allows you to only specify either a
    specific day (restrictive) or month (broad).

------------------------------------------------------------------------

Objection

    I aesthetically dislike for the comment to be terminated with <> in
    the empty field case.

Defense

    It is necessary to have a terminator since codetags may be followed
    by non-codetag comments. Or codetags could be limited to a single
    line, but that's prohibitive. I can't think of any single-character
    terminator that is appropriate and significantly better than <>.
    Maybe @ could be a terminator, but then most codetags will have an
    unnecessary @.

------------------------------------------------------------------------

Objection

    I can't use codetags when writing HTML, or less specifically, XML.
    Maybe @fields@ would be a better than <fields> as the delimiters.

Defense

    Maybe you're right, but <> looks nicer whenever applicable. XML/SGML
    could use @ while more common programming languages stick to <>.

References

Some other tools have approached defining/exploiting codetags. See
http://tracos.org/codetag/wiki/Links.