PEP: 292
Title: Simpler String Substitutions
Author: Barry Warsaw <barry@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004
Replaces: 215

Abstract

This PEP describes a simpler string substitution feature, also known as
string interpolation. This PEP is "simpler" in two respects:

1.  Python's current string substitution feature (i.e. %-substitution)
    is complicated and error prone. This PEP is simpler at the cost of
    some expressiveness.
2.  PEP 215 proposed an alternative string interpolation feature,
    introducing a new $ string prefix. PEP 292 is simpler than this
    because it involves no syntax changes and has much simpler rules for
    what substitutions can occur in the string.

Rationale

Python currently supports a string substitution syntax based on C's
printf() '%' formatting character[1]. While quite rich, %-formatting
codes are also error prone, even for experienced Python programmers. A
common mistake is to leave off the trailing format character, e.g. the
's' in "%(name)s".

In addition, the rules for what can follow a % sign are fairly complex,
while the usual application rarely needs such complexity. Most scripts
need to do some string interpolation, but most of those use simple
'stringification' formats, i.e. %s or %(name)s. This form should be made
simpler and less error prone.

A Simpler Proposal

We propose the addition of a new class, called Template, which will live
in the string module. The Template class supports new rules for string
substitution; its value contains placeholders, introduced with the $
character. The following rules for $-placeholders apply:

1.  $$ is an escape; it is replaced with a single $
2.  $identifier names a substitution placeholder matching a mapping key
    of "identifier". By default, "identifier" must spell a Python
    identifier as defined in[2]. The first non-identifier character
    after the $ character terminates this placeholder specification.
3.  ${identifier} is equivalent to $identifier. It is required when
    valid identifier characters follow the placeholder but are not part
    of the placeholder, e.g. "${noun}ification".

If the $ character appears at the end of the line, or is followed by any
other character than those described above, a ValueError will be raised
at interpolation time. Values in the mapping are converted automatically
to strings.
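
The three placeholder rules and the ValueError case can be illustrated with a short sketch. It uses the string.Template class as it ships in the standard library; the template text and the variable names (rendered, counted, rejected) are made up for illustration.

```python
from string import Template

# $$ escapes to a literal $, ${noun} is braced because identifier characters
# follow it, and $who ends at the first non-identifier character (the "!").
t = Template('$$5 for the ${noun}ification of $who!')
rendered = t.substitute(noun='class', who='Guido')
print(rendered)  # $5 for the classification of Guido!

# Values that are not strings are converted to strings automatically.
counted = Template('$n placeholders').substitute(n=3)

# A $ followed by any other character, or a trailing $, is rejected
# by substitute() at interpolation time.
bad_templates = ['price: $ 5', 'total: $']
rejected = []
for text in bad_templates:
    try:
        Template(text).substitute()
    except ValueError:
        rejected.append(text)
```

Note that creating the Template never fails here; the error only surfaces when substitute() is called.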

No other characters have special meaning; however, it is possible to
derive from the Template class to define different substitution rules.
For example, a derived class could allow for periods in the placeholder
(e.g. to support a kind of dynamic namespace and attribute path lookup),
or could define a delimiter character other than $.
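
As a rough sketch of such derived classes: the standard library implementation exposes the class attributes delimiter and idpattern for this purpose (these names come from the library documentation, not from this PEP). The class names PercentTemplate and DottedKeyTemplate, and the regular expression, are illustrative assumptions.

```python
from string import Template

class PercentTemplate(Template):
    # Use % instead of $ to introduce placeholders.
    delimiter = '%'

class DottedKeyTemplate(Template):
    # Allow periods inside a placeholder name, e.g. $user.name.
    # A period is only consumed when an identifier follows it.
    idpattern = r'[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*'

# With % as the delimiter, $ is an ordinary character and %% is the escape.
percent = PercentTemplate('%user owes $10 (%% marks the delimiter)')
percent_result = percent.substitute(user='alice')

# The dotted name is looked up as a single mapping key here.
dotted = DottedKeyTemplate('Hello, $user.name.')
dotted_result = dotted.substitute({'user.name': 'Barry'})
```

In this sketch the dotted placeholder is still a plain key lookup; resolving it as an attribute path would be the job of the mapping passed to substitute().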

Once the Template has been created, substitutions can be performed by
calling one of two methods:

-   substitute(). This method returns a new string which results when
    the values of a mapping are substituted for the placeholders in the
    Template. If there are placeholders which are not present in the
    mapping, a KeyError will be raised.

-   safe_substitute(). This is similar to the substitute() method,
    except that KeyErrors are never raised (due to placeholders missing
    from the mapping). When a placeholder is missing, the original
    placeholder will appear in the resulting string.

Here are some examples:

    >>> from string import Template
    >>> s = Template('${name} was born in ${country}')
    >>> print s.substitute(name='Guido', country='the Netherlands')
    Guido was born in the Netherlands
    >>> print s.substitute(name='Guido')
    Traceback (most recent call last):
    [...]
    KeyError: 'country'
    >>> print s.safe_substitute(name='Guido')
    Guido was born in ${country}

The signature of substitute() and safe_substitute() allows for passing
the mapping of placeholders to values, either as a single
dictionary-like object in the first positional argument, or as keyword
arguments as shown above. The exact details and signatures of these two
methods are reserved for the standard library documentation.
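
The two calling styles can be compared side by side. This is a small sketch against the standard library's string.Template; the rule that keyword arguments take precedence over the mapping when both supply the same key is taken from the library documentation, not from this PEP, and the variable names are illustrative.

```python
from string import Template

s = Template('${name} was born in ${country}')
values = {'name': 'Guido', 'country': 'the Netherlands'}

# A single dictionary-like object as the first positional argument.
from_mapping = s.substitute(values)

# The same values passed as keyword arguments.
from_keywords = s.substitute(name='Guido', country='the Netherlands')

# Both at once: in the standard library, keywords override the mapping.
combined = s.substitute(values, country='Holland')
```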

Why $ and Braces?

The BDFL said it best[3]: "The $ means 'substitution' in so many
languages besides Perl that I wonder where you've been. [...] We're
copying this from the shell."

Thus the substitution rules are chosen because of the similarity with so
many other languages. This makes the substitution rules easier to teach,
learn, and remember.

Comparison to PEP 215

PEP 215 describes an alternate proposal for string interpolation. Unlike
that PEP, this one does not propose any new syntax for Python. All the
proposed new features are embodied in a new library module. PEP 215
proposes a new string prefix representation such as $"" which signals to
Python that a new type of string is present. $-strings would have to
interact with the existing r-prefixes and u-prefixes, essentially
doubling the number of string prefix combinations.

PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:

    import sys
    print $"sys = $sys, sys = $sys.modules['sys']"

which would print:

    sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>

It's generally accepted that the rules in PEP 215 are safe in the sense
that they introduce no new security issues (see PEP 215, "Security
Issues" for details). However, the rules are still quite complex, and
make it more difficult to see the substitution placeholder in the
original $-string.

The interesting thing is that the Template class defined in this PEP is
designed for inheritance and, with a little extra work, it's possible to
support PEP 215's functionality using existing Python syntax.

For example, one could define subclasses of Template and dict that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
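
A minimal sketch of that idea follows. It covers only dotted attribute paths, not the arbitrary expressions PEP 215 allows, and it deliberately avoids eval(). The class names PathTemplate and PathDict and the regular expression are hypothetical; idpattern is the customization hook provided by the standard library implementation.

```python
import sys
from string import Template

class PathTemplate(Template):
    # Hypothetical subclass: accept dotted names such as $sys.byteorder.
    idpattern = r'[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*'

class PathDict(dict):
    # Hypothetical mapping: look up the first name as a normal key, then
    # resolve each remaining name as an attribute of the previous value.
    def __getitem__(self, key):
        parts = key.split('.')
        value = dict.__getitem__(self, parts[0])
        for name in parts[1:]:
            value = getattr(value, name)
        return value

namespace = PathDict(sys=sys)
result = PathTemplate('byteorder = $sys.byteorder').substitute(namespace)
print(result)
```

A missing first name still raises KeyError, while a missing attribute raises AttributeError, so callers of such a subclass would need to handle both.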

Internationalization

The implementation supports internationalization by recording the
original template string in the Template instance's template attribute.
This attribute would serve as the lookup key in a gettext-based
catalog. It is up to the application to turn the resulting string back
into a Template for substitution.

However, the Template class was designed to work more intuitively in an
internationalized application, by supporting the mixing-in of Template
and unicode subclasses. Thus an internationalized application could
create an application-specific subclass, multiply inheriting from
Template and unicode, and use instances of that subclass as the
gettext catalog key. Further, the subclass could alias the special
__mod__() method to either .substitute() or .safe_substitute() to
provide a more traditional string/unicode-like %-operator substitution
syntax.
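
The mix-in approach might look roughly like the sketch below. It is written for current Python, where str plays the role of unicode. The class name I18nTemplate, the catalog dictionary standing in for a gettext catalog, and the translate() helper are all hypothetical.

```python
from string import Template

class I18nTemplate(Template, str):
    # Hypothetical mix-in: the instance is both a string (usable as a
    # catalog key) and a Template. The % operator performs substitution.
    __mod__ = Template.substitute

# Hypothetical stand-in for a gettext catalog: source text -> translation.
catalog = {
    '$name was born in $country': '$name is geboren in $country',
}

def translate(message):
    # Look the message up by its string value and wrap the result so it
    # can be substituted into again. Unknown messages pass through as-is.
    return I18nTemplate(catalog.get(message, message))

source = I18nTemplate('$name was born in $country')
translated = translate(source)
result = translated % {'name': 'Guido', 'country': 'Nederland'}
print(result)  # Guido is geboren in Nederland
```

Aliasing __mod__ to safe_substitute instead would leave unknown placeholders in place rather than raising KeyError, which may suit translated text better.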

Reference Implementation

The implementation[4] has been committed to the Python 2.4 source tree.

References

[1] String Formatting Operations
https://docs.python.org/release/2.6/library/stdtypes.html#string-formatting-operations

[2] Identifiers and Keywords
https://docs.python.org/release/2.6/reference/lexical_analysis.html#identifiers-and-keywords

[3] https://mail.python.org/pipermail/python-dev/2002-June/025652.html

[4] Reference Implementation
http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470

Copyright

This document has been placed in the public domain.