PEP: 501
Title: General purpose template literal strings
Author: Alyssa Coghlan <ncoghlan@gmail.com>, Nick Humrich <nick@humrich.us>
Discussions-To: https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Requires: 701
Created: 08-Aug-2015
Python-Version: 3.12
Post-History: 08-Aug-2015, 05-Sep-2015, 09-Mar-2023

Abstract

Though easy and elegant to use, Python f-strings can be vulnerable to
injection attacks when used to construct shell commands, SQL queries,
HTML snippets and similar (for example,
os.system(f"echo {message_from_user}")). This PEP introduces template
literal strings (or "t-strings"), which have syntax and semantics that
are similar to f-strings, but with rendering deferred until format or
another template rendering function is called on them. This will allow
standard library calls, helper functions and third party tools to safely
and intelligently perform appropriate escaping and other string
processing on inputs while retaining the usability and convenience of
f-strings.

Relationship with other PEPs

This PEP is inspired by and builds on top of the f-string syntax first
implemented in PEP 498 and formalised in PEP 701.

This PEP complements the literal string typing support added to Python's
formal type system in PEP 675 by introducing a safe way to do dynamic
interpolation of runtime values into security sensitive strings.

This PEP competes with some aspects of the tagged string proposal in PEP
750 (most notably in whether template rendering is expressed as
render(t"template literal") or as render"template literal"), but also
shares many common features (after PEP 750 was published, this PEP was
updated with several new changes inspired by the tagged strings
proposal).

This PEP does NOT propose an alternative to PEP 292 for user interface
internationalization use cases (but does note the potential for future
syntactic enhancements aimed at that use case that would benefit from
the compiler-supported value interpolation machinery that this PEP and
PEP 750 introduce).

Motivation

PEP 498 added new syntactic support for string interpolation that is
transparent to the compiler, allowing name references from the
interpolation operation full access to containing namespaces (as with
any other expression), rather than being limited to explicit name
references. These are referred to in the PEP (and elsewhere) as
"f-strings" (a mnemonic for "formatted strings").

Since acceptance of PEP 498, f-strings have become well-established and
very popular. f-strings became even more useful and flexible with the
formalised grammar in PEP 701. While f-strings are great, eager
rendering has its limitations. For example, the eagerness of f-strings
has made code like the following unfortunately plausible:

    os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant
problem if the interpolated value message_from_user is in fact provided
by an untrusted user: it's an opening for a form of code injection
attack, where the supplied user data has not been properly escaped
before being passed to the os.system call.
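
To make the risk concrete, the following minimal sketch uses only the existing shlex.quote function. The payload string is a made-up example, and nothing is executed: the command strings are only built and inspected.

```python
import shlex

# Hypothetical untrusted input containing a shell metacharacter.
message_from_user = "hello; rm -rf ~"

# Eager f-string rendering: the payload is spliced into the command as-is,
# so a shell would see two commands ("echo hello" and "rm -rf ~").
unsafe_command = f"echo {message_from_user}"

# Quoting the value first keeps it a single argument to echo.
quoted_command = f"echo {shlex.quote(message_from_user)}"

print(unsafe_command)
print(quoted_command)
print(shlex.split(quoted_command))
```

The quoted form is correct, but the escaping has to be remembered at every call site, while the unsafe form is the shorter one to write.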

While the LiteralString type annotation introduced in PEP 675 means that
typecheckers are able to report a type error for this kind of unsafe
function usage, those errors don't help make it easier to write code
that uses safer alternatives (such as subprocess.run).

To address that problem (and a number of other concerns), this PEP
proposes the complementary introduction of "t-strings" (a mnemonic for
"template literal strings"), where format(t"Message with {data}") would
produce the same result as f"Message with {data}", but the template
literal instance can instead be passed to other template rendering
functions which process the contents of the template differently.

Proposal

Dedicated template literal syntax

This PEP proposes a new string prefix that declares the string to be a
template literal rather than an ordinary string:

    template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"

This would be effectively interpreted as:

    template = TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", names, f">{field_width}", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", expressions(), "", "r"),
        TemplateLiteralText(r" at runtime"),
    )

(Note: this is an illustrative example implementation. The exact compile
time construction syntax of types.TemplateLiteral is considered an
implementation detail not specified by the PEP. In particular, the
compiler may bypass the default constructor's runtime logic that detects
consecutive text segments and merges them into a single text segment, as
well as checking the runtime types of all supplied arguments).

The __format__ method on types.TemplateLiteral would then implement the
following str.format inspired semantics:

    >>> import datetime
    >>> name = 'Jane'
    >>> age = 50
    >>> anniversary = datetime.date(1991, 10, 12)
    >>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
    'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
    >>> format(t'She said her name is {name!r}.')
    "She said her name is 'Jane'."

The syntax of template literals would be based on PEP 701, and largely
use the same syntax for the string portion of the template. Aside from
using a different prefix, the one other syntactic change is in the
definition and handling of conversion specifiers, both to allow !() as a
standard conversion specifier to request evaluation of a field at
rendering time, and to allow custom renderers to also define custom
conversion specifiers.

This PEP does not propose to remove or deprecate any of the existing
string formatting mechanisms, as those will remain valuable when
formatting strings that are not present directly in the source code of
the application.

Lazy field evaluation conversion specifier

In addition to the existing support for the a, r, and s conversion
specifiers, str.format, str.format_map, and string.Formatter will be
updated to accept () as a conversion specifier that means "call the
interpolated value".

To support application of the standard conversion specifiers in custom
template rendering functions, a new operator.convert_field function
will be added.

The signature and behaviour of the format builtin will also be updated
to accept a conversion specifier as a third optional parameter. If a
non-empty conversion specifier is given, the value will be converted
with operator.convert_field before looking up the __format__ method.

Custom conversion specifiers

To allow additional field-specific directives to be passed to custom
rendering functions in a way that still allows formatting of the
template with the default renderer, the conversion specifier field will
be allowed to contain a second ! character.

operator.convert_field and format (and hence the default
TemplateLiteral.render template rendering method) will ignore that
character and any subsequent text in the conversion specifier field.

str.format, str.format_map, and string.Formatter will also be updated to
accept (and ignore) custom conversion specifiers.
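
As a rough illustration of how a conversion specifier field with a second ! could be separated into its standard and custom parts, here is a minimal sketch. The helper name split_conversion is hypothetical and is not an API proposed by this PEP. The input is the conversion specifier text with the leading ! already removed.

```python
def split_conversion(conversion_spec: str) -> tuple[str, str]:
    """Split a conversion specifier into (standard, custom) parts.

    Hypothetical helper: everything before the first "!" is treated as
    the standard specifier, everything after it as the custom suffix.
    """
    std_spec, _sep, custom_spec = conversion_spec.partition("!")
    return std_spec, custom_spec


print(split_conversion("r"))         # standard specifier only
print(split_conversion("r!html"))    # standard specifier plus custom suffix
print(split_conversion("!html"))     # custom suffix only (written as !!html)
print(split_conversion("()!lazy"))   # lazy evaluation plus custom suffix
```

A default renderer would only look at the first element of each pair, while a custom renderer is free to interpret the second.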

Template renderer for POSIX shell commands

As both a practical demonstration of the benefits of delayed rendering
support, and as a valuable feature in its own right, a new sh template
renderer will be added to the shlex module. This renderer will produce
strings where all interpolated fields are escaped with shlex.quote.

The subprocess.Popen API (and higher level APIs that depend on it, such
as subprocess.run) will be updated to accept interpolation templates and
handle them in accordance with the new shlex.sh renderer.
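
The escaping behaviour described above can be approximated today with stand-in types. In this sketch, Field and sh_sketch are hypothetical names used only for illustration. It does not reproduce the proposed shlex.sh signature, and it ignores conversion specifiers.

```python
import shlex
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolation field segment."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


def sh_sketch(segments) -> str:
    """Render segments, shell-quoting every interpolated field."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            # Format the value first, then escape the formatted result.
            rendered = format(segment.value, segment.format_spec)
            parts.append(shlex.quote(rendered))
        else:
            # Literal text written by the developer is trusted as-is.
            parts.append(str(segment))
    return "".join(parts)


message_from_user = "hello; rm -rf ~"
command = sh_sketch(["echo ", Field("message_from_user", message_from_user)])
print(command)
```

Because the literal text and the interpolated values arrive separately, the renderer can escape only the values without the caller having to remember to do so.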

Background

This PEP was initially proposed as a competitor to PEP 498. After it
became clear that the eager rendering proposal had substantially more
immediate support, it then spent several years in a deferred state,
pending further experience with PEP 498's simpler approach of only
supporting eager rendering without the additional complexity of also
supporting deferred rendering.

Since then, f-strings have become very popular and PEP 701 was
introduced to tidy up some rough edges and limitations in their syntax
and semantics. The template literal proposal was updated in 2023 to
reflect current knowledge of f-strings, and improvements from PEP 701.

In 2024, PEP 750 was published, proposing a general purpose mechanism
for custom tagged string prefixes, rather than the narrower template
literal proposal in this PEP. This PEP was again updated, both to
incorporate new ideas inspired by the tagged strings proposal, and to
describe the perceived benefits of the narrower template literal syntax
proposal in this PEP over the more general tagged string proposal.

Summary of differences from f-strings

The key differences between f-strings and t-strings are:

-   the t (template literal) prefix indicates delayed rendering, but
    otherwise largely uses the same syntax and semantics as formatted
    strings
-   template literals are available at runtime as a new kind of object
    (types.TemplateLiteral)
-   the default rendering used by formatted strings is invoked on a
    template literal object by calling format(template) rather than
    being done implicitly in the compiled code
-   unlike f-strings (where conversion specifiers are handled directly
    in the compiler), t-string conversion specifiers are handled at
    rendering time by the rendering function
-   the new !() conversion specifier indicates that the field expression
    is a callable that should be called when using the default format
    rendering function. This specifier is specifically not being added
    to f-strings (since it is pointless there).
-   a second ! is allowed in t-string conversion specifiers (with any
    subsequent text being ignored) as a way to allow custom template
    rendering functions to accept custom conversion specifiers without
    breaking the default TemplateLiteral.render rendering method. This
    feature is specifically not being added to f-strings (since it is
    pointless there).
-   while f-string f"Message {here}" would be semantically equivalent to
    format(t"Message {here}"), f-strings will continue to be supported
    directly in the compiler and hence avoid the runtime overhead of
    actually using the delayed rendering machinery that is needed for
    t-strings

Summary of differences from tagged strings

When tagged strings were first proposed, there were several notable
differences from the proposal in PEP 501 beyond the surface syntax
difference between whether rendering function invocations are written as
render(t"template literal") or as render"template literal".

Over the course of the initial PEP 750 discussion, many of those
differences were eliminated, either by PEP 501 adopting that aspect of
PEP 750's proposal (such as lazily applying conversion specifiers), or
by PEP 750 changing to retain some aspect of PEP 501's proposal (such as
defining a dedicated type to hold template segments rather than
representing them as simple sequences).

The main remaining significant difference is that this PEP argues that
adding only the t-string prefix is a sufficient enhancement to give all
the desired benefits described in PEP 750. The expansion to a
generalised "tagged string" syntax isn't necessary, and causes
additional problems that can be avoided.

The two PEPs also differ in their proposed approaches to handling lazy
evaluation of template fields.

While there are other differences between the two proposals, those
differences are more cosmetic than substantive. In particular:

-   this PEP proposes different names for the structural typing
    protocols
-   this PEP proposes specific names for the concrete implementation
    types
-   this PEP proposes exact details for the proposed APIs of the
    concrete implementation types (including concatenation and
    repetition support, which are not part of the structural typing
    protocols)
-   this PEP proposes changes to the existing format builtin to make it
    usable directly as a template field renderer

The two PEPs also differ in how they make their case for delayed
rendering support. This PEP focuses more on the concrete implementation
concept of using template literals to allow the "interpolation" and
"rendering" steps in f-string processing to be separated in time, and
then taking advantage of that to reduce the potential code injection
risks associated with misuse of f-strings. PEP 750 focuses more on the
way that native templating support allows behaviours that are difficult
or impossible to achieve via existing string based templating methods.
As with the cosmetic differences noted above, this is more a difference
in style than a difference in substance.

Rationale

f-strings (PEP 498) made interpolating values into strings with full
access to Python's lexical namespace semantics simpler, but they do so
at the cost of creating a situation where interpolating values into
sensitive targets like SQL queries, shell commands and HTML templates
enjoys a much cleaner syntax when handled without regard for code
injection attacks than when handled correctly.

This PEP proposes to provide the option of delaying the actual rendering
of a template literal into a formatted string until its __format__ method
is called, allowing the use of other template renderers by passing the
template around as a first class object.

While very different in the technical details, the types.TemplateLiteral
interface proposed in this PEP is conceptually quite similar to the
FormattableString type underlying the native interpolation support
introduced in C# 6.0, as well as the JavaScript template literals
introduced in ES6.

While not the original motivation for developing the proposal, many of
the benefits for defining domain specific languages described in PEP 750
also apply to this PEP (including the potential for per-DSL semantic
highlighting in code editors based on the type specifications of
declared template variables and rendering function parameters).

Specification

This PEP proposes a new t string prefix that results in the creation of
an instance of a new type, types.TemplateLiteral.

Template literals are Unicode strings (bytes literals are not
permitted), and string literal concatenation operates as normal, with
the entire combined literal forming the template literal.

The template string is parsed into literals, expressions, format
specifiers, and conversion specifiers as described for f-strings in PEP
498 and PEP 701. The syntax for conversion specifiers is relaxed such
that arbitrary strings are accepted (excluding those containing {, } or
:) rather than being restricted to valid Python identifiers.

However, rather than being rendered directly into a formatted string,
these components are instead organised into instances of new types with
the following behaviour:

    class TemplateLiteralText(str):
        # This is a renamed and extended version of the DecodedConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw: str

        def __new__(cls, raw: str):
            decoded = raw.encode("utf-8").decode("unicode-escape")
            if decoded == raw:
                decoded = raw
            text = super().__new__(cls, decoded)
            text._raw = raw
            return text

        @staticmethod
        def merge(text_segments:Sequence[TemplateLiteralText]) -> TemplateLiteralText:
            if len(text_segments) == 1:
                return text_segments[0]
            return TemplateLiteralText("".join(t._raw for t in text_segments))

        @property
        def raw(self) -> str:
            return self._raw

        def __repr__(self) -> str:
            return f"{type(self).__name__}(r{self._raw!r})"

        def __add__(self, other:Any) -> TemplateLiteralText|NotImplemented:
            if isinstance(other, TemplateLiteralText):
                return TemplateLiteralText(self._raw + other._raw)
            return NotImplemented


        def __mul__(self, other:Any) -> TemplateLiteralText|NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            return TemplateLiteralText(self._raw * factor)
        __rmul__ = __mul__

    class TemplateLiteralField(NamedTuple):
        # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
        # However:
        #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
        #    - conversion specifiers are allowed to be arbitrary strings
        #    - order of fields is adjusted so the text form is the first field and the
        #      remaining parameters match the updated signature of the `format` builtin
        # Real type would be implemented in C, this is an API compatible Python equivalent

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

        def __repr__(self) -> str:
            return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                    f"{self.format_spec!r}, {self.conversion_spec!r})")

        def __str__(self) -> str:
            return format(self.value, self.format_spec, self.conversion_spec)

        def __format__(self, format_override) -> str:
            if format_override:
                format_spec = format_override
            else:
                format_spec = self.format_spec
            return format(self.value, format_spec, self.conversion_spec)

    class TemplateLiteral:
        # This type corresponds to the TemplateConcrete type in PEP 750
        # Real type would be implemented in C, this is an API compatible Python equivalent
        _raw_template: str
        _segments: tuple[TemplateLiteralText|TemplateLiteralField, ...]

        def __new__(cls, raw_template:str, *segments:TemplateLiteralText|TemplateLiteralField):
            self = super().__new__(cls)
            self._raw_template = raw_template
            # Check if there are any adjacent text segments that need merging
            # or any empty text segments that need discarding
            type_err = "Template literal segments must be template literal text or field instances"
            text_expected = True
            needs_merge = False
            for segment in segments:
                match segment:
                    case TemplateLiteralText():
                        if not text_expected or not segment:
                            needs_merge = True
                            break
                        text_expected = False
                    case TemplateLiteralField():
                        text_expected = True
                    case _:
                        raise TypeError(type_err)
            if not needs_merge:
                # Match loop above will have checked all segments
                self._segments = segments
                return self
            # Merge consecutive runs of text fields and drop any empty text fields
            merged_segments:list[TemplateLiteralText|TemplateLiteralField] = []
            pending_merge:list[TemplateLiteralText] = []
            for segment in segments:
                match segment:
                    case TemplateLiteralText() as text_segment:
                        if text_segment:
                            pending_merge.append(text_segment)
                    case TemplateLiteralField():
                        if pending_merge:
                            merged_segments.append(TemplateLiteralText.merge(pending_merge))
                            pending_merge.clear()
                        merged_segments.append(segment)
                    case _:
                        # First loop above may not check all segments when a merge is needed
                        raise TypeError(type_err)
            if pending_merge:
                merged_segments.append(TemplateLiteralText.merge(pending_merge))
                pending_merge.clear()
            self._segments = tuple(merged_segments)
            return self

        @property
        def raw_template(self) -> str:
            return self._raw_template

        @property
        def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField, ...]:
            return self._segments

        def __len__(self) -> int:
            return len(self._segments)

        def __iter__(self) -> Iterator[TemplateLiteralText|TemplateLiteralField]:
            return iter(self._segments)

        # Note: template literals do NOT define any relative ordering
        def __eq__(self, other):
            if not isinstance(other, TemplateLiteral):
                return NotImplemented
            # Field values and format specifiers are part of the segments,
            # so comparing the segments covers them as well
            return (
                self._raw_template == other._raw_template
                and self._segments == other._segments
            )

        def __repr__(self) -> str:
            return (f"{type(self).__name__}(r{self._raw_template!r}, "
                    f"{', '.join(map(repr, self._segments))})")

        def __format__(self, format_specifier) -> str:
            # When formatted, render to a string, and then use string formatting
            return format(self.render(), format_specifier)

        def render(self, *, render_template=''.join, render_text=str, render_field=format):
            ...  # See definition of the template rendering semantics below

        def __add__(self, other) -> TemplateLiteral|NotImplemented:
            if isinstance(other, TemplateLiteral):
                combined_raw_text = self._raw_template + other._raw_template
                combined_segments = self._segments + other._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            if isinstance(other, str):
                # Treat the given string as a new raw text segment
                combined_raw_text = self._raw_template + other
                combined_segments = self._segments + (TemplateLiteralText(other),)
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __radd__(self, other) -> TemplateLiteral|NotImplemented:
            if isinstance(other, str):
                # Treat the given string as a new raw text segment. This effectively
                # has precedence over string concatenation in CPython due to
                # https://github.com/python/cpython/issues/55686
                combined_raw_text = other + self._raw_template
                combined_segments = (TemplateLiteralText(other),) + self._segments
                return TemplateLiteral(combined_raw_text, *combined_segments)
            return NotImplemented

        def __mul__(self, other) -> TemplateLiteral|NotImplemented:
            try:
                factor = operator.index(other)
            except TypeError:
                return NotImplemented
            if not self or factor == 1:
                return self
            if factor < 1:
                return TemplateLiteral("")
            repeated_text = self._raw_template * factor
            repeated_segments = self._segments * factor
            return TemplateLiteral(repeated_text, *repeated_segments)
        __rmul__ = __mul__

(Note: this is an illustrative example implementation. The exact compile
time construction method and internal data management details of
types.TemplateLiteral are considered an implementation detail not
specified by the PEP. However, the expected post-construction behaviour
of the public APIs on types.TemplateLiteral instances is specified by
the above code, as is the constructor signature for building template
instances at runtime.)

The result of a template literal expression is an instance of this type,
rather than an already rendered string. Rendering only takes place when
the instance's render method is called (either directly, or indirectly
via __format__).

The compiler will pass the following details to the template literal for
later use:

-   a string containing the raw template as written in the source code
-   a sequence of template segments, with each segment being either:
    -   a literal text segment (a regular Python string that also
        provides access to its raw form)
    -   a parsed template interpolation field, specifying the text of
        the interpolated expression (as a regular string), its evaluated
        result, the format specifier text (with any substitution fields
        eagerly evaluated as an f-string), and the conversion specifier
        text (as a regular string)
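
The idea of a text segment that behaves as a regular string while still exposing its raw form can be sketched with a small str subclass. RawAwareText is a hypothetical name, and the sketch assumes ASCII-only raw text to keep the escape decoding simple.

```python
class RawAwareText(str):
    """Hypothetical str subclass that remembers its raw source text."""

    def __new__(cls, raw: str):
        # Assumption: raw is ASCII-only, so it can be round-tripped through
        # bytes in order to process backslash escapes.
        decoded = raw.encode("ascii").decode("unicode_escape")
        text = super().__new__(cls, decoded)
        text.raw = raw
        return text


segment = RawAwareText(r"Tab\there")
print(repr(str(segment)))  # decoded form, containing a real tab character
print(repr(segment.raw))   # raw form, containing a backslash and a "t"
```

A renderer that only needs the decoded text can treat the segment as an ordinary string, while one that cares about the source spelling can read the raw attribute.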

The raw template is just the template literal as a string. By default,
it is used to provide a human-readable representation for the template
literal, but template renderers may also use it for other purposes (e.g.
as a cache lookup key).

The parsed template structure is taken from PEP 750 and consists of a
sequence of template segments corresponding to the text segments and
interpolation fields in the template string.

This approach is designed to allow compilers to fully process each
segment of the template in order, before finally emitting code to pass
all of the template segments to the template literal constructor.

For example, assuming the following runtime values:

    names = ["Alice", "Bob", "Carol", "Eve"]
    field_width = 10
    def expressions():
        return 42

The template from the proposal section would be represented at runtime
as:

    TemplateLiteral(
        r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
        TemplateLiteralText(r"Substitute "),
        TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
        TemplateLiteralText(r" and "),
        TemplateLiteralField("expressions()", 42, "", "r"),
        TemplateLiteralText(r" at runtime"),
    )

Rendering templates

The TemplateLiteral.render implementation defines the rendering process
in terms of the following renderers:

-   an overall render_template operation that defines how the sequence
    of rendered text and field segments is composed into a fully
    rendered result. The default template renderer is string
    concatenation using ''.join.
-   a per text segment render_text operation that receives the
    individual literal text segments within the template. The default
    text renderer is the builtin str constructor.
-   a per field segment render_field operation that receives the field
    value, format specifier, and conversion specifier for substitution
    fields within the template. The default field renderer is the format
    builtin.

Given the parsed template representation above, the semantics of
template rendering would then be equivalent to the following:

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        rendered_segments = []
        for segment in self._segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    rendered_segments.append(render_text(text_segment))
                case TemplateLiteralField() as field_segment:
                    rendered_segments.append(render_field(*field_segment[1:]))
        return render_template(rendered_segments)
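
The same rendering loop can be exercised today with stand-in types. In the sketch below, Field, plain_field and html_field are hypothetical names. Because the three-argument form of format is only proposed, plain_field stands in for it and ignores conversion specifiers.

```python
import html
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for TemplateLiteralField."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


def plain_field(value, format_spec="", conversion_spec=""):
    # Stand-in for the proposed three-argument format builtin;
    # conversion specifiers are ignored in this sketch.
    return format(value, format_spec)


def html_field(value, format_spec="", conversion_spec=""):
    # Format first, then escape the result for use in HTML text.
    return html.escape(format(value, format_spec))


def render(segments, *, render_template="".join, render_text=str,
           render_field=plain_field):
    rendered = []
    for segment in segments:
        if isinstance(segment, Field):
            # Pass value, format_spec and conversion_spec (but not expr).
            rendered.append(render_field(*segment[1:]))
        else:
            rendered.append(render_text(segment))
    return render_template(rendered)


segments = ["<p>Hello, ", Field("name", "<b>Jane</b>"), "</p>"]
print(render(segments))
print(render(segments, render_field=html_field))
```

Swapping only render_field changes how interpolated values are treated while leaving the literal markup untouched.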

Format specifiers

The syntax and processing of format specifiers in t-strings is defined to
be the same as it is for f-strings.

This includes allowing format specifiers to themselves contain f-string
substitution fields. The raw text of the format specifiers (without
processing any substitution fields) is retained as part of the full raw
template string.

The parsed field segments receive the format specifier string with
those substitutions already resolved. The : prefix is also omitted.

Aside from separating them out from the substitution expression during
parsing, format specifiers are otherwise treated as opaque strings by
the interpolation template parser - assigning semantics to those (or,
alternatively, prohibiting their use) is handled at rendering time by
the field renderer.
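
As a small illustration of the difference between the raw and the resolved format specifier, the following sketch uses an ordinary f-string to mimic the eager resolution step. The variable names are chosen for the example only.

```python
field_width = 10

# As written in the template source (kept in the raw template string).
raw_format_spec = ">{field_width}"

# What the parsed field would carry: nested substitutions already resolved.
resolved_format_spec = f">{field_width}"

print(repr(raw_format_spec))
print(repr(resolved_format_spec))

# A field renderer then receives the resolved specifier as an opaque string.
print(repr(format("Alice", resolved_format_spec)))
```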

Conversion specifiers

In addition to the existing support for a, r, and s conversion
specifiers, str.format and str.format_map will be updated to accept ()
as a conversion specifier that means "call the interpolated value".

Where PEP 701 restricts conversion specifiers to NAME tokens, this PEP
will instead allow FSTRING_MIDDLE tokens (such that only {, } and : are
disallowed). This change is made primarily to support lazy field
rendering with the !() conversion specifier, but also allows custom
rendering functions more flexibility when defining their own conversion
specifiers in preference to those defined for the default format field
renderer.

Conversion specifiers are still handled as plain strings, and do NOT
support the use of substitution fields.

The parsed conversion specifiers receive the conversion specifier string
with the ! prefix omitted.

To allow custom template renderers to define their own custom conversion
specifiers without causing the default renderer to fail, conversion
specifiers will be permitted to contain a custom suffix prefixed with a
second ! character. That is, !!<custom>, !a!<custom>, !r!<custom>,
!s!<custom>, and !()!<custom> would all be valid conversion specifiers
in a template literal.

As described above, the default rendering supports the original !a, !r
and !s conversion specifiers defined in PEP 3101, together with the new
!() lazy field evaluation conversion specifier defined in this PEP. The
default rendering ignores any custom conversion specifier suffixes.

The full mapping between the standard conversion specifiers and the
special methods called on the interpolated value when the field is
rendered:

-   No conversion (empty string): __format__ (with format specifier as
    parameter)
-   a: __repr__ (as per the ascii builtin)
-   r: __repr__ (as per the repr builtin)
-   s: __str__ (as per the str builtin)
-   (): __call__ (with no parameters)

When a conversion occurs, __format__ (with the format specifier) is
called on the result of the conversion rather than being called on the
original object.
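
The ordering described above (convert first, then apply the format specifier to the converted result) already applies to the existing a, r and s specifiers, so it can be checked with str.format today. The new () specifier is not included because it is only proposed.

```python
value = "ab"

# With !r, the format specifier is applied to repr(value), not to value.
via_str_format = "{!r:>8}".format(value)
by_hand = format(repr(value), ">8")

# Without a conversion, the format specifier goes to the original object.
no_conversion = "{:>8}".format(value)

# !a uses ascii(), which escapes non-ASCII characters in the repr.
ascii_converted = "{!a}".format("é")

print(repr(via_str_format))
print(repr(no_conversion))
print(ascii_converted)
```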

The changes to format and the addition of operator.convert_field make
it straightforward for custom renderers to also support the standard
conversion specifiers.

f-strings themselves will NOT support the new !() conversion specifier
(as it is redundant when value interpolation and value rendering always
occur at the same time). They also will NOT support the use of custom
conversion specifiers (since the rendering function is known at compile
time and doesn't make use of the custom specifiers).

New field conversion API in the operator module

To support application of the standard conversion specifiers in custom
template rendering functions, a new operator.convert_field function
will be added:

    def convert_field(value, conversion_spec=''):
        """Apply the given string formatting conversion specifier to the given value"""
        std_spec, sep, custom_spec = conversion_spec.partition("!")
        match std_spec:
            case '':
                return value
            case 'a':
                return ascii(value)
            case 'r':
                return repr(value)
            case 's':
                return str(value)
            case '()':
                return value()
        if not sep:
            err = f"Invalid conversion specifier {std_spec!r}"
        else:
            err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
        raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")

Conversion specifier parameter added to format

The signature and behaviour of the format builtin will be updated:

    def format(value, format_spec='', conversion_spec=''):
        if conversion_spec:
            value_to_format = operator.convert_field(value, conversion_spec)
        else:
            value_to_format = value
        return type(value_to_format).__format__(value_to_format, format_spec)

If a non-empty conversion specifier is given, the value will be
converted with operator.convert_field before looking up the __format__
method.

The signature of the __format__ special method does NOT change (only
format specifiers are handled by the object being formatted).
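
As the updated builtin does not exist in current Python releases, its
intended behaviour can only be approximated. The following is a minimal
sketch, assuming a hypothetical stand-in named format_with_conversion
that handles only the standard specifiers (custom specifier suffixes and
the error reporting of operator.convert_field are omitted):

```python
def format_with_conversion(value, format_spec="", conversion_spec=""):
    """Hypothetical stand-in for the proposed three-argument format()."""
    converters = {
        "a": ascii,
        "r": repr,
        "s": str,
        "()": lambda obj: obj(),  # proposed lazy evaluation conversion
    }
    if conversion_spec:
        value = converters[conversion_spec](value)
    # __format__ is looked up on the type of the (possibly converted) value
    return type(value).__format__(value, format_spec)

print(format_with_conversion(42, "05d"))                # no conversion
print(format_with_conversion(42, ">6", "r"))            # repr() first, then str formatting
print(format_with_conversion(lambda: 42, "05d", "()"))  # call first, then int formatting
```

In the last call the format specifier is applied to the result of
calling the value rather than to the callable itself.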

Structural typing and duck typing

To allow custom renderers to accept alternative interpolation template
implementations (rather than being tightly coupled to the native
template literal types), the following structural protocols will be
added to the typing module:

    @runtime_checkable
    class TemplateText(Protocol):
        # Renamed version of PEP 750's Decoded protocol
        def __str__(self) -> str:
            ...

        raw: str

    @runtime_checkable
    class TemplateField(Protocol):
        # Renamed and modified version of PEP 750's Interpolation protocol
        def __len__(self):
            ...

        def __getitem__(self, index: int):
            ...

        def __str__(self) -> str:
            ...

        expr: str
        value: Any
        format_spec: str | None = None
        conversion_spec: str | None = None

    @runtime_checkable
    class InterpolationTemplate(Protocol):
        # Corresponds to PEP 750's Template protocol
        def __iter__(self) -> Iterable[TemplateText|TemplateField]:
            ...

        raw_template: str

Note that the structural protocol APIs are substantially narrower than
the full implementation APIs defined for TemplateLiteralText,
TemplateLiteralField, and TemplateLiteral.

Code that wants to accept interpolation templates and define specific
handling for them without introducing a dependency on the typing module,
or restricting the code to handling the concrete template literal types,
should instead perform an attribute existence check on raw_template.
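
A minimal sketch of such a duck-typed check is shown below. FakeTemplate
and describe are hypothetical names used only for illustration; the only
detail taken from this PEP is the raw_template attribute name:

```python
class FakeTemplate:
    """Hypothetical stand-in for an object implementing InterpolationTemplate."""

    def __init__(self, raw_template, segments):
        self.raw_template = raw_template
        self._segments = segments

    def __iter__(self):
        return iter(self._segments)


def describe(arg):
    """Handle interpolation templates and plain strings differently."""
    if hasattr(arg, "raw_template"):
        # Duck-typed check: no typing import, no concrete type dependency
        return f"template: {arg.raw_template}"
    return f"plain string: {arg}"


print(describe(FakeTemplate("cat {filename}", [])))
print(describe("cat somefile"))
```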

Writing custom renderers

Writing a custom renderer doesn't require any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly either by calling the render() method with alternate
render_template, render_text, and/or render_field implementations, or by
accessing the template's data attributes directly.

For example, the following function would render a template using
objects' repr implementations rather than their native formatting
support:

    def repr_format(template):
        def render_field(value, format_spec, conversion_spec):
            converted_value = operator.convert_field(value, conversion_spec)
            return format(repr(converted_value), format_spec)
        return template.render(render_field=render_field)

The custom renderer shown respects the conversion specifiers in the
original template, but it is also possible to ignore them and render the
interpolated values directly:

    def input_repr_format(template):
        def render_field(value, format_spec, __):
            return format(repr(value), format_spec)
        return template.render(render_field=render_field)

When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in
render_template callable. While this will still be a string for
formatting related use cases, producing non-string objects is permitted.
For example, a custom SQL template renderer could involve an
sqlalchemy.sql.text call that produces an SQLAlchemy query object. A
subprocess invocation related template renderer could produce a string
sequence suitable for passing to subprocess.run, or it could even call
subprocess.run directly, and return the result.

Non-strings may also be returned from render_text and render_field, as
long as they are paired with a render_template implementation that
expects that behaviour.

Custom renderers using the pattern matching style described in PEP 750
are also supported:

    # Use the structural typing protocols rather than the concrete implementation types
    from typing import InterpolationTemplate, TemplateText, TemplateField

    def greet(template: InterpolationTemplate) -> str:
        """Render an interpolation template using structural pattern matching."""
        result = []
        for segment in template:
            match segment:
                case TemplateText() as text_segment:
                    result.append(str(text_segment))
                case TemplateField() as field_segment:
                    result.append(str(field_segment).upper())
        return f"{''.join(result)}!"

Expression evaluation

As with f-strings, the subexpressions that are extracted from the
interpolation template are evaluated in the context where the template
literal appears. This means the expression has full access to local,
nonlocal and global variables. Any valid Python expression can be used
inside {}, including function and method calls.

Because the substitution expressions are evaluated where the string
appears in the source code, there are no additional security concerns
related to the contents of the expression itself, as you could have also
just written the same expression and used runtime field parsing:

    >>> bar=10
    >>> def foo(data):
    ...   return data + 20
    ...
    >>> str(t'input={bar}, output={foo(bar)}')
    'input=10, output=30'

Is essentially equivalent to:

    >>> 'input={}, output={}'.format(bar, foo(bar))
    'input=10, output=30'

Handling code injection attacks

The PEP 498 formatted string syntax makes it potentially attractive to
write code like the following:

    runquery(f"SELECT {column} FROM {table};")
    runcommand(f"cat {filename}")
    return_response(f"<html><body>{response.body}</body></html>")

These all represent potential vectors for code injection attacks, if any
of the variables being interpolated happen to come from an untrusted
source. The specific proposal in this PEP is designed to make it
straightforward to write use case specific renderers that take care of
quoting interpolated values appropriately for the relevant security
context:

    runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
    runcommand(sh(t"cat {filename}"))
    return_response(html(t"<html><body>{response.body}</body></html>"))

This PEP does not cover adding all such renderers to the standard
library immediately (though one for shell escaping is proposed), but
rather proposes to ensure that they can be readily provided by third
party libraries, and potentially incorporated into the standard library
at a later date.
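
As an illustration of what a use case specific renderer might do, the
following sketch turns each interpolated value into a query parameter
rather than splicing it into the SQL text. Template literals are not
available in current Python, so the template is modelled as a plain list
of text segments and Field objects; Field and this sql function are
hypothetical stand-ins, not APIs proposed by this PEP:

```python
import sqlite3
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolated template field."""
    value: Any


def sql(segments):
    """Return (query, params), replacing each field with a '?' placeholder."""
    query_parts, params = [], []
    for segment in segments:
        if isinstance(segment, Field):
            query_parts.append("?")
            params.append(segment.value)
        else:
            query_parts.append(segment)
    return "".join(query_parts), tuple(params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "alice' OR '1'='1"
# Stand-in for sql(t"SELECT name FROM users WHERE name={hostile};")
query, params = sql(["SELECT name FROM users WHERE name=", Field(hostile), ";"])
rows = conn.execute(query, params).fetchall()  # the value is treated as data, not SQL
```

Placeholders of this kind can only stand in for values, so a real
renderer would need a different strategy (such as validation against
known identifiers) for fields like {column} and {table}.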

Over time, it is expected that APIs processing potentially dangerous
string inputs may be updated to accept interpolation templates natively,
allowing problematic code examples to be fixed simply by replacing the f
string prefix with a t:

    runquery(t"SELECT {column} FROM {table};")
    runcommand(t"cat {filename}")
    return_response(t"<html><body>{response.body}</body></html>")

It is proposed that a renderer is included in the shlex module, aiming
to offer a more POSIX shell style experience for accessing external
programs, without the significant risks posed by running os.system or
enabling the system shell when using the subprocess module APIs. This
renderer will provide an interface for running external programs
inspired by that offered by the Julia programming language, only with
the backtick based `cat $filename` syntax replaced by t"cat {filename}"
style template literals. See the "Renderer for shell escaping added to
shlex" section for more details.

Error handling

Either compile time or run time errors can occur when processing
interpolation expressions. Compile time errors are limited to those
errors that can be detected when parsing a template string into its
component tuples. These errors all raise SyntaxError.

Unmatched braces:

    >>> t'x={x'
      File "<stdin>", line 1
          t'x={x'
             ^
    SyntaxError: missing '}' in template literal expression

Invalid expressions:

    >>> t'x={!x}'
      File "<fstring>", line 1
        !x
        ^
    SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a template
string before creating the template literal object. See PEP 498 for some
examples.

Different renderers may also impose additional runtime constraints on
acceptable interpolated expressions and other formatting details, which
will be reported as runtime exceptions.

Renderer for shell escaping added to shlex

As a reference implementation, a renderer for safe POSIX shell escaping
can be added to the shlex module. This renderer would be called sh and
would be equivalent to calling shlex.quote on each field value in the
template literal.

Thus:

    os.system(shlex.sh(t'cat {myfile}'))

would have the same behavior as:

    os.system('cat ' + shlex.quote(myfile))

The implementation would be:

    def sh(template: TemplateLiteral):
        def render_field(value, format_spec, conversion_spec):
            field_text = format(value, format_spec, conversion_spec)
            return quote(field_text)
        return template.render(render_field=render_field)
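
The effect of this renderer can be approximated in current Python. In
the sketch below the template is modelled as a list of literal text
segments and Field objects; Field and sh_sketch are hypothetical
stand-ins, and conversion specifiers are not handled:

```python
import shlex
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolated template field."""
    value: Any
    format_spec: str = ""


def sh_sketch(segments):
    """Quote every field value with shlex.quote, leaving literal text as is."""
    rendered = []
    for segment in segments:
        if isinstance(segment, Field):
            field_text = format(segment.value, segment.format_spec)
            rendered.append(shlex.quote(field_text))
        else:
            rendered.append(segment)
    return "".join(rendered)


myfile = "notes.txt; rm -rf ~"
# Stand-in for shlex.sh(t'cat {myfile}')
command = sh_sketch(["cat ", Field(myfile)])
assert command == "cat " + shlex.quote(myfile)
```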

The addition of shlex.sh will NOT change the existing admonishments in
the subprocess documentation that passing shell=True is best avoided,
nor the reference from the os.system documentation to the higher level
subprocess APIs.

Changes to subprocess module

With the additional renderer in the shlex module, and the addition of
template literals, the subprocess module can be changed to handle
accepting template literals as an additional input type to Popen, as it
already accepts a sequence, or a string, with different behavior for
each.

With the addition of template literals, subprocess.Popen (and in turn,
all its higher level functions such as subprocess.run) could accept
strings in a safe way (at least on POSIX systems; see "Deferring escaped
rendering support for non-POSIX shells").

For example:

    subprocess.run(t'cat {myfile}', shell=True)

would automatically use the shlex.sh renderer provided in this PEP.
Therefore, using shlex inside a subprocess.run call like so:

    subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)

would be redundant, as run would automatically render any template
literals through shlex.sh.

Alternatively, when subprocess.Popen is run without shell=True, it could
still provide subprocess with a more ergonomic syntax. For example:

    subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to:

    subprocess.run(['cat', myfile, '--flag', value])

or, more accurately:

    subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

It would do this by first using the shlex.sh renderer, as above, then
using shlex.split on the result.
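
The quote-then-split round trip relies only on existing shlex APIs, so
it can be checked today. The values below are arbitrary examples chosen
to include whitespace and shell metacharacters:

```python
import shlex

myfile = "my report.txt"
value = "a b; echo oops"

command = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"
args = shlex.split(command)

# Each interpolated value stays a single argument, whatever it contains
assert args == ["cat", myfile, "--flag", value]
```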

The implementation inside subprocess.Popen._execute_child would look
like:

    if hasattr(args, "raw_template"):
        import shlex
        if shell:
            args = [shlex.sh(args)]
        else:
            args = shlex.split(shlex.sh(args))

How to Teach This

This PEP intentionally includes two standard renderers that will always
be available in teaching environments: the format builtin and the new
shlex.sh POSIX shell renderer.

Together, these two renderers can be used to build an initial
understanding of delayed rendering on top of a student's initial
introduction to string formatting with f-strings. This initial
understanding would have the goal of allowing students to use template
literals effectively, in combination with pre-existing template
rendering functions.

For example, f"{'some text'}", f"{value}", f"{value!r}", and
f"{callable()}" could all be introduced.

Those same operations could then be rewritten as
format(t"{'some text'}"), format(t"{value}"), format(t"{value!r}"), and
format(t"{callable()}") to illustrate the relationship between the eager
rendering form and the delayed rendering form.

The difference between "template definition time" (or "interpolation
time") and "template rendering time" can then be investigated further
by storing the template literals as local variables and looking at their
representations separately from the results of the format calls. At this
point, the t"{callable!()}" syntax can be introduced to distinguish
between field expressions that are called at template definition time
and those that are called at template rendering time.

Finally, the differences between the results of f"{'some text'}",
format(t"{'some text'}"), and shlex.sh(t"{'some text'}") could be
explored to illustrate the potential for differences between the default
rendering function and custom rendering functions.

Actually defining your own custom template rendering functions would
then be a separate more advanced topic (similar to the way students are
routinely taught to use decorators and context managers well before they
learn how to write their own custom ones).

PEP 750 includes further ideas for teaching aspects of the delayed
rendering topic.

Discussion

Refer to PEP 498 for previous discussion, as several of the points there
also apply to this PEP. PEP 750's design discussions are also highly
relevant, as that PEP inspired several aspects of the current design.

Support for binary interpolation

As f-strings don't handle byte strings, neither will t-strings.

Interoperability with str-only interfaces

For interoperability with interfaces that only accept strings,
interpolation templates can still be prerendered with format, rather
than delegating the rendering to the called function.

This reflects the key difference from PEP 498, which always eagerly
applies the default rendering, without any way to delegate the choice of
renderer to another section of the code.

Preserving the raw template string

Earlier versions of this PEP failed to make the raw template string
available on the template literal. Retaining it makes it possible to
provide a more attractive template representation, as well as providing
the ability to precisely reconstruct the original string, including both
the expression text and the details of any eagerly rendered substitution
fields in format specifiers.

Creating a rich object rather than a global name lookup

Earlier versions of this PEP used an __interpolate__ builtin, rather
than creating a new kind of object for later consumption by
interpolation functions. Creating a rich descriptive object with a
useful default renderer made it much easier to support customisation of
the semantics of interpolation.

Building atop f-strings rather than replacing them

Earlier versions of this PEP attempted to serve as a complete substitute
for PEP 498 (f-strings). With the acceptance of that PEP and the more
recent PEP 701, this PEP can instead build a more flexible delayed
rendering capability on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified
a number of aspects of the proposal in this PEP (such as how to handle
substitution fields in format specifiers).

Defining repetition and concatenation semantics

This PEP explicitly defines repetition and concatenation semantics for
TemplateLiteral and TemplateLiteralText. While not strictly necessary,
defining these is expected to make the types easier to work with in code
that historically only supported regular strings.

New conversion specifier for lazy field evaluation

The initially published version of PEP 750 defaulted to lazy evaluation
for all interpolation fields. While it was subsequently updated to
default to eager evaluation (as happens for f-strings and this PEP), the
discussions around the topic prompted the idea of providing a way to
indicate to rendering functions that the interpolated field value should
be called at rendering time rather than being used without modification.

Since PEP 750 also deferred the processing of conversion specifiers
until evaluation time, the suggestion was put forward that invoking
__call__ without arguments could be seen as similar to the existing
conversion specifiers that invoke __repr__ (!a, !r) or __str__ (!s).

Accordingly, this PEP was updated to also make conversion specifier
processing the responsibility of rendering functions, and to introduce
!() as a new conversion specifier for lazy evaluation.

Adding operator.convert_field and updating the format builtin was then
a matter of providing appropriate support to rendering function
implementations that wanted to accept the default conversion specifiers.

Allowing arbitrary conversion specifiers in custom renderers

Accepting !() as a new conversion specifier necessarily requires
updating the syntax that the parser accepts for conversion specifiers
(they are currently restricted to identifiers). This then raised the
question of whether t-string compilation should enforce the additional
restriction that f-string compilation imposes: that the conversion
specifier be exactly one of !a, !r, or !s.

With t-strings already being updated to allow !() when compiled, it made
sense to treat conversion specifiers as relating to rendering functions
in a way similar to how format specifiers relate to the formatting of
individual objects: aside from some characters that are excluded for
parsing reasons, they are otherwise free text fields with the meaning
decided by the consuming function or object. This reduces the temptation
to introduce renderer specific metaformatting into the template's format
specifiers (since any renderer specific information can be placed in the
conversion specifier instead).

Only reserving a single new string prefix

The primary difference between this PEP and PEP 750 is that the latter
aims to enable the use of arbitrary string prefixes, rather than
requiring the creation of template literal instances that are then
passed to other APIs. For example, PEP 750 would allow the sh renderer
described in this PEP to be used as sh"cat {somefile}" rather than
requiring the template literal to be created explicitly and then passed
to a regular function call (as in sh(t"cat {somefile}")).

The main reason the PEP authors prefer the second spelling is because it
makes it clearer to a reader what is going on: a template literal
instance is being created, and then passed to a callable that knows how
to do something useful with interpolation template instances.

A draft proposal from one of the PEP 750 authors also suggests that
static typecheckers will be able to infer the use of particular domain
specific languages just as readily from the form that uses an explicit
function call as they would be able to infer it from a directly tagged
string.

With the tagged string syntax at least arguably reducing clarity for
human readers without increasing the overall expressiveness of the
construct, it seems reasonable to start with the smallest viable
proposal (a single new string prefix), and then revisit the potential
value of generalising to arbitrary prefixes in the future.

As a lesser, but still genuine, consideration, only using a single new
string prefix for this use case leaves open the possibility of defining
alternate prefixes in the future that still produce TemplateLiteral
objects, but use a different syntax within the string to define the
interpolation fields (see the i18n discussion below).

Deferring consideration of more concise delayed evaluation syntax

During the discussions of delayed evaluation, {-> expr} was suggested as
potential syntactic sugar for the already supported lambda based syntax:
{(lambda: expr)} (the parentheses are required in the existing syntax to
avoid misinterpretation of the : character as indicating the start of
the format specifier).
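
The need for the parentheses can be demonstrated with f-strings, which
share the same field syntax. The helper name is_valid_expression is
illustrative only:

```python
def is_valid_expression(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Parenthesised lambda: the whole lambda is the field expression
assert is_valid_expression('f"{(lambda: 42)}"')
# Bare lambda: the ':' is taken as the start of the format specifier
assert not is_valid_expression('f"{lambda: 42}"')
# The parenthesised form can be called immediately, or kept for later
assert f"{(lambda: 42)()}" == "42"
```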

While adding such a spelling would complement the rendering time
function call syntax proposed in this PEP (that is, writing {-> expr!()}
to evaluate arbitrary expressions at rendering time), it is a topic that
the PEP authors consider to be better left to a future PEP if this PEP
or PEP 750 is accepted.

Deferring consideration of possible logging integration

One of the challenges with the logging module has been that we have
previously been unable to devise a reasonable migration strategy away
from the use of printf-style formatting. While the logging module does
allow formatters to specify the use of str.format or string.Template
style substitution, it can be awkward to ensure that messages written
that way are only ever processed by log record formatters that are
expecting that syntax.

The runtime parsing and interpolation overhead for logging messages also
poses a problem for extensive logging of runtime events for monitoring
purposes.

While beyond the scope of this initial PEP, template literal support
could potentially be added to the logging module's event reporting APIs,
permitting relevant details to be captured using forms like:

    logging.debug(t"Event: {event}; Details: {data}")
    logging.critical(t"Error: {error}; Details: {data}")

Rather than the historical mod-formatting style:

    logging.debug("Event: %s; Details: %s", event, data)
    logging.critical("Error: %s; Details: %s", error, data)

As the template literal is passed in as an ordinary argument, other
keyword arguments would also remain available:

    logging.critical(t"Error: {error}; Details: {data}", exc_info=True)

The approach to standardising lazy field evaluation described in this
PEP is primarily based on the anticipated needs of this hypothetical
integration into the logging module:

    logging.debug(t"Eager evaluation of {expensive_call()}")
    logging.debug(t"Lazy evaluation of {expensive_call!()}")

    logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
    logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")
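
The distinction between eager and lazy evaluation can be approximated
with the existing logging APIs, since printf-style arguments are only
converted to text when a record is actually formatted. The sketch below
assumes a hypothetical LazyCall helper (not part of the logging module)
to stand in for the proposed !() conversion:

```python
import io
import logging


class LazyCall:
    """Hypothetical helper: defer a zero-argument call until str() is needed."""

    def __init__(self, func):
        self.func = func

    def __str__(self):
        return str(self.func())


calls = []

def expensive_call():
    calls.append("called")
    return "result"


stream = io.StringIO()
logger = logging.getLogger("pep501.example")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.INFO)  # DEBUG records are discarded

logger.debug("Eager evaluation of %s", expensive_call())         # called regardless
logger.debug("Lazy evaluation of %s", LazyCall(expensive_call))  # not called
assert calls == ["called"]

logger.info("Lazy evaluation of %s", LazyCall(expensive_call))   # called when formatted
assert calls == ["called", "called"]
```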

It's an open question whether the definition of logging formatters would
be updated to support template strings, but if they were, the most
likely way of defining fields which should be
looked up on the log record instead of being
interpreted eagerly is simply to escape them so they're available as
part of the literal text:

    proc_id = get_process_id()
    formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")

Deferring consideration of possible use in i18n use cases

The initial motivating use case for this PEP was providing a cleaner
syntax for i18n (internationalization) translation, as that requires
access to the original unmodified template. As such, it focused on
compatibility with the substitution syntax used in Python's
string.Template formatting and Mozilla's l20n project.

However, subsequent discussion revealed there are significant additional
considerations to be taken into account in the i18n use case, which
don't impact the simpler cases of handling interpolation into security
sensitive contexts (like HTML, system shells, and database queries), or
producing application debugging messages in the preferred language of
the development team (rather than the native language of end users).

Due to that realisation, the PEP was switched to use the str.format
substitution syntax originally defined in PEP 3101 and subsequently used
as the basis for PEP 498.

While it would theoretically be possible to update string.Template to
support the creation of instances from native template literals, and to
implement the structural typing.InterpolationTemplate protocol, the PEP authors have
not identified any practical benefit in doing so.

However, one significant benefit of the "only one string prefix"
approach used in this PEP is that while it generalises the existing
f-string interpolation syntax to support delayed rendering through
t-strings, it doesn't imply that that should be the only compiler
supported interpolation syntax that Python should ever offer.

Most notably, it leaves the door open to an alternate "t$-string" syntax
that would allow TemplateLiteral instances to be created using a PEP 292
based interpolation syntax rather than a PEP 3101 based syntax:

    template = t$"Substitute $words and ${other_values} at runtime"

The only runtime distinction between templates created that way and
templates created from regular t-strings would be in the contents of
their raw_template attributes.

Deferring escaped rendering support for non-POSIX shells

shlex.quote works by classifying the regex character set [\w@%+=:,./-]
to be safe, deeming all other characters to be unsafe, and hence
requiring quoting of the string containing them. The quoting mechanism
used is then specific to the way that string quoting works in POSIX
shells, so it cannot be trusted when running a shell that doesn't follow
POSIX shell string quoting rules.
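
The quoting rules can be observed directly with the existing
shlex.quote function:

```python
import shlex

# Strings made up only of "safe" characters are returned unchanged
assert shlex.quote("pwd") == "pwd"
# Anything else is wrapped in single quotes
assert shlex.quote("; pwd") == "'; pwd'"
# Embedded single quotes are closed, emitted inside double quotes, and reopened
assert shlex.quote("'pwd'") == "''\"'\"'pwd'\"'\"''"
```

A shell that does not treat single quotes as string delimiters will not
interpret this output as intended.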

For example, running
subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True) is safe
when using a shell that follows POSIX quoting rules:

    $ cat > run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    $ python3 run_quoted.py pwd
    pwd
    $ python3 run_quoted.py '; pwd'
    ; pwd
    $ python3 run_quoted.py "'pwd'"
    'pwd'

but remains unsafe when the shell invoked from Python is cmd.exe (or
PowerShell):

    S:\> echo import sys, shlex, subprocess > run_quoted.py
    S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
    S:\> type run_quoted.py
    import sys, shlex, subprocess
    subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
    S:\> python3 run_quoted.py "echo OK"
    'echo OK'
    S:\> python3 run_quoted.py "'& echo Oh no!"
    ''"'"'
    Oh no!'

Resolving this standard library limitation is beyond the scope of this
PEP.

Acknowledgements

-   Eric V. Smith for creating PEP 498 and demonstrating the feasibility
    of arbitrary expression substitution in string interpolation
-   The authors of PEP 750 for the substantial design improvements that
    tagged strings inspired for this PEP, their general advocacy for the
    value of language level delayed template rendering support, and
    their efforts to ensure that any native interpolation template
    support lays a strong foundation for future efforts in providing
    robust syntax highlighting and static type checking support for
    domain specific languages
-   Barry Warsaw, Armin Ronacher, and Mike Miller for their
    contributions to exploring the feasibility of using this model of
    delayed rendering in i18n use cases (even though the ultimate
    conclusion was that it was a poor fit, at least for current
    approaches to i18n in Python)

References

-   %-formatting
-   str.format
-   string.Template documentation
-   PEP 215: String Interpolation
-   PEP 292: Simpler String Substitutions
-   PEP 3101: Advanced String Formatting
-   PEP 498: Literal string formatting
-   PEP 675: Arbitrary Literal String Type
-   PEP 701: Syntactic formalization of f-strings
-   FormattableString and C# native string interpolation
-   IFormattable interface in C# (see remarks for globalization notes)
-   Template literals in JavaScript
-   Running external commands in Julia

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.