PEP 501 – General purpose template literal strings

Author:
Alyssa Coghlan <ncoghlan at gmail.com>, Nick Humrich <nick at humrich.us>
Discussions-To:
Discourse thread
Status:
Withdrawn
Type:
Standards Track
Requires:
701
Created:
08-Aug-2015
Python-Version:
3.12
Post-History:
08-Aug-2015, 05-Sep-2015, 09-Mar-2023
Superseded-By:
750

Important

This PEP has been superseded by PEP 750.

Abstract

Though easy and elegant to use, Python f-strings can be vulnerable to injection attacks when used to construct shell commands, SQL queries, HTML snippets and similar (for example, os.system(f"echo {message_from_user}")). This PEP introduces template literal strings (or “t-strings”), which have syntax and semantics that are similar to f-strings, but with rendering deferred until format() or another template rendering function is called on them. This will allow standard library calls, helper functions and third party tools to safely and intelligently perform appropriate escaping and other string processing on inputs while retaining the usability and convenience of f-strings.

PEP Withdrawal

When PEP 750 was first published as a “tagged strings” proposal (allowing for arbitrary string prefixes), this PEP was kept open to continue championing the simpler “template literal” approach that used a single dedicated string prefix to produce instances of a new “interpolation template” type.

The October 2024 updates to PEP 750 agreed that template strings were a better fit for Python than the broader tagged strings concept.

All of the other concerns the authors of this PEP had with PEP 750 were also either addressed in those updates, or else left in a state where they could reasonably be addressed in a future change proposal.

Due to the clear improvements in the updated PEP 750 proposal, this PEP has been withdrawn in favour of PEP 750.

Important

The remainder of this PEP still reflects the state of the tagged strings proposal in August 2024. It has not been updated to reflect the October 2024 changes to PEP 750, since the PEP withdrawal makes doing so redundant.

Relationship with other PEPs

This PEP is inspired by and builds on top of the f-string syntax first implemented in PEP 498 and formalised in PEP 701.

This PEP complements the literal string typing support added to Python’s formal type system in PEP 675 by introducing a safe way to do dynamic interpolation of runtime values into security sensitive strings.

This PEP competes with some aspects of the tagged string proposal in PEP 750 (most notably in whether template rendering is expressed as render(t"template literal") or as render"template literal"), but also shares many common features (after PEP 750 was published, this PEP was updated with several new changes inspired by the tagged strings proposal).

This PEP does NOT propose an alternative to PEP 292 for user interface internationalization use cases (but does note the potential for future syntactic enhancements aimed at that use case that would benefit from the compiler-supported value interpolation machinery that this PEP and PEP 750 introduce).

Motivation

PEP 498 added new syntactic support for string interpolation that is transparent to the compiler, allowing name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred to in the PEP (and elsewhere) as “f-strings” (a mnemonic for “formatted strings”).

Since acceptance of PEP 498, f-strings have become well-established and very popular. f-strings became even more useful and flexible with the formalised grammar in PEP 701. While f-strings are great, eager rendering has its limitations. For example, the eagerness of f-strings has made code like the following unfortunately plausible:

os.system(f"echo {message_from_user}")

This kind of code is superficially elegant, but poses a significant problem if the interpolated value message_from_user is in fact provided by an untrusted user: it’s an opening for a form of code injection attack, where the supplied user data has not been properly escaped before being passed to the os.system call.

While the LiteralString type annotation introduced in PEP 675 means that typecheckers are able to report a type error for this kind of unsafe function usage, those errors don’t help make it easier to write code that uses safer alternatives (such as subprocess.run()).
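
To make the risk concrete, the following sketch compares the eagerly rendered command with one that escapes the value by hand using shlex.quote(). The sample input is hypothetical and nothing is executed. shlex.split() is only used to approximate shell word splitting (a real shell would additionally treat the ; in the unescaped command as a command separator).

```python
import shlex

# Hypothetical untrusted input containing a shell command separator
message_from_user = "hello; rm -rf ~"

# Eager f-string rendering: the user's text becomes part of the command line
unsafe_command = f"echo {message_from_user}"

# Manual escaping is correct, but noisier than the unsafe spelling above
safe_command = f"echo {shlex.quote(message_from_user)}"

# Approximate shell word splitting without running anything
unsafe_tokens = shlex.split(unsafe_command)
safe_tokens = shlex.split(safe_command)

print(unsafe_tokens)  # the user's text has spread across several words
print(safe_tokens)    # the user's text stays a single argument to echo
```

The escaped form keeps the user's text as one argument, but only because the author remembered to call shlex.quote() at every interpolation site.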

To address that problem (and a number of other concerns), this PEP proposes the complementary introduction of “t-strings” (a mnemonic for “template literal strings”), where format(t"Message with {data}") would produce the same result as f"Message with {data}", but the template literal instance can instead be passed to other template rendering functions which process the contents of the template differently.

Proposal

Dedicated template literal syntax

This PEP proposes a new string prefix that declares the string to be a template literal rather than an ordinary string:

template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"

This would be effectively interpreted as:

template = TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", names, f">{field_width}", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", expressions(), f"", "r"),
    TemplateLiteralText(r" at runtime"),
)

(Note: this is an illustrative example implementation. The exact compile time construction syntax of types.TemplateLiteral is considered an implementation detail not specified by the PEP. In particular, the compiler may bypass the default constructor’s runtime logic that detects consecutive text segments and merges them into a single text segment, as well as checking the runtime types of all supplied arguments).

The __format__ method on types.TemplateLiteral would then implement the following str.format() inspired semantics:

>>> import datetime
>>> name = 'Jane'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> format(t'She said her name is {name!r}.')
"She said her name is 'Jane'."

The syntax of template literals would be based on PEP 701, and largely use the same syntax for the string portion of the template. Aside from using a different prefix, the one other syntactic change is in the definition and handling of conversion specifiers, both to allow !() as a standard conversion specifier to request evaluation of a field at rendering time, and to allow custom renderers to also define custom conversion specifiers.

This PEP does not propose to remove or deprecate any of the existing string formatting mechanisms, as those will remain valuable when formatting strings that are not present directly in the source code of the application.

Lazy field evaluation conversion specifier

In addition to the existing support for the a, r, and s conversion specifiers, str.format(), str.format_map(), and string.Formatter will be updated to accept () as a conversion specifier that means “call the interpolated value”.

To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added.

The signature and behaviour of the format() builtin will also be updated to accept a conversion specifier as a third optional parameter. If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.
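
The effect of the () conversion specifier can be approximated in current Python. In the sketch below, render_lazy_field is a hypothetical stand-in for the proposed format(value, format_spec, "()") call, and expensive_summary is an invented example callable. The point is only that the callable is not invoked until the field is rendered.

```python
calls = []

def expensive_summary():
    # Record each invocation so the deferral is observable
    calls.append("called")
    return "42 items"

def render_lazy_field(value, format_spec=""):
    # Hypothetical stand-in for the proposed format(value, format_spec, "()"):
    # call the interpolated value first, then format the result
    return format(value(), format_spec)

# Storing the callable in a "field" does not call it
pending_field = (expensive_summary, ">10")
calls_before_render = len(calls)

# The call only happens when the field is rendered
rendered = render_lazy_field(*pending_field)
calls_after_render = len(calls)

print(calls_before_render, calls_after_render, repr(rendered))
```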

Custom conversion specifiers

To allow additional field-specific directives to be passed to custom rendering functions in a way that still allows formatting of the template with the default renderer, the conversion specifier field will be allowed to contain a second ! character.

operator.convert_field() and format() (and hence the default TemplateLiteral.render template rendering method) will ignore that character and any subsequent text in the conversion specifier field.

str.format(), str.format_map(), and string.Formatter will also be updated to accept (and ignore) custom conversion specifiers.

Template renderer for POSIX shell commands

As both a practical demonstration of the benefits of delayed rendering support, and as a valuable feature in its own right, a new sh template renderer will be added to the shlex module. This renderer will produce strings where all interpolated fields are escaped with shlex.quote().

The subprocess.Popen API (and higher level APIs that depend on it, such as subprocess.run()) will be updated to accept interpolation templates and handle them in accordance with the new shlex.sh renderer.
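
As a rough sketch of the idea (not the proposed shlex.sh implementation), a shell renderer could walk the template segments, emit text segments unchanged, and pass every interpolated field through shlex.quote(). The Field type and the sh function below are simplified stand-ins invented for this example. Conversion specifiers are ignored to keep it short.

```python
import shlex
from typing import Any, NamedTuple

class Field(NamedTuple):
    # Simplified, hypothetical stand-in for the proposed TemplateLiteralField
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""

def sh(segments):
    """Render text segments verbatim and shlex.quote() every field."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            formatted = format(segment.value, segment.format_spec)
            parts.append(shlex.quote(formatted))
        else:
            parts.append(segment)
    return "".join(parts)

filename = "my file; rm -rf ~"
# Roughly the segments that t"cat {filename}" would hand to the renderer
command = sh(["cat ", Field("filename", filename)])
print(command)
```

Because the escaping lives in the renderer rather than at each interpolation site, the calling code stays as short as the unsafe f-string version.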

Background

This PEP was initially proposed as a competitor to PEP 498. After it became clear that the eager rendering proposal had substantially more immediate support, it then spent several years in a deferred state, pending further experience with PEP 498’s simpler approach of only supporting eager rendering without the additional complexity of also supporting deferred rendering.

Since then, f-strings have become very popular and PEP 701 was introduced to tidy up some rough edges and limitations in their syntax and semantics. The template literal proposal was updated in 2023 to reflect current knowledge of f-strings, and improvements from PEP 701.

In 2024, PEP 750 was published, proposing a general purpose mechanism for custom tagged string prefixes, rather than the narrower template literal proposal in this PEP. This PEP was again updated, both to incorporate new ideas inspired by the tagged strings proposal, and to describe the perceived benefits of the narrower template literal syntax proposal in this PEP over the more general tagged string proposal.

Summary of differences from f-strings

The key differences between f-strings and t-strings are:

  • the t (template literal) prefix indicates delayed rendering, but otherwise largely uses the same syntax and semantics as formatted strings
  • template literals are available at runtime as a new kind of object (types.TemplateLiteral)
  • the default rendering used by formatted strings is invoked on a template literal object by calling format(template) rather than being done implicitly in the compiled code
  • unlike f-strings (where conversion specifiers are handled directly in the compiler), t-string conversion specifiers are handled at rendering time by the rendering function
  • the new !() conversion specifier indicates that the field expression is a callable that should be called when using the default format() rendering function. This specifier is specifically not being added to f-strings (since it is pointless there).
  • a second ! is allowed in t-string conversion specifiers (with any subsequent text being ignored) as a way to allow custom template rendering functions to accept custom conversion specifiers without breaking the default TemplateLiteral.render() rendering method. This feature is specifically not being added to f-strings (since it is pointless there).
  • while f-string f"Message {here}" would be semantically equivalent to format(t"Message {here}"), f-strings will continue to be supported directly in the compiler and hence avoid the runtime overhead of actually using the delayed rendering machinery that is needed for t-strings

Summary of differences from tagged strings

When tagged strings were first proposed, there were several notable differences from the proposal in PEP 501 beyond the surface syntax difference between whether rendering function invocations are written as render(t"template literal") or as render"template literal".

Over the course of the initial PEP 750 discussion, many of those differences were eliminated, either by PEP 501 adopting that aspect of PEP 750’s proposal (such as lazily applying conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501’s proposal (such as defining a dedicated type to hold template segments rather than representing them as simple sequences).

The main remaining significant difference is that this PEP argues that adding only the t-string prefix is a sufficient enhancement to give all the desired benefits described in PEP 750. The expansion to a generalised “tagged string” syntax isn’t necessary, and causes additional problems that can be avoided.

The two PEPs also differ in their proposed approaches to handling lazy evaluation of template fields.

While there are other differences between the two proposals, those differences are more cosmetic than substantive. In particular:

  • this PEP proposes different names for the structural typing protocols
  • this PEP proposes specific names for the concrete implementation types
  • this PEP proposes exact details for the proposed APIs of the concrete implementation types (including concatenation and repetition support, which are not part of the structural typing protocols)
  • this PEP proposes changes to the existing format() builtin to make it usable directly as a template field renderer

The two PEPs also differ in how they make their case for delayed rendering support. This PEP focuses more on the concrete implementation concept of using template literals to allow the “interpolation” and “rendering” steps in f-string processing to be separated in time, and then taking advantage of that to reduce the potential code injection risks associated with misuse of f-strings. PEP 750 focuses more on the way that native templating support allows behaviours that are difficult or impossible to achieve via existing string based templating methods. As with the cosmetic differences noted above, this is more a difference in style than a difference in substance.

Rationale

f-strings (PEP 498) made interpolating values into strings with full access to Python’s lexical namespace semantics simpler, but they did so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates enjoys a much cleaner syntax when handled without regard for code injection attacks than when handled correctly.

This PEP proposes to provide the option of deferring the actual rendering of a template literal into a formatted string until its __format__ method is called, allowing other template renderers to be used by passing the template around as a first class object.

While very different in the technical details, the types.TemplateLiteral interface proposed in this PEP is conceptually quite similar to the FormattableString type underlying the native interpolation support introduced in C# 6.0, as well as the JavaScript template literals introduced in ES6.

While not the original motivation for developing the proposal, many of the benefits for defining domain specific languages described in PEP 750 also apply to this PEP (including the potential for per-DSL semantic highlighting in code editors based on the type specifications of declared template variables and rendering function parameters).

Specification

This PEP proposes a new t string prefix that results in the creation of an instance of a new type, types.TemplateLiteral.

Template literals are Unicode strings (bytes literals are not permitted), and string literal concatenation operates as normal, with the entire combined literal forming the template literal.

The template string is parsed into literals, expressions, format specifiers, and conversion specifiers as described for f-strings in PEP 498 and PEP 701. The syntax for conversion specifiers is relaxed such that arbitrary strings are accepted (excluding those containing {, } or :) rather than being restricted to valid Python identifiers.

However, rather than being rendered directly into a formatted string, these components are instead organised into instances of new types with the following behaviour:

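from __future__ import annotations  # allow forward references in annotations

import operator
from typing import Any, Iterable, NamedTuple, Sequence

class TemplateLiteralText(str):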
    # This is a renamed and extended version of the DecodedConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw: str

    def __new__(cls, raw: str):
        decoded = raw.encode("utf-8").decode("unicode-escape")
        if decoded == raw:
            decoded = raw
        text = super().__new__(cls, decoded)
        text._raw = raw
        return text

    @staticmethod
    def merge(text_segments:Sequence[TemplateLiteralText]) -> TemplateLiteralText:
        if len(text_segments) == 1:
            return text_segments[0]
        return TemplateLiteralText("".join(t._raw for t in text_segments))

    @property
    def raw(self) -> str:
        return self._raw

    def __repr__(self) -> str:
        return f"{type(self).__name__}(r{self._raw!r})"

    def __add__(self, other:Any) -> TemplateLiteralText|NotImplemented:
        if isinstance(other, TemplateLiteralText):
            return TemplateLiteralText(self._raw + other._raw)
        return NotImplemented


    def __mul__(self, other:Any) -> TemplateLiteralText|NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        return TemplateLiteralText(self._raw * factor)
    __rmul__ = __mul__

class TemplateLiteralField(NamedTuple):
    # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
    # However:
    #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
    #    - conversion specifiers are allowed to be arbitrary strings
    #    - order of fields is adjusted so the text form is the first field and the
    #      remaining parameters match the updated signature of the `format` builtin
    # Real type would be implemented in C, this is an API compatible Python equivalent

    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None

    def __repr__(self) -> str:
        return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                f"{self.format_spec!r}, {self.conversion_spec!r})")

    def __str__(self) -> str:
        return format(self.value, self.format_spec, self.conversion_spec)

    def __format__(self, format_override) -> str:
        if format_override:
            format_spec = format_override
        else:
            format_spec = self.format_spec
        return format(self.value, format_spec, self.conversion_spec)

class TemplateLiteral:
    # This type corresponds to the TemplateConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw_template: str
    _segments: tuple[TemplateLiteralText|TemplateLiteralField, ...]

    def __new__(cls, raw_template:str, *segments:TemplateLiteralText|TemplateLiteralField):
        self = super().__new__(cls)
        self._raw_template = raw_template
        # Check if there are any adjacent text segments that need merging
        # or any empty text segments that need discarding
        type_err = "Template literal segments must be template literal text or field instances"
        text_expected = True
        needs_merge = False
        for segment in segments:
            match segment:
                case TemplateLiteralText():
                    if not text_expected or not segment:
                        needs_merge = True
                        break
                    text_expected = False
                case TemplateLiteralField():
                    text_expected = True
                case _:
                    raise TypeError(type_err)
        if not needs_merge:
            # Match loop above will have checked all segments
            self._segments = segments
            return self
        # Merge consecutive runs of text fields and drop any empty text fields
        merged_segments:list[TemplateLiteralText|TemplateLiteralField] = []
        pending_merge:list[TemplateLiteralText] = []
        for segment in segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    if text_segment:
                        pending_merge.append(text_segment)
                case TemplateLiteralField():
                    if pending_merge:
                        merged_segments.append(TemplateLiteralText.merge(pending_merge))
                        pending_merge.clear()
                    merged_segments.append(segment)
                case _:
                    # First loop above may not check all segments when a merge is needed
                    raise TypeError(type_err)
        if pending_merge:
            merged_segments.append(TemplateLiteralText.merge(pending_merge))
            pending_merge.clear()
        self._segments = tuple(merged_segments)
        return self

    @property
    def raw_template(self) -> str:
        return self._raw_template

    @property
    def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField]:
        return self._segments

    def __len__(self) -> int:
        return len(self._segments)

    def __iter__(self) -> Iterable[TemplateLiteralText|TemplateLiteralField]:
        return iter(self._segments)

    # Note: template literals do NOT define any relative ordering
    def __eq__(self, other):
        if not isinstance(other, TemplateLiteral):
            return NotImplemented
        # Field values and format specifiers are compared as part of the segments
        return (
            self._raw_template == other._raw_template
            and self._segments == other._segments
        )

    def __repr__(self) -> str:
        return (f"{type(self).__name__}(r{self._raw_template!r}, "
                f"{', '.join(map(repr, self._segments))})")

    def __format__(self, format_specifier) -> str:
        # When formatted, render to a string, and then use string formatting
        return format(self.render(), format_specifier)

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        ...  # See definition of the template rendering semantics below

    def __add__(self, other) -> TemplateLiteral|NotImplemented:
        if isinstance(other, TemplateLiteral):
            combined_raw_text = self._raw_template + other._raw_template
            combined_segments = self._segments + other._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        if isinstance(other, str):
            # Treat the given string as a new raw text segment
            combined_raw_text = self._raw_template + other
            combined_segments = self._segments + (TemplateLiteralText(other),)
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __radd__(self, other) -> TemplateLiteral|NotImplemented:
        if isinstance(other, str):
            # Treat the given string as a new raw text segment. This effectively
            # has precedence over string concatenation in CPython due to
            # https://github.com/python/cpython/issues/55686
            combined_raw_text = other + self._raw_template
            combined_segments = (TemplateLiteralText(other),) + self._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __mul__(self, other) -> TemplateLiteral|NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        if not self or factor == 1:
            return self
        if factor < 1:
            return TemplateLiteral("")
        repeated_text = self._raw_template * factor
        repeated_segments = self._segments * factor
        return TemplateLiteral(repeated_text, *repeated_segments)
    __rmul__ = __mul__

(Note: this is an illustrative example implementation, the exact compile time construction method and internal data management details of types.TemplateLiteral are considered an implementation detail not specified by the PEP. However, the expected post-construction behaviour of the public APIs on types.TemplateLiteral instances is specified by the above code, as is the constructor signature for building template instances at runtime)

The result of a template literal expression is an instance of this type, rather than an already rendered string. Rendering only takes place when the instance’s render method is called (either directly, or indirectly via __format__).

The compiler will pass the following details to the template literal for later use:

  • a string containing the raw template as written in the source code
  • a sequence of template segments, with each segment being either:
    • a literal text segment (a regular Python string that also provides access to its raw form)
    • a parsed template interpolation field, specifying the text of the interpolated expression (as a regular string), its evaluated result, the format specifier text (with any substitution fields eagerly evaluated as an f-string), and the conversion specifier text (as a regular string)

The raw template is just the template literal as a string. By default, it is used to provide a human-readable representation for the template literal, but template renderers may also use it for other purposes (e.g. as a cache lookup key).

The parsed template structure is taken from PEP 750 and consists of a sequence of template segments corresponding to the text segments and interpolation fields in the template string.

This approach is designed to allow compilers to fully process each segment of the template in order, before finally emitting code to pass all of the template segments to the template literal constructor.

For example, assuming the following runtime values:

names = ["Alice", "Bob", "Carol", "Eve"]
field_width = 10
def expressions():
    return 42

The template from the proposal section would be represented at runtime as:

TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", 42, "", "r"),
    TemplateLiteralText(r" at runtime"),
)

Rendering templates

The TemplateLiteral.render implementation defines the rendering process in terms of the following renderers:

  • an overall render_template operation that defines how the sequence of rendered text and field segments are composed into a fully rendered result. The default template renderer is string concatenation using ''.join.
  • a per text segment render_text operation that receives the individual literal text segments within the template. The default text renderer is the builtin str constructor.
  • a per field segment render_field operation that receives the field value, format specifier, and conversion specifier for substitution fields within the template. The default field renderer is the format() builtin.

Given the parsed template representation above, the semantics of template rendering would then be equivalent to the following:

def render(self, *, render_template=''.join, render_text=str, render_field=format):
    rendered_segments = []
    for segment in self._segments:
        match segment:
            case TemplateLiteralText() as text_segment:
                rendered_segments.append(render_text(text_segment))
            case TemplateLiteralField() as field_segment:
                rendered_segments.append(render_field(*field_segment[1:]))
    return render_template(rendered_segments)
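
The three hooks can be exercised today with simplified stand-ins. In the sketch below, Field, plain_field, html_field and the module-level render function are all hypothetical names invented for illustration. plain_field substitutes for the proposed three-argument format() and ignores conversion specifiers.

```python
import html
from typing import Any, NamedTuple

class Field(NamedTuple):
    # Simplified, hypothetical stand-in for the proposed TemplateLiteralField
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""

def plain_field(value, format_spec="", conversion_spec=""):
    # Stand-in for the proposed format(value, format_spec, conversion_spec);
    # the conversion specifier is ignored in this sketch
    return format(value, format_spec)

def html_field(value, format_spec="", conversion_spec=""):
    # Custom field renderer: format as usual, then escape for HTML
    return html.escape(plain_field(value, format_spec, conversion_spec))

def render(segments, *, render_template="".join, render_text=str,
           render_field=plain_field):
    rendered = []
    for segment in segments:
        if isinstance(segment, Field):
            rendered.append(render_field(*segment[1:]))
        else:
            rendered.append(render_text(segment))
    return render_template(rendered)

comment = "<script>alert('hi')</script>"
# Roughly the segments of t"<p>{comment}</p>"
segments = ["<p>", Field("comment", comment), "</p>"]

plain = render(segments)
escaped = render(segments, render_field=html_field)
as_list = render(segments, render_template=list, render_field=html_field)
print(escaped)
```

Swapping render_field changes how each interpolated value is treated, while swapping render_template changes the type of the overall result (here, a list of rendered segments instead of a single string).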

Format specifiers

The syntax and processing of format specifiers in t-strings is defined to be the same as it is for f-strings.

This includes allowing format specifiers to themselves contain f-string substitution fields. The raw text of the format specifiers (without processing any substitution fields) is retained as part of the full raw template string.

The parsed template fields receive the format specifier string with those substitutions already resolved. The : prefix is also omitted.
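
The resolution step can be illustrated with existing f-string behaviour. The variable names below are chosen for the example only. The sketch shows that a nested substitution field is resolved before the specifier reaches the formatting step, and that the stored specifier has no leading colon.

```python
name = "Jane"
field_width = 10

# What an f-string does eagerly today
eager = f"{name:>{field_width}}"

# What a parsed template field would carry: the specifier with nested
# substitutions already resolved and the leading ":" removed
resolved_format_spec = f">{field_width}"

# Rendering later with the stored specifier gives the same result
deferred = format(name, resolved_format_spec)

print(repr(resolved_format_spec), repr(eager), repr(deferred))
```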

Aside from separating them out from the substitution expression during parsing, format specifiers are otherwise treated as opaque strings by the interpolation template parser - assigning semantics to those (or, alternatively, prohibiting their use) is handled at rendering time by the field renderer.

Conversion specifiers

In addition to the existing support for a, r, and s conversion specifiers, str.format() and str.format_map() will be updated to accept () as a conversion specifier that means “call the interpolated value”.

Where PEP 701 restricts conversion specifiers to NAME tokens, this PEP will instead allow FSTRING_MIDDLE tokens (such that only {, } and : are disallowed). This change is made primarily to support lazy field rendering with the !() conversion specifier, but also allows custom rendering functions more flexibility when defining their own conversion specifiers in preference to those defined for the default format() field renderer.

Conversion specifiers are still handled as plain strings, and do NOT support the use of substitution fields.

The parsed conversion specifiers receive the conversion specifier string with the ! prefix omitted.

To allow custom template renderers to define their own custom conversion specifiers without causing the default renderer to fail, conversion specifiers will be permitted to contain a custom suffix prefixed with a second ! character. That is, !!<custom>, !a!<custom>, !r!<custom>, !s!<custom>, and !()!<custom> would all be valid conversion specifiers in a template literal.

As described above, the default rendering supports the original !a, !r and !s conversion specifiers defined in PEP 3101, together with the new !() lazy field evaluation conversion specifier defined in this PEP. The default rendering ignores any custom conversion specifier suffixes.
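
A custom renderer might separate the standard part of a conversion specifier from a custom suffix along the following lines. split_conversion is a hypothetical helper name, not part of the proposal, and the inputs are conversion specifiers as a renderer would receive them (with the leading ! already omitted).

```python
def split_conversion(conversion_spec: str) -> tuple[str, str]:
    """Split 'std!custom' into its standard and custom parts."""
    std_spec, _, custom_spec = conversion_spec.partition("!")
    return std_spec, custom_spec

# "!html" is what the renderer would see for a field written as {x!!html}
examples = {
    spec: split_conversion(spec)
    for spec in ["", "r", "!html", "r!html", "()!lazy"]
}

for spec, parts in examples.items():
    print(repr(spec), "->", parts)
```

The default renderer would act only on the first element of each pair, while a custom renderer is free to interpret the second.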

The full mapping between the standard conversion specifiers and the special methods called on the interpolated value when the field is rendered:

  • No conversion (empty string): __format__ (with format specifier as parameter)
  • a: __repr__ (as per the ascii() builtin)
  • r: __repr__ (as per the repr() builtin)
  • s: __str__ (as per the str builtin)
  • (): __call__ (with no parameters)

When a conversion occurs, __format__ (with the format specifier) is called on the result of the conversion rather than being called on the original object.
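
This ordering matches what f-strings already do, which can be checked in current Python. The example values are arbitrary.

```python
name = "Jane"

# f-string: apply the !r conversion, then format the converted result
via_fstring = f"{name!r:>8}"

# The same two steps written out explicitly
via_steps = format(repr(name), ">8")

# Without the conversion, the original object is formatted instead
unconverted = format(name, ">8")

print(repr(via_fstring), repr(via_steps), repr(unconverted))
```

The width in the format specifier is applied to the six character repr (including its quotes), not to the original four character string.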

The changes to format() and the addition of operator.convert_field() make it straightforward for custom renderers to also support the standard conversion specifiers.

f-strings themselves will NOT support the new !() conversion specifier (as it is redundant when value interpolation and value rendering always occur at the same time). They also will NOT support the use of custom conversion specifiers (since the rendering function is known at compile time and doesn’t make use of the custom specifiers).

New field conversion API in the operator module

To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added:

def convert_field(value, conversion_spec=''):
    """Apply the given string formatting conversion specifier to the given value"""
    std_spec, sep, custom_spec = conversion_spec.partition("!")
    match std_spec:
        case '':
            return value
        case 'a':
            return ascii(value)
        case 'r':
            return repr(value)
        case 's':
            return str(value)
        case '()':
            return value()
    if not sep:
        err = f"Invalid conversion specifier {std_spec!r}"
    else:
        err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
    raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")

Conversion specifier parameter added to format()

The signature and behaviour of the format() builtin will be updated:

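def format(value, format_spec='', conversion_spec=''):
    if conversion_spec:
        # Apply the requested conversion before formatting
        value_to_format = operator.convert_field(value, conversion_spec)
    else:
        value_to_format = value
    # __format__ is looked up on, and applied to, the converted value
    return type(value_to_format).__format__(value_to_format, format_spec)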

If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.

The signature of the __format__ special method does NOT change (only format specifiers are handled by the object being formatted).
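The combined behaviour can be approximated in current Python. The sketch below is not the proposed implementation: convert_field_sketch and format_sketch are hypothetical stand-ins for operator.convert_field() and the three-argument format(), and any custom conversion suffix is simply ignored.

```python
# Mapping from standard conversion specifiers to the conversion applied.
_CONVERSIONS = {
    "": lambda value: value,
    "a": ascii,
    "r": repr,
    "s": str,
    "()": lambda value: value(),  # lazy field: call the value at rendering time
}

def convert_field_sketch(value, conversion_spec=""):
    """Hypothetical stand-in for the proposed operator.convert_field()."""
    std_spec, _sep, _custom_spec = conversion_spec.partition("!")
    try:
        convert = _CONVERSIONS[std_spec]
    except KeyError:
        raise ValueError(f"Invalid conversion specifier {std_spec!r}") from None
    return convert(value)

def format_sketch(value, format_spec="", conversion_spec=""):
    """Hypothetical stand-in for the proposed three-argument format()."""
    converted = convert_field_sketch(value, conversion_spec)
    # __format__ is looked up on, and applied to, the converted value.
    return type(converted).__format__(converted, format_spec)
```

With these definitions, format_sketch(lambda: 42, "05d", "()") calls the lambda at rendering time and then formats the returned integer, giving the same text as format_sketch(42, "05d").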

Structural typing and duck typing

To allow custom renderers to accept alternative interpolation template implementations (rather than being tightly coupled to the native template literal types), the following structural protocols will be added to the typing module:

@runtime_checkable
class TemplateText(Protocol):
    # Renamed version of PEP 750's Decoded protocol
    def __str__(self) -> str:
        ...

    raw: str

@runtime_checkable
class TemplateField(Protocol):
    # Renamed and modified version of PEP 750's Interpolation protocol
    def __len__(self):
        ...

    def __getitem__(self, index: int):
        ...

    def __str__(self) -> str:
        ...

    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None

@runtime_checkable
class InterpolationTemplate(Protocol):
    # Corresponds to PEP 750's Template protocol
    def __iter__(self) -> Iterable[TemplateText|TemplateField]:
        ...

    raw_template: str

Note that the structural protocol APIs are substantially narrower than the full implementation APIs defined for TemplateLiteralText, TemplateLiteralField, and TemplateLiteral.

Code that wants to accept interpolation templates and define specific handling for them without introducing a dependency on the typing module, or restricting the code to handling the concrete template literal types, should instead perform an attribute existence check on raw_template.
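A minimal sketch of such an attribute check follows. FakeTemplate, accepts_template, and describe are hypothetical names used only for illustration; real templates would be produced by template literals rather than constructed by hand.

```python
def accepts_template(obj):
    """Return True if obj looks like an interpolation template.

    Only the raw_template attribute is checked, so neither the typing
    module nor the concrete template literal types need to be imported.
    """
    return hasattr(obj, "raw_template")

class FakeTemplate:
    """Hypothetical stand-in for an interpolation template."""
    def __init__(self, raw_template):
        self.raw_template = raw_template

def describe(obj):
    """Dispatch on template-ness using the duck typing check."""
    if accepts_template(obj):
        return f"template: {obj.raw_template}"
    return f"plain string: {obj}"
```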

Writing custom renderers

Writing a custom renderer doesn’t require any special syntax. Instead, custom renderers are ordinary callables that process an interpolation template directly either by calling the render() method with alternate render_template, render_text, and/or render_field implementations, or by accessing the template’s data attributes directly.

For example, the following function would render a template using objects’ repr implementations rather than their native formatting support:

def repr_format(template):
    def render_field(value, format_spec, conversion_spec):
        converted_value = operator.convert_field(value, conversion_spec)
        return format(repr(converted_value), format_spec)
    return template.render(render_field=render_field)

The custom renderer shown respects the conversion specifiers in the original template, but it is also possible to ignore them and render the interpolated values directly:

def input_repr_format(template):
    def render_field(value, format_spec, __):
        return format(repr(value), format_spec)
    return template.render(render_field=render_field)

When writing custom renderers, note that the return type of the overall rendering operation is determined by the return type of the passed in render_template callable. While this will still be a string for formatting related use cases, producing non-string objects is permitted. For example, a custom SQL template renderer could involve an sqlalchemy.sql.text call that produces an SQLAlchemy query object. A subprocess invocation related template renderer could produce a string sequence suitable for passing to subprocess.run, or it could even call subprocess.run directly, and return the result.

Non-strings may also be returned from render_text and render_field, as long as they are paired with a render_template implementation that expects that behaviour.
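As an illustration of non-string rendering, the sketch below produces an argument list rather than a string. MiniTemplate is a hypothetical stand-in whose render() method is assumed to accept the same three keyword arguments as the template literal's render() method; it is not the proposed implementation, and to_argv is an illustrative name.

```python
class MiniTemplate:
    """Hypothetical stand-in for a template literal.

    Segments are str items for literal text and
    (value, format_spec, conversion_spec) tuples for fields.
    """
    def __init__(self, *segments):
        self.segments = segments

    def render(self, *, render_template, render_text, render_field):
        rendered = []
        for segment in self.segments:
            if isinstance(segment, str):
                rendered.append(render_text(segment))
            else:
                rendered.append(render_field(*segment))
        return render_template(rendered)

def to_argv(template):
    """Render a template to a list of arguments instead of a string."""
    def render_text(text):
        return text.split()  # literal text contributes whole words
    def render_field(value, format_spec, _conversion_spec):
        return [format(value, format_spec)]  # each field is exactly one argument
    def render_template(parts):
        return [arg for part in parts for arg in part]  # flatten the lists
    return template.render(
        render_template=render_template,
        render_text=render_text,
        render_field=render_field,
    )

# Stands in for to_argv(t"cat {myfile} --flag {value}")
argv = to_argv(MiniTemplate("cat ", ("my file.txt", "", ""), " --flag ", (3, "", "")))
```

Both render_text and render_field return lists here, which only works because render_template expects lists. This sketch treats every field as a separate argument, so a field written directly against literal text (as in --out={path}) would need additional handling.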

Custom renderers using the pattern matching style described in PEP 750 are also supported:

# Use the structural typing protocols rather than the concrete implementation types
from typing import InterpolationTemplate, TemplateText, TemplateField

def greet(template: InterpolationTemplate) -> str:
    """Render an interpolation template using structural pattern matching."""
    result = []
    for segment in template:
        match segment:
            case TemplateText() as text_segment:
                # The protocol only guarantees __str__, so convert explicitly
                result.append(str(text_segment))
            case TemplateField() as field_segment:
                result.append(str(field_segment).upper())
    return f"{''.join(result)}!"

Expression evaluation

As with f-strings, the subexpressions that are extracted from the interpolation template are evaluated in the context where the template literal appears. This means the expression has full access to local, nonlocal and global variables. Any valid Python expression can be used inside {}, including function and method calls.

Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have also just written the same expression and used runtime field parsing:

>>> bar=10
>>> def foo(data):
...   return data + 20
...
>>> str(t'input={bar}, output={foo(bar)}')
'input=10, output=30'

Is essentially equivalent to:

>>> 'input={}, output={}'.format(bar, foo(bar))
'input=10, output=30'

Handling code injection attacks

The PEP 498 formatted string syntax makes it potentially attractive to write code like the following:

runquery(f"SELECT {column} FROM {table};")
runcommand(f"cat {filename}")
return_response(f"<html><body>{response.body}</body></html>")

These all represent potential vectors for code injection attacks, if any of the variables being interpolated happen to come from an untrusted source. The specific proposal in this PEP is designed to make it straightforward to write use case specific renderers that take care of quoting interpolated values appropriately for the relevant security context:

runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
runcommand(sh(t"cat {filename}"))
return_response(html(t"<html><body>{response.body}</body></html>"))

This PEP does not cover adding all such renderers to the standard library immediately (though one for shell escaping is proposed), but rather proposes to ensure that they can be readily provided by third party libraries, and potentially incorporated into the standard library at a later date.
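To illustrate the kind of renderer a third party library might provide, the following sketch escapes interpolated values for HTML while passing the literal template text through unchanged. Field and html_sketch are hypothetical names, and a plain list of segments stands in for iterating over an interpolation template; only html.escape is an existing standard library API.

```python
import html
from typing import NamedTuple

class Field(NamedTuple):
    """Hypothetical stand-in for a template field; only the value is modelled."""
    value: object

def html_sketch(segments):
    """Escape interpolated values; leave literal template text untouched."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            parts.append(html.escape(str(segment.value)))
        else:
            parts.append(segment)
    return "".join(parts)

# Stands in for html(t"<body>{body}</body>")
body = "<script>alert('x')</script>"
page = html_sketch(["<body>", Field(body), "</body>"])
```

The markup written by the template author is preserved, while markup arriving through the interpolated value is rendered inert.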

Over time, it is expected that APIs processing potentially dangerous string inputs may be updated to accept interpolation templates natively, allowing problematic code examples to be fixed simply by replacing the f string prefix with a t:

runquery(t"SELECT {column} FROM {table};")
runcommand(t"cat {filename}")
return_response(t"<html><body>{response.body}</body></html>")

It is proposed that a renderer is included in the shlex module, aiming to offer a more POSIX shell style experience for accessing external programs, without the significant risks posed by running os.system or enabling the system shell when using the subprocess module APIs. This renderer will provide an interface for running external programs inspired by that offered by the Julia programming language, only with the backtick based `cat $filename` syntax replaced by t"cat {filename}" style template literals. See more in the Renderer for shell escaping added to shlex section.

Error handling

Either compile time or run time errors can occur when processing interpolation expressions. Compile time errors are limited to those errors that can be detected when parsing a template string into its component tuples. These errors all raise SyntaxError.

Unmatched braces:

>>> t'x={x'
  File "<stdin>", line 1
      t'x={x'
         ^
SyntaxError: missing '}' in template literal expression

Invalid expressions:

>>> t'x={!x}'
  File "<fstring>", line 1
    !x
    ^
SyntaxError: invalid syntax

Run time errors occur when evaluating the expressions inside a template string before creating the template literal object. See PEP 498 for some examples.

Different renderers may also impose additional runtime constraints on acceptable interpolated expressions and other formatting details, which will be reported as runtime exceptions.

Renderer for shell escaping added to shlex

As a reference implementation, a renderer for safe POSIX shell escaping can be added to the shlex module. This renderer would be called sh and would be equivalent to calling shlex.quote on each field value in the template literal.

Thus:

os.system(shlex.sh(t'cat {myfile}'))

would have the same behavior as:

os.system('cat ' + shlex.quote(myfile))

The implementation would be:

def sh(template: TemplateLiteral):
    def render_field(value, format_spec, conversion_spec):
        field_text = format(value, format_spec, conversion_spec)
        return quote(field_text)
    return template.render(render_field=render_field)

The addition of shlex.sh will NOT change the existing admonishments in the subprocess documentation that passing shell=True is best avoided, nor the reference from the os.system() documentation to the higher level subprocess APIs.
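The effect of quoting each field value can be checked today using shlex.quote alone. The sketch below compares an eagerly rendered f-string with the string that shlex.sh is described as producing; the file name is an illustrative hostile input.

```python
import shlex

myfile = "notes.txt; echo pwned"

# Eager f-string: a POSIX shell would see two separate commands.
unsafe_command = f"cat {myfile}"

# What shlex.sh(t"cat {myfile}") is described as producing:
# the field value is passed through shlex.quote before insertion.
safe_command = "cat " + shlex.quote(myfile)

# Values made up only of safe characters are left unquoted.
plain_command = "cat " + shlex.quote("notes.txt")
```

In safe_command the whole value is wrapped in single quotes, so a POSIX shell treats it as one argument to cat.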

Changes to subprocess module

With the additional renderer in the shlex module, and the addition of template literals, the subprocess module can be changed to handle accepting template literals as an additional input type to Popen, as it already accepts a sequence, or a string, with different behavior for each.

With the addition of template literals, subprocess.Popen (and in turn, all its higher level functions such as subprocess.run()) could accept strings in a safe way (at least on POSIX systems).

For example:

subprocess.run(t'cat {myfile}', shell=True)

would automatically use the shlex.sh renderer provided in this PEP. Therefore, using shlex inside a subprocess.run call like so:

subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)

would be redundant, as run would automatically render any template literals through shlex.sh.

Alternatively, when subprocess.Popen is run without shell=True, it could still provide subprocess with a more ergonomic syntax. For example:

subprocess.run(t'cat {myfile} --flag {value}')

would be equivalent to:

subprocess.run(['cat', myfile, '--flag', value])

or, more accurately:

subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))

It would do this by first using the shlex.sh renderer, as above, then using shlex.split on the result.
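The quote-then-split round trip can be verified with the existing shlex functions. In this sketch the field values are illustrative, and include whitespace, a single quote, and shell metacharacters:

```python
import shlex

myfile = "my file.txt"
value = "it's; rm -rf ~"

# Quote each field value, as the shlex.sh renderer is described as doing.
quoted = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"

# Splitting the quoted command recovers each field as a single argument.
argv = shlex.split(quoted)
```

Each interpolated value comes back as exactly one list item, which is why the result matches the explicit list form.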

The implementation inside subprocess.Popen._execute_child would look like:

if hasattr(args, "raw_template"):
    import shlex
    if shell:
        args = [shlex.sh(args)]
    else:
        args = shlex.split(shlex.sh(args))

How to Teach This

This PEP intentionally includes two standard renderers that will always be available in teaching environments: the format() builtin and the new shlex.sh POSIX shell renderer.

Together, these two renderers can be used to build an initial understanding of delayed rendering on top of a student’s initial introduction to string formatting with f-strings. This initial understanding would have the goal of allowing students to use template literals effectively, in combination with pre-existing template rendering functions.

For example, f"{'some text'}", f"{value}", f"{value!r}", and f"{callable()}" could all be introduced.

Those same operations could then be rewritten as format(t"{'some text'}"), format(t"{value}"), format(t"{value!r}"), and format(t"{callable()}") to illustrate the relationship between the eager rendering form and the delayed rendering form.

The difference between “template definition time” (or “interpolation time”) and “template rendering time” can then be investigated further by storing the template literals as local variables and looking at their representations separately from the results of the format calls. At this point, the t"{callable!()}" syntax can be introduced to distinguish between field expressions that are called at template definition time and those that are called at template rendering time.

Finally, the differences between the results of f"{'some text'}", format(t"{'some text'}"), and shlex.sh(t"{'some text'}") could be explored to illustrate the potential for differences between the default rendering function and custom rendering functions.

Actually defining your own custom template rendering functions would then be a separate more advanced topic (similar to the way students are routinely taught to use decorators and context managers well before they learn how to write their own custom ones).

PEP 750 includes further ideas for teaching aspects of the delayed rendering topic.

Discussion

Refer to PEP 498 for previous discussion, as several of the points there also apply to this PEP. PEP 750’s design discussions are also highly relevant, as that PEP inspired several aspects of the current design.

Support for binary interpolation

As f-strings don’t handle byte strings, neither will t-strings.

Interoperability with str-only interfaces

For interoperability with interfaces that only accept strings, interpolation templates can still be prerendered with format(), rather than delegating the rendering to the called function.

This reflects the key difference from PEP 498, which always eagerly applies the default rendering, without any way to delegate the choice of renderer to another section of the code.

Preserving the raw template string

Earlier versions of this PEP failed to make the raw template string available on the template literal. Retaining it makes it possible to provide a more attractive template representation, as well as providing the ability to precisely reconstruct the original string, including both the expression text and the details of any eagerly rendered substitution fields in format specifiers.

Creating a rich object rather than a global name lookup

Earlier versions of this PEP used an __interpolate__ builtin, rather than creating a new kind of object for later consumption by interpolation functions. Creating a rich descriptive object with a useful default renderer made it much easier to support customisation of the semantics of interpolation.

Building atop f-strings rather than replacing them

Earlier versions of this PEP attempted to serve as a complete substitute for PEP 498 (f-strings). With the acceptance of that PEP and the more recent PEP 701, this PEP can instead build a more flexible delayed rendering capability on top of the existing f-string eager rendering.

Assuming the presence of f-strings as a supporting capability simplified a number of aspects of the proposal in this PEP (such as how to handle substitution fields in format specifiers).

Defining repetition and concatenation semantics

This PEP explicitly defines repetition and concatenation semantics for TemplateLiteral and TemplateLiteralText. While not strictly necessary, defining these is expected to make the types easier to work with in code that historically only supported regular strings.

New conversion specifier for lazy field evaluation

The initially published version of PEP 750 defaulted to lazy evaluation for all interpolation fields. While it was subsequently updated to default to eager evaluation (as happens for f-strings and this PEP), the discussions around the topic prompted the idea of providing a way to indicate to rendering functions that the interpolated field value should be called at rendering time rather than being used without modification.

Since PEP 750 also deferred the processing of conversion specifiers until evaluation time, the suggestion was put forward that invoking __call__ without arguments could be seen as similar to the existing conversion specifiers that invoke __repr__ (!a, !r) or __str__ (!s).

Accordingly, this PEP was updated to also make conversion specifier processing the responsibility of rendering functions, and to introduce !() as a new conversion specifier for lazy evaluation.

Adding operator.convert_field() and updating the format() builtin was then a matter of providing appropriate support to rendering function implementations that wanted to accept the default conversion specifiers.

Allowing arbitrary conversion specifiers in custom renderers

Accepting !() as a new conversion specifier necessarily requires updating the syntax that the parser accepts for conversion specifiers (they are currently restricted to identifiers). This then raised the question of whether t-string compilation should enforce the additional restriction that f-string compilation imposes: that the conversion specifier be exactly one of !a, !r, or !s.

With t-strings already being updated to allow !() when compiled, it made sense to treat conversion specifiers as relating to rendering functions in the same way that format specifiers relate to the formatting of individual objects: aside from some characters that are excluded for parsing reasons, they are otherwise free text fields with the meaning decided by the consuming function or object. This reduces the temptation to introduce renderer specific metaformatting into the template’s format specifiers (since any renderer specific information can be placed in the conversion specifier instead).

Only reserving a single new string prefix

The primary difference between this PEP and PEP 750 is that the latter aims to enable the use of arbitrary string prefixes, rather than requiring the creation of template literal instances that are then passed to other APIs. For example, PEP 750 would allow the sh renderer described in this PEP to be used as sh"cat {somefile}" rather than requiring the template literal to be created explicitly and then passed to a regular function call (as in sh(t"cat {somefile}")).

The main reason the PEP authors prefer the second spelling is because it makes it clearer to a reader what is going on: a template literal instance is being created, and then passed to a callable that knows how to do something useful with interpolation template instances.

A draft proposal from one of the PEP 750 authors also suggests that static typecheckers will be able to infer the use of particular domain specific languages just as readily from the form that uses an explicit function call as they would be able to infer it from a directly tagged string.

With the tagged string syntax at least arguably reducing clarity for human readers without increasing the overall expressiveness of the construct, it seems reasonable to start with the smallest viable proposal (a single new string prefix), and then revisit the potential value of generalising to arbitrary prefixes in the future.

As a lesser, but still genuine, consideration, only using a single new string prefix for this use case leaves open the possibility of defining alternate prefixes in the future that still produce TemplateLiteral objects, but use a different syntax within the string to define the interpolation fields (see the i18n discussion below).

Deferring consideration of more concise delayed evaluation syntax

During the discussions of delayed evaluation, {-> expr} was suggested as potential syntactic sugar for the already supported lambda based syntax: {(lambda: expr)} (the parentheses are required in the existing syntax to avoid misinterpretation of the : character as indicating the start of the format specifier).
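The need for the parentheses can be demonstrated with existing f-strings. This sketch uses compile() to show that the unparenthesised spelling is rejected, without the example itself failing to parse:

```python
# Parenthesised lambda: the ":" belongs to the lambda, not to a format spec.
deferred = f"{(lambda: 1 + 1)()}"

# Without parentheses the lambda cannot be distinguished from an
# expression followed by a format specifier, so compilation fails.
try:
    compile('f"{lambda: 1 + 1}"', "<example>", "eval")
except SyntaxError:
    unparenthesised_is_error = True
else:
    unparenthesised_is_error = False
```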

While adding such a spelling would complement the rendering time function call syntax proposed in this PEP (that is, writing {-> expr!()} to evaluate arbitrary expressions at rendering time), it is a topic that the PEP authors consider to be better left to a future PEP if this PEP or PEP 750 is accepted.

Deferring consideration of possible logging integration

One of the challenges with the logging module has been that we have previously been unable to devise a reasonable migration strategy away from the use of printf-style formatting. While the logging module does allow formatters to specify the use of str.format() or string.Template style substitution, it can be awkward to ensure that messages written that way are only ever processed by log record formatters that are expecting that syntax.

The runtime parsing and interpolation overhead for logging messages also poses a problem for extensive logging of runtime events for monitoring purposes.

While beyond the scope of this initial PEP, template literal support could potentially be added to the logging module’s event reporting APIs, permitting relevant details to be captured using forms like:

logging.debug(t"Event: {event}; Details: {data}")
logging.critical(t"Error: {error}; Details: {data}")

Rather than the historical mod-formatting style:

logging.debug("Event: %s; Details: %s", event, data)
logging.critical("Error: %s; Details: %s", error, data)

As the template literal is passed in as an ordinary argument, other keyword arguments would also remain available:

logging.critical(t"Error: {error}; Details: {data}", exc_info=True)

The approach to standardising lazy field evaluation described in this PEP is primarily based on the anticipated needs of this hypothetical integration into the logging module:

logging.debug(t"Eager evaluation of {expensive_call()}")
logging.debug(t"Lazy evaluation of {expensive_call!()}")

logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")
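The cost that lazy field evaluation is intended to avoid can be seen with the existing logging API. The sketch below is not the proposed integration: LazyCall and ListHandler are hypothetical helpers that approximate the effect of !() by deferring the call until the message is actually formatted.

```python
import logging

class LazyCall:
    """Hypothetical helper: defer a call until the value is rendered."""
    def __init__(self, func):
        self._func = func
    def __str__(self):
        return str(self._func())

class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of writing them out."""
    def __init__(self):
        super().__init__()
        self.messages = []
    def emit(self, record):
        self.messages.append(record.getMessage())

calls = []
def expensive_call():
    calls.append("called")
    return "details"

logger = logging.getLogger("pep501_lazy_demo")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Debug logging disabled: the message is never formatted,
# so expensive_call is never invoked.
logger.setLevel(logging.WARNING)
logger.debug("Lazy evaluation of %s", LazyCall(expensive_call))
calls_while_disabled = len(calls)

# Debug logging enabled: the call happens when the message is rendered.
logger.setLevel(logging.DEBUG)
logger.debug("Lazy evaluation of %s", LazyCall(expensive_call))
calls_while_enabled = len(calls)
```

An eagerly evaluated argument such as expensive_call() would be computed in both cases, regardless of whether the message was ever emitted.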

It’s an open question whether the definition of logging formatters would be updated to support template strings, but if they were, the most likely way of defining fields which should be looked up on the log record instead of being interpreted eagerly is simply to escape them so they’re available as part of the literal text:

proc_id = get_process_id()
formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")

Deferring consideration of possible use in i18n use cases

The initial motivating use case for this PEP was providing a cleaner syntax for i18n (internationalization) translation, as that requires access to the original unmodified template. As such, it focused on compatibility with the substitution syntax used in Python’s string.Template formatting and Mozilla’s l20n project.

However, subsequent discussion revealed there are significant additional considerations to be taken into account in the i18n use case, which don’t impact the simpler cases of handling interpolation into security sensitive contexts (like HTML, system shells, and database queries), or producing application debugging messages in the preferred language of the development team (rather than the native language of end users).

Due to that realisation, the PEP was switched to use the str.format() substitution syntax originally defined in PEP 3101 and subsequently used as the basis for PEP 498.

While it would theoretically be possible to update string.Template to support the creation of instances from native template literals, and to implement the structural typing.InterpolationTemplate protocol, the PEP authors have not identified any practical benefit in doing so.

However, one significant benefit of the “only one string prefix” approach used in this PEP is that while it generalises the existing f-string interpolation syntax to support delayed rendering through t-strings, it doesn’t imply that that should be the only compiler supported interpolation syntax that Python should ever offer.

Most notably, it leaves the door open to an alternate “t$-string” syntax that would allow TemplateLiteral instances to be created using a PEP 292 based interpolation syntax rather than a PEP 3101 based syntax:

template = t$"Substitute $words and ${other_values} at runtime"

The only runtime distinction between templates created that way and templates created from regular t-strings would be in the contents of their raw_template attributes.
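For comparison, the PEP 292 substitution syntax is already available at runtime through string.Template, which parses the template when it is rendered rather than when it is compiled. The substituted values in this sketch are illustrative:

```python
from string import Template

# Runtime equivalent of the substitution syntax shown above.
template = Template("Substitute $words and ${other_values} at runtime")
rendered = template.substitute(words="names", other_values="expressions")

# The original text remains available, much like raw_template would.
raw_text = template.template
```

Unlike the hypothetical t$-string, the field names here are looked up in the arguments to substitute() rather than being evaluated as expressions in the defining scope.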

Deferring escaped rendering support for non-POSIX shells

shlex.quote() works by classifying the regex character set [\w@%+=:,./-] to be safe, deeming all other characters to be unsafe, and hence requiring quoting of the string containing them. The quoting mechanism used is then specific to the way that string quoting works in POSIX shells, so it cannot be trusted when running a shell that doesn’t follow POSIX shell string quoting rules.
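The classification can be reproduced with a regular expression built from the character set given above. This is a sketch of the described behaviour rather than a copy of the shlex source, and it assumes ASCII-only matching for \w; the sample strings are illustrative.

```python
import re
import shlex

# Anything outside the safe character set marks the string as unsafe.
find_unsafe = re.compile(r"[^\w@%+=:,./-]", re.ASCII).search

samples = ["file.txt", "user@host:/path", "a b", "x;y", "$HOME"]

# Predict from the character set alone whether quoting is needed.
needs_quoting = {s: find_unsafe(s) is not None for s in samples}

# Compare against what shlex.quote actually returns.
quoted = {s: shlex.quote(s) for s in samples}
```

Strings for which needs_quoting is False are returned unchanged by shlex.quote, while the others come back wrapped in single quotes.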

For example, running subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True) is safe when using a shell that follows POSIX quoting rules:

$ cat > run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
$ python3 run_quoted.py pwd
pwd
$ python3 run_quoted.py '; pwd'
; pwd
$ python3 run_quoted.py "'pwd'"
'pwd'

but remains unsafe when the shell invoked from Python is cmd.exe (or PowerShell):

S:\> echo import sys, shlex, subprocess > run_quoted.py
S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
S:\> type run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
S:\> python3 run_quoted.py "echo OK"
'echo OK'
S:\> python3 run_quoted.py "'& echo Oh no!"
''"'"'
Oh no!'

Resolving this standard library limitation is beyond the scope of this PEP.

Acknowledgements

  • Eric V. Smith for creating PEP 498 and demonstrating the feasibility of arbitrary expression substitution in string interpolation
  • The authors of PEP 750 for the substantial design improvements that tagged strings inspired for this PEP, their general advocacy for the value of language level delayed template rendering support, and their efforts to ensure that any native interpolation template support lays a strong foundation for future efforts in providing robust syntax highlighting and static type checking support for domain specific languages
  • Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to exploring the feasibility of using this model of delayed rendering in i18n use cases (even though the ultimate conclusion was that it was a poor fit, at least for current approaches to i18n in Python)

Source: https://github.com/python/peps/blob/main/peps/pep-0501.rst

Last modified: 2024-10-19 14:00:43 GMT