PEP 501 – General purpose template literal strings
- Author: Alyssa Coghlan <ncoghlan at gmail.com>, Nick Humrich <nick at humrich.us>
- Discussions-To: Discourse thread
- Status: Withdrawn
- Type: Standards Track
- Requires: 701
- Created: 08-Aug-2015
- Python-Version: 3.12
- Post-History: 08-Aug-2015, 05-Sep-2015, 09-Mar-2023
- Superseded-By: 750
Table of Contents
- Abstract
- PEP Withdrawal
- Relationship with other PEPs
- Motivation
- Proposal
- Background
- Rationale
- Specification
- Rendering templates
- Format specifiers
- Conversion specifiers
- New field conversion API in the operator module
- Conversion specifier parameter added to format()
- Structural typing and duck typing
- Writing custom renderers
- Expression evaluation
- Handling code injection attacks
- Error handling
- Renderer for shell escaping added to shlex
- Changes to subprocess module
- How to Teach This
- Discussion
- Support for binary interpolation
- Interoperability with str-only interfaces
- Preserving the raw template string
- Creating a rich object rather than a global name lookup
- Building atop f-strings rather than replacing them
- Defining repetition and concatenation semantics
- New conversion specifier for lazy field evaluation
- Allowing arbitrary conversion specifiers in custom renderers
- Only reserving a single new string prefix
- Deferring consideration of more concise delayed evaluation syntax
- Deferring consideration of possible logging integration
- Deferring consideration of possible use in i18n use cases
- Deferring escaped rendering support for non-POSIX shells
- Acknowledgements
- References
- Copyright
Abstract
Though easy and elegant to use, Python f-strings can be vulnerable to injection attacks when used to construct shell commands, SQL queries, HTML snippets and similar (for example, os.system(f"echo {message_from_user}")).
This PEP introduces template literal strings (or “t-strings”), which have syntax and semantics that are similar to f-strings, but with rendering deferred until format() or another template rendering function is called on them.
This will allow standard library calls, helper functions and third party tools to safely and intelligently perform appropriate escaping and other string processing on inputs while retaining the usability and convenience of f-strings.
PEP Withdrawal
When PEP 750 was first published as a “tagged strings” proposal (allowing for arbitrary string prefixes), this PEP was kept open to continue championing the simpler “template literal” approach that used a single dedicated string prefix to produce instances of a new “interpolation template” type.
The October 2024 updates to PEP 750 agreed that template strings were a better fit for Python than the broader tagged strings concept.
All of the other concerns the authors of this PEP had with PEP 750 were also either addressed in those updates, or else left in a state where they could reasonably be addressed in a future change proposal.
Due to the clear improvements in the updated PEP 750 proposal, this PEP has been withdrawn in favour of PEP 750.
Important
The remainder of this PEP still reflects the state of the tagged strings proposal in August 2024. It has not been updated to reflect the October 2024 changes to PEP 750, since the PEP withdrawal makes doing so redundant.
Relationship with other PEPs
This PEP is inspired by and builds on top of the f-string syntax first implemented in PEP 498 and formalised in PEP 701.
This PEP complements the literal string typing support added to Python’s formal type system in PEP 675 by introducing a safe way to do dynamic interpolation of runtime values into security sensitive strings.
This PEP competes with some aspects of the tagged string proposal in PEP 750 (most notably in whether template rendering is expressed as render(t"template literal") or as render"template literal"), but also shares many common features (after PEP 750 was published, this PEP was updated with several new changes inspired by the tagged strings proposal).
This PEP does NOT propose an alternative to PEP 292 for user interface internationalization use cases (but does note the potential for future syntactic enhancements aimed at that use case that would benefit from the compiler-supported value interpolation machinery that this PEP and PEP 750 introduce).
Motivation
PEP 498 added new syntactic support for string interpolation that is transparent to the compiler, allowing name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred to in the PEP (and elsewhere) as “f-strings” (a mnemonic for “formatted strings”).
Since acceptance of PEP 498, f-strings have become well-established and very popular. f-strings became even more useful and flexible with the formalised grammar in PEP 701. While f-strings are great, eager rendering has its limitations. For example, the eagerness of f-strings has made code like the following unfortunately plausible:
os.system(f"echo {message_from_user}")
This kind of code is superficially elegant, but poses a significant problem if the interpolated value message_from_user is in fact provided by an untrusted user: it’s an opening for a form of code injection attack, where the supplied user data has not been properly escaped before being passed to the os.system call.
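To make the risk concrete without invoking a shell, the following minimal sketch compares the eagerly rendered command with one escaped by hand using shlex.quote(). The value assigned to message_from_user is a hypothetical hostile input chosen purely for illustration.

```python
import shlex

# Hypothetical hostile input, chosen for illustration only
message_from_user = "hello; rm -rf ~"

# Eager f-string rendering: a shell would treat the text after ';' as a second command
unsafe_command = f"echo {message_from_user}"

# Escaping by hand keeps the whole value as a single shell word
safe_command = f"echo {shlex.quote(message_from_user)}"

print(unsafe_command)
print(safe_command)
```

The escaped form is correct, but it relies on the author remembering to call shlex.quote() at every interpolation site.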
While the LiteralString type annotation introduced in PEP 675 means that typecheckers are able to report a type error for this kind of unsafe function usage, those errors don’t help make it easier to write code that uses safer alternatives (such as subprocess.run()).
To address that problem (and a number of other concerns), this PEP proposes the complementary introduction of “t-strings” (a mnemonic for “template literal strings”), where format(t"Message with {data}") would produce the same result as f"Message with {data}", but the template literal instance can instead be passed to other template rendering functions which process the contents of the template differently.
Proposal
Dedicated template literal syntax
This PEP proposes a new string prefix that declares the string to be a template literal rather than an ordinary string:
template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"
This would be effectively interpreted as:
template = TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", names, f">{field_width}", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", expressions(), f"", "r"),
)
(Note: this is an illustrative example implementation. The exact compile time construction syntax of types.TemplateLiteral is considered an implementation detail not specified by the PEP. In particular, the compiler may bypass the default constructor’s runtime logic that detects consecutive text segments and merges them into a single text segment, as well as checking the runtime types of all supplied arguments).
The __format__ method on types.TemplateLiteral would then implement the following str.format() inspired semantics:
>>> import datetime
>>> name = 'Jane'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> format(t'She said her name is {name!r}.')
"She said her name is 'Jane'."
The syntax of template literals would be based on PEP 701, and largely use the same syntax for the string portion of the template. Aside from using a different prefix, the one other syntactic change is in the definition and handling of conversion specifiers, both to allow !() as a standard conversion specifier to request evaluation of a field at rendering time, and to allow custom renderers to also define custom conversion specifiers.
This PEP does not propose to remove or deprecate any of the existing string formatting mechanisms, as those will remain valuable when formatting strings that are not present directly in the source code of the application.
Lazy field evaluation conversion specifier
In addition to the existing support for the a, r, and s conversion specifiers, str.format(), str.format_map(), and string.Formatter will be updated to accept () as a conversion specifier that means “call the interpolated value”.
To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added.
The signature and behaviour of the format() builtin will also be updated to accept a conversion specifier as a third optional parameter. If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.
Custom conversion specifiers
To allow additional field-specific directives to be passed to custom rendering functions in a way that still allows formatting of the template with the default renderer, the conversion specifier field will be allowed to contain a second ! character.
operator.convert_field() and format() (and hence the default TemplateLiteral.render template rendering method) will ignore that character and any subsequent text in the conversion specifier field.
str.format(), str.format_map(), and string.Formatter will also be updated to accept (and ignore) custom conversion specifiers.
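As a rough illustration of how a renderer might separate the standard part of a conversion specifier from a custom suffix, the sketch below uses the same str.partition("!") approach as the PEP's illustrative operator.convert_field(). The helper name split_conversion is hypothetical, and the input is assumed to be the parsed specifier with its leading ! already removed.

```python
def split_conversion(conversion_spec):
    """Split a parsed conversion specifier into (standard, custom) parts.

    Hypothetical helper: the text before the first '!' is the standard
    specifier and everything after it is left for a custom renderer.
    """
    standard, _sep, custom = conversion_spec.partition("!")
    return standard, custom


print(split_conversion("r!html"))   # written as !r!html in the template
print(split_conversion("!html"))    # written as !!html in the template
print(split_conversion("()"))       # written as !() in the template
```

A default renderer would act only on the first element of each pair, while a custom renderer is free to interpret the second.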
Template renderer for POSIX shell commands
As both a practical demonstration of the benefits of delayed rendering support, and as a valuable feature in its own right, a new sh template renderer will be added to the shlex module. This renderer will produce strings where all interpolated fields are escaped with shlex.quote().
The subprocess.Popen API (and higher level APIs that depend on it, such as subprocess.run()) will be updated to accept interpolation templates and handle them in accordance with the new shlex.sh renderer.
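The escaping behaviour described for the proposed sh renderer can be approximated in current Python with a small sketch. Because t-strings are not available here, the template is modelled as a list of plain strings (literal text) and instances of a hypothetical Field tuple (interpolated values); sh_sketch is likewise a hypothetical name, not the proposed shlex.sh API.

```python
import shlex
from typing import Any, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolated template field."""
    expr: str
    value: Any
    format_spec: str = ""


def sh_sketch(segments):
    """Join segments, passing every interpolated value through shlex.quote()."""
    parts = []
    for segment in segments:
        if isinstance(segment, Field):
            # Format first, then escape the formatted text as one shell word
            parts.append(shlex.quote(format(segment.value, segment.format_spec)))
        else:
            # Literal text written by the developer is trusted as-is
            parts.append(segment)
    return "".join(parts)


filename = "my file; rm -rf ~"  # hypothetical hostile input
command = sh_sketch(["cat ", Field("filename", filename)])
print(command)
```

Only the interpolated value is quoted; the literal text of the template passes through unchanged.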
Background
This PEP was initially proposed as a competitor to PEP 498. After it became clear that the eager rendering proposal had substantially more immediate support, it then spent several years in a deferred state, pending further experience with PEP 498’s simpler approach of only supporting eager rendering without the additional complexity of also supporting deferred rendering.
Since then, f-strings have become very popular and PEP 701 was introduced to tidy up some rough edges and limitations in their syntax and semantics. The template literal proposal was updated in 2023 to reflect current knowledge of f-strings, and improvements from PEP 701.
In 2024, PEP 750 was published, proposing a general purpose mechanism for custom tagged string prefixes, rather than the narrower template literal proposal in this PEP. This PEP was again updated, both to incorporate new ideas inspired by the tagged strings proposal, and to describe the perceived benefits of the narrower template literal syntax proposal in this PEP over the more general tagged string proposal.
Summary of differences from f-strings
The key differences between f-strings and t-strings are:
- the t (template literal) prefix indicates delayed rendering, but otherwise largely uses the same syntax and semantics as formatted strings
- template literals are available at runtime as a new kind of object (types.TemplateLiteral)
- the default rendering used by formatted strings is invoked on a template literal object by calling format(template) rather than being done implicitly in the compiled code
- unlike f-strings (where conversion specifiers are handled directly in the compiler), t-string conversion specifiers are handled at rendering time by the rendering function
- the new !() conversion specifier indicates that the field expression is a callable that should be called when using the default format() rendering function. This specifier is specifically not being added to f-strings (since it is pointless there).
- a second ! is allowed in t-string conversion specifiers (with any subsequent text being ignored) as a way to allow custom template rendering functions to accept custom conversion specifiers without breaking the default TemplateLiteral.render() rendering method. This feature is specifically not being added to f-strings (since it is pointless there).
- while f-string f"Message {here}" would be semantically equivalent to format(t"Message {here}"), f-strings will continue to be supported directly in the compiler and hence avoid the runtime overhead of actually using the delayed rendering machinery that is needed for t-strings
Summary of differences from tagged strings
When tagged strings were first proposed, there were several notable differences from the proposal in PEP 501 beyond the surface syntax difference between whether rendering function invocations are written as render(t"template literal") or as render"template literal".
Over the course of the initial PEP 750 discussion, many of those differences were eliminated, either by PEP 501 adopting that aspect of PEP 750’s proposal (such as lazily applying conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501’s proposal (such as defining a dedicated type to hold template segments rather than representing them as simple sequences).
The main remaining significant difference is that this PEP argues that adding only the t-string prefix is a sufficient enhancement to give all the desired benefits described in PEP 750. The expansion to a generalised “tagged string” syntax isn’t necessary, and causes additional problems that can be avoided.
The two PEPs also differ in their proposed approaches to handling lazy evaluation of template fields.
While there are other differences between the two proposals, those differences are more cosmetic than substantive. In particular:
- this PEP proposes different names for the structural typing protocols
- this PEP proposes specific names for the concrete implementation types
- this PEP proposes exact details for the proposed APIs of the concrete implementation types (including concatenation and repetition support, which are not part of the structural typing protocols)
- this PEP proposes changes to the existing format() builtin to make it usable directly as a template field renderer
The two PEPs also differ in how they make their case for delayed rendering support. This PEP focuses more on the concrete implementation concept of using template literals to allow the “interpolation” and “rendering” steps in f-string processing to be separated in time, and then taking advantage of that to reduce the potential code injection risks associated with misuse of f-strings. PEP 750 focuses more on the way that native templating support allows behaviours that are difficult or impossible to achieve via existing string based templating methods. As with the cosmetic differences noted above, this is more a difference in style than a difference in substance.
Rationale
f-strings (PEP 498) made interpolating values into strings with full access to Python’s lexical namespace semantics simpler, but did so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates enjoys a much cleaner syntax when handled without regard for code injection attacks than when handled correctly.
This PEP proposes to provide the option of delaying the actual rendering of a template literal to a formatted string to its __format__ method, allowing the use of other template renderers by passing the template around as a first class object.
While very different in the technical details, the types.TemplateLiteral interface proposed in this PEP is conceptually quite similar to the FormattableString type underlying the native interpolation support introduced in C# 6.0, as well as the JavaScript template literals introduced in ES6.
While not the original motivation for developing the proposal, many of the benefits for defining domain specific languages described in PEP 750 also apply to this PEP (including the potential for per-DSL semantic highlighting in code editors based on the type specifications of declared template variables and rendering function parameters).
Specification
This PEP proposes a new t string prefix that results in the creation of an instance of a new type, types.TemplateLiteral.
Template literals are Unicode strings (bytes literals are not permitted), and string literal concatenation operates as normal, with the entire combined literal forming the template literal.
The template string is parsed into literals, expressions, format specifiers, and conversion specifiers as described for f-strings in PEP 498 and PEP 701. The syntax for conversion specifiers is relaxed such that arbitrary strings are accepted (excluding those containing {, } or :) rather than being restricted to valid Python identifiers.
However, rather than being rendered directly into a formatted string, these components are instead organised into instances of new types with the following behaviour:
from __future__ import annotations  # annotations below refer to classes before they are defined

import operator
from typing import Any, Iterable, NamedTuple, Sequence

class TemplateLiteralText(str):
    # This is a renamed and extended version of the DecodedConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw: str

    def __new__(cls, raw: str):
        decoded = raw.encode("utf-8").decode("unicode-escape")
        if decoded == raw:
            # No escape sequences were processed, so reuse the original string object
            decoded = raw
        text = super().__new__(cls, decoded)
        text._raw = raw
        return text

    @staticmethod
    def merge(text_segments: Sequence[TemplateLiteralText]) -> TemplateLiteralText:
        if len(text_segments) == 1:
            return text_segments[0]
        return TemplateLiteralText("".join(t._raw for t in text_segments))

    @property
    def raw(self) -> str:
        return self._raw

    def __repr__(self) -> str:
        return f"{type(self).__name__}(r{self._raw!r})"

    def __add__(self, other: Any) -> TemplateLiteralText|NotImplemented:
        if isinstance(other, TemplateLiteralText):
            return TemplateLiteralText(self._raw + other._raw)
        return NotImplemented

    def __mul__(self, other: Any) -> TemplateLiteralText|NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        return TemplateLiteralText(self._raw * factor)
    __rmul__ = __mul__
class TemplateLiteralField(NamedTuple):
    # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
    # However:
    #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
    #    - conversion specifiers are allowed to be arbitrary strings
    #    - order of fields is adjusted so the text form is the first field and the
    #      remaining parameters match the updated signature of the `format` builtin
    # Real type would be implemented in C, this is an API compatible Python equivalent
    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None

    def __repr__(self) -> str:
        # The expression text is shown quoted, matching the constructor call form
        return (f"{type(self).__name__}({self.expr!r}, {self.value!r}, "
                f"{self.format_spec!r}, {self.conversion_spec!r})")

    def __str__(self) -> str:
        # Relies on the three-argument format() builtin proposed in this PEP
        return format(self.value, self.format_spec, self.conversion_spec)

    def __format__(self, format_override) -> str:
        if format_override:
            format_spec = format_override
        else:
            format_spec = self.format_spec
        return format(self.value, format_spec, self.conversion_spec)
class TemplateLiteral:
    # This type corresponds to the TemplateConcrete type in PEP 750
    # Real type would be implemented in C, this is an API compatible Python equivalent
    _raw_template: str
    _segments: tuple[TemplateLiteralText|TemplateLiteralField, ...]

    def __new__(cls, raw_template: str, *segments: TemplateLiteralText|TemplateLiteralField):
        self = super().__new__(cls)
        self._raw_template = raw_template
        # Check if there are any adjacent text segments that need merging
        # or any empty text segments that need discarding
        type_err = "Template literal segments must be template literal text or field instances"
        text_expected = True
        needs_merge = False
        for segment in segments:
            match segment:
                case TemplateLiteralText():
                    if not text_expected or not segment:
                        needs_merge = True
                        break
                    text_expected = False
                case TemplateLiteralField():
                    text_expected = True
                case _:
                    raise TypeError(type_err)
        if not needs_merge:
            # Match loop above will have checked all segments
            self._segments = segments
            return self
        # Merge consecutive runs of text fields and drop any empty text fields
        merged_segments: list[TemplateLiteralText|TemplateLiteralField] = []
        pending_merge: list[TemplateLiteralText] = []
        for segment in segments:
            match segment:
                case TemplateLiteralText() as text_segment:
                    if text_segment:
                        pending_merge.append(text_segment)
                case TemplateLiteralField():
                    if pending_merge:
                        merged_segments.append(TemplateLiteralText.merge(pending_merge))
                        pending_merge.clear()
                    merged_segments.append(segment)
                case _:
                    # First loop above may not check all segments when a merge is needed
                    raise TypeError(type_err)
        if pending_merge:
            merged_segments.append(TemplateLiteralText.merge(pending_merge))
            pending_merge.clear()
        self._segments = tuple(merged_segments)
        return self

    @property
    def raw_template(self) -> str:
        return self._raw_template

    @property
    def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField, ...]:
        return self._segments

    def __len__(self) -> int:
        return len(self._segments)

    def __iter__(self) -> Iterable[TemplateLiteralText|TemplateLiteralField]:
        return iter(self._segments)

    # Note: template literals do NOT define any relative ordering
    def __eq__(self, other):
        if not isinstance(other, TemplateLiteral):
            return NotImplemented
        # Segment equality already covers field values and format specifiers
        return (
            self._raw_template == other._raw_template
            and self._segments == other._segments
        )

    def __repr__(self) -> str:
        return (f"{type(self).__name__}(r{self._raw_template!r}, "
                f"{', '.join(map(repr, self._segments))})")

    def __format__(self, format_specifier) -> str:
        # When formatted, render to a string, and then use string formatting
        return format(self.render(), format_specifier)

    def render(self, *, render_template=''.join, render_text=str, render_field=format):
        ... # See definition of the template rendering semantics below

    def __add__(self, other) -> TemplateLiteral|NotImplemented:
        if isinstance(other, TemplateLiteral):
            combined_raw_text = self._raw_template + other._raw_template
            combined_segments = self._segments + other._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        if isinstance(other, str):
            # Treat the given string as a new raw text segment
            combined_raw_text = self._raw_template + other
            combined_segments = self._segments + (TemplateLiteralText(other),)
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __radd__(self, other) -> TemplateLiteral|NotImplemented:
        if isinstance(other, str):
            # Treat the given string as a new raw text segment. This effectively
            # has precedence over string concatenation in CPython due to
            # https://github.com/python/cpython/issues/55686
            combined_raw_text = other + self._raw_template
            combined_segments = (TemplateLiteralText(other),) + self._segments
            return TemplateLiteral(combined_raw_text, *combined_segments)
        return NotImplemented

    def __mul__(self, other) -> TemplateLiteral|NotImplemented:
        try:
            factor = operator.index(other)
        except TypeError:
            return NotImplemented
        if not self or factor == 1:
            return self
        if factor < 1:
            return TemplateLiteral("")
        repeated_text = self._raw_template * factor
        repeated_segments = self._segments * factor
        return TemplateLiteral(repeated_text, *repeated_segments)
    __rmul__ = __mul__
(Note: this is an illustrative example implementation, the exact compile time construction method and internal data management details of types.TemplateLiteral are considered an implementation detail not specified by the PEP. However, the expected post-construction behaviour of the public APIs on types.TemplateLiteral instances is specified by the above code, as is the constructor signature for building template instances at runtime)
The result of a template literal expression is an instance of this type, rather than an already rendered string. Rendering only takes place when the instance’s render method is called (either directly, or indirectly via __format__).
The compiler will pass the following details to the template literal for later use:
- a string containing the raw template as written in the source code
- a sequence of template segments, with each segment being either:
- a literal text segment (a regular Python string that also provides access to its raw form)
- a parsed template interpolation field, specifying the text of the interpolated expression (as a regular string), its evaluated result, the format specifier text (with any substitution fields eagerly evaluated as an f-string), and the conversion specifier text (as a regular string)
The raw template is just the template literal as a string. By default, it is used to provide a human-readable representation for the template literal, but template renderers may also use it for other purposes (e.g. as a cache lookup key).
The parsed template structure is taken from PEP 750 and consists of a sequence of template segments corresponding to the text segments and interpolation fields in the template string.
This approach is designed to allow compilers to fully process each segment of the template in order, before finally emitting code to pass all of the template segments to the template literal constructor.
For example, assuming the following runtime values:
names = ["Alice", "Bob", "Carol", "Eve"]
field_width = 10
def expressions():
return 42
The template from the proposal section would be represented at runtime as:
TemplateLiteral(
    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
    TemplateLiteralText(r"Substitute "),
    TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
    TemplateLiteralText(r" and "),
    TemplateLiteralField("expressions()", 42, "", "r"),
)
Rendering templates
The TemplateLiteral.render implementation defines the rendering process in terms of the following renderers:
- an overall render_template operation that defines how the sequence of rendered text and field segments are composed into a fully rendered result. The default template renderer is string concatenation using ''.join.
- a per text segment render_text operation that receives the individual literal text segments within the template. The default text renderer is the builtin str constructor.
- a per field segment render_field operation that receives the field value, format specifier, and conversion specifier for substitution fields within the template. The default field renderer is the format() builtin.
Given the parsed template representation above, the semantics of template rendering would then be equivalent to the following:
def render(self, *, render_template=''.join, render_text=str, render_field=format):
    rendered_segments = []
    for segment in self._segments:
        match segment:
            case TemplateLiteralText() as text_segment:
                rendered_segments.append(render_text(text_segment))
            case TemplateLiteralField() as field_segment:
                rendered_segments.append(render_field(*field_segment[1:]))
    return render_template(rendered_segments)
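The three-renderer split can be tried out in current Python with a minimal sketch. The Text and Field classes and the free function render below are hypothetical stand-ins for the proposed types, and the default field renderer ignores the conversion specifier because today's format() builtin accepts only two arguments.

```python
from typing import Any, NamedTuple


class Text(str):
    """Hypothetical stand-in for a literal text segment."""


class Field(NamedTuple):
    """Hypothetical stand-in for an interpolation field segment."""
    expr: str
    value: Any
    format_spec: str = ""
    conversion_spec: str = ""


def default_render_field(value, format_spec, conversion_spec):
    # Assumption: the conversion specifier is ignored in this simplified default
    return format(value, format_spec)


def render(segments, *, render_template="".join, render_text=str,
           render_field=default_render_field):
    rendered = []
    for segment in segments:
        if isinstance(segment, Field):
            # Pass everything except the expression text to the field renderer
            rendered.append(render_field(*segment[1:]))
        else:
            rendered.append(render_text(segment))
    return render_template(rendered)


segments = [Text("Total: "), Field("price", 3.14159, ".2f"), Text(" EUR")]
default_result = render(segments)
list_result = render(segments, render_template=list, render_text=str.upper)
print(default_result)
print(list_result)
```

Swapping render_template for list shows that the overall result does not have to be a string, while render_text and render_field each see only their own kind of segment.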
Format specifiers
The syntax and processing of format specifiers in t-strings is defined to be the same as it is for f-strings.
This includes allowing format specifiers to themselves contain f-string substitution fields. The raw text of the format specifiers (without processing any substitution fields) is retained as part of the full raw template string.
The parsed template fields receive the format specifier string with those substitutions already resolved. The : prefix is also omitted.
Aside from separating them out from the substitution expression during parsing, format specifiers are otherwise treated as opaque strings by the interpolation template parser - assigning semantics to those (or, alternatively, prohibiting their use) is handled at rendering time by the field renderer.
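The difference between the raw and resolved forms of a format specifier can be shown with existing f-string machinery. In this small sketch the variable names are illustrative only; it mirrors the {names:>{field_width}} field used in the earlier example.

```python
field_width = 10

# As written in the source code: the nested substitution field is still present
raw_spec = ">{field_width}"

# As a field renderer would receive it: nested fields already evaluated
resolved_spec = f">{field_width}"

# Applying the resolved specifier gives the same result as the f-string form
rendered = format("Bob", resolved_spec)
print(repr(resolved_spec), repr(rendered))
```

A field renderer therefore never needs access to the enclosing namespace in order to interpret a format specifier.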
Conversion specifiers
In addition to the existing support for a, r, and s conversion specifiers, str.format() and str.format_map() will be updated to accept () as a conversion specifier that means “call the interpolated value”.
Where PEP 701 restricts conversion specifiers to NAME tokens, this PEP will instead allow FSTRING_MIDDLE tokens (such that only {, } and : are disallowed). This change is made primarily to support lazy field rendering with the !() conversion specifier, but also allows custom rendering functions more flexibility when defining their own conversion specifiers in preference to those defined for the default format() field renderer.
Conversion specifiers are still handled as plain strings, and do NOT support the use of substitution fields.
The parsed conversion specifiers receive the conversion specifier string with the ! prefix omitted.
To allow custom template renderers to define their own custom conversion specifiers without causing the default renderer to fail, conversion specifiers will be permitted to contain a custom suffix prefixed with a second ! character. That is, !!<custom>, !a!<custom>, !r!<custom>, !s!<custom>, and !()!<custom> would all be valid conversion specifiers in a template literal.
As described above, the default rendering supports the original !a, !r and !s conversion specifiers defined in PEP 3101, together with the new !() lazy field evaluation conversion specifier defined in this PEP. The default rendering ignores any custom conversion specifier suffixes.
The full mapping between the standard conversion specifiers and the special methods called on the interpolated value when the field is rendered is as follows:
- No conversion (empty string): __format__ (with format specifier as parameter)
- a: __repr__ (as per the ascii() builtin)
- r: __repr__ (as per the repr() builtin)
- s: __str__ (as per the str builtin)
- (): __call__ (with no parameters)
When a conversion occurs, __format__ (with the format specifier) is called on the result of the conversion rather than being called on the original object.
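The ordering of conversion and formatting matches what f-strings already do today, which the following short check illustrates using only existing builtins (the sample values are arbitrary).

```python
name = "Jane"

# f-string form: convert with repr() first, then apply the format specifier
via_fstring = f"{name!r:>8}"

# Equivalent explicit form: the format specifier is applied to the converted value
via_builtins = format(repr(name), ">8")

# The !a conversion goes through ascii(), which escapes non-ASCII characters
ascii_converted = f"{'é'!a}"

print(via_fstring, via_builtins, ascii_converted)
```

Because the specifier is applied to the converted string, the width of 8 pads the quoted text 'Jane' rather than the bare name.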
The changes to format() and the addition of operator.convert_field() make it straightforward for custom renderers to also support the standard conversion specifiers.
f-strings themselves will NOT support the new !() conversion specifier (as it is redundant when value interpolation and value rendering always occur at the same time). They also will NOT support the use of custom conversion specifiers (since the rendering function is known at compile time and doesn’t make use of the custom specifiers).
New field conversion API in the operator module
To support application of the standard conversion specifiers in custom template rendering functions, a new operator.convert_field() function will be added:
def convert_field(value, conversion_spec=''):
    """Apply the given string formatting conversion specifier to the given value"""
    # Anything after a second "!" is a custom suffix and is ignored here
    std_spec, sep, custom_spec = conversion_spec.partition("!")
    match std_spec:
        case '':
            return value
        case 'a':
            return ascii(value)
        case 'r':
            return repr(value)
        case 's':
            return str(value)
        case '()':
            return value()
    if not sep:
        err = f"Invalid conversion specifier {std_spec!r}"
    else:
        err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
    raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()'")
Conversion specifier parameter added to format()
The signature and behaviour of the format() builtin will be updated:
def format(value, format_spec='', conversion_spec=''):
    if conversion_spec:
        # Apply the requested conversion before looking up __format__
        value_to_format = operator.convert_field(value, conversion_spec)
    else:
        value_to_format = value
    return type(value_to_format).__format__(value_to_format, format_spec)
If a non-empty conversion specifier is given, the value will be converted with operator.convert_field() before looking up the __format__ method.
The signature of the __format__ special method does NOT change (only format specifiers are handled by the object being formatted).
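Since neither operator.convert_field() nor a three-argument format() exists in released Python versions, the way the two pieces compose can only be sketched with stand-ins. The names convert_field_sketch and format_sketch below are hypothetical, and a dictionary lookup is used in place of the match statement purely for brevity.

```python
# Hypothetical stand-ins for the proposed operator.convert_field() and format()
_STANDARD_CONVERSIONS = {
    "": lambda value: value,
    "a": ascii,
    "r": repr,
    "s": str,
    "()": lambda value: value(),  # lazy field: call the interpolated value
}


def convert_field_sketch(value, conversion_spec=""):
    # Text after a second "!" is a custom suffix and is ignored
    std_spec, _sep, _custom_spec = conversion_spec.partition("!")
    try:
        convert = _STANDARD_CONVERSIONS[std_spec]
    except KeyError:
        raise ValueError(f"Invalid conversion specifier {std_spec!r}") from None
    return convert(value)


def format_sketch(value, format_spec="", conversion_spec=""):
    # Convert first, then let the converted value handle the format specifier
    value_to_format = convert_field_sketch(value, conversion_spec)
    return format(value_to_format, format_spec)


print(format_sketch("Jane", ">8", "r"))
print(format_sketch(lambda: 3.14159, ".2f", "()"))
print(format_sketch("Jane", "", "s!html"))
```

The second call shows the lazy evaluation case: the callable is invoked at rendering time and the format specifier is applied to its result.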
Structural typing and duck typing
To allow custom renderers to accept alternative interpolation template implementations (rather than being tightly coupled to the native template literal types), the following structural protocols will be added to the typing module:
@runtime_checkable
class TemplateText(Protocol):
    # Renamed version of PEP 750's Decoded protocol
    def __str__(self) -> str:
        ...
    raw: str
@runtime_checkable
class TemplateField(Protocol):
    # Renamed and modified version of PEP 750's Interpolation protocol
    def __len__(self):
        ...
    def __getitem__(self, index: int):
        ...
    def __str__(self) -> str:
        ...
    expr: str
    value: Any
    format_spec: str | None = None
    conversion_spec: str | None = None
@runtime_checkable
class InterpolationTemplate(Protocol):
    # Corresponds to PEP 750's Template protocol
    def __iter__(self) -> Iterable[TemplateText|TemplateField]:
        ...
    raw_template: str
Note that the structural protocol APIs are substantially narrower than the full
implementation APIs defined for TemplateLiteralText
, TemplateLiteralField
,
and TemplateLiteral
.
Code that wants to accept interpolation templates and define specific handling for them
without introducing a dependency on the typing
module, or restricting the code to
handling the concrete template literal types, should instead perform an attribute
existence check on raw_template
.
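The two detection styles described above can be compared side by side. In this sketch, TemplateLike is a hypothetical local protocol that mirrors only part of the proposed InterpolationTemplate, and DemoTemplate is a hypothetical third-party implementation; neither name comes from the proposal itself.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class TemplateLike(Protocol):
    """Hypothetical local protocol mirroring InterpolationTemplate."""
    raw_template: str

    def __iter__(self) -> Iterator[object]:
        ...


@dataclass
class DemoTemplate:
    """Hypothetical template implementation unrelated to the native types."""
    raw_template: str
    segments: tuple

    def __iter__(self):
        return iter(self.segments)


def is_template(obj) -> bool:
    # Attribute existence check: needs no typing import at the call site.
    return hasattr(obj, "raw_template")


demo = DemoTemplate("Hello {name}", ("Hello ", "World"))
print(is_template(demo), isinstance(demo, TemplateLike))
print(is_template("Hello"), isinstance("Hello", TemplateLike))
```

A plain string is iterable but has no raw_template attribute, so both checks reject it.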
Writing custom renderers
Writing a custom renderer doesn’t require any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly either by calling the render()
method with alternate
render_template
, render_text
, and/or render_field
implementations, or by
accessing the template’s data attributes directly.
For example, the following function would render a template using objects’
repr
implementations rather than their native formatting support:
def repr_format(template):
    def render_field(value, format_spec, conversion_spec):
        converted_value = operator.convert_field(value, conversion_spec)
        return format(repr(converted_value), format_spec)
    return template.render(render_field=render_field)
The custom renderer shown above respects the conversion specifiers in the original template, but it is also possible to ignore them and render the interpolated values directly:
def input_repr_format(template):
    def render_field(value, format_spec, __):
        return format(repr(value), format_spec)
    return template.render(render_field=render_field)
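Since template literals cannot be written in current Python, the two renderers can be exercised against a small stand-in. In the sketch below, DemoTemplate and its render() signature are assumptions made for illustration (literal text as str segments, fields as (value, format_spec, conversion_spec) tuples), and convert is a hypothetical helper replacing the proposed operator.convert_field().

```python
class DemoTemplate:
    """Hypothetical stand-in for a template literal."""

    def __init__(self, *segments):
        self.segments = segments

    def render(self, *, render_field, render_template="".join, render_text=str):
        # render_field is required here because today's two-argument
        # format() cannot serve as the three-argument default.
        rendered = []
        for segment in self.segments:
            if isinstance(segment, str):
                rendered.append(render_text(segment))
            else:
                rendered.append(render_field(*segment))
        return render_template(rendered)


def convert(value, spec):
    # Hypothetical helper covering only the pre-existing specifiers.
    return {"": lambda v: v, "a": ascii, "r": repr, "s": str}[spec](value)


def repr_format(template):
    def render_field(value, format_spec, conversion_spec):
        return format(repr(convert(value, conversion_spec)), format_spec)
    return template.render(render_field=render_field)


def input_repr_format(template):
    def render_field(value, format_spec, _conversion_spec):
        # Ignore the conversion specifier and always repr() the input.
        return format(repr(value), format_spec)
    return template.render(render_field=render_field)


# Stand-in for t"name={name:>8}, id={ident!s}" with name="Ada", ident=7
demo = DemoTemplate("name=", ("Ada", ">8", ""), ", id=", (7, "", "s"))
print(repr_format(demo))
print(input_repr_format(demo))
```

The two results differ only in the second field: honouring !s means the repr is taken of the string "7" rather than of the integer 7.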
When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in render_template
callable. While this will still be a string for formatting related use cases, producing
non-string objects is permitted. For example, a custom SQL
template renderer could involve an sqlalchemy.sql.text
call that produces an
SQL Alchemy query object.
A subprocess invocation related template renderer could produce a string sequence suitable
for passing to subprocess.run
, or it could even call subprocess.run
directly, and
return the result.
Non-strings may also be returned from render_text
and render_field
, as long as
they are paired with a render_template
implementation that expects that behaviour.
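As an illustration of a non-string rendering result, the sketch below returns a (query, parameters) pair suitable for a DB-API cursor. It uses the standard library sqlite3 module rather than SQLAlchemy, and both the DemoTemplate stand-in and the sql renderer are hypothetical names introduced for this example only.

```python
import sqlite3


class DemoTemplate:
    """Hypothetical stand-in: str segments are literal text, tuples are
    (value, format_spec, conversion_spec) fields."""

    def __init__(self, *segments):
        self.segments = segments

    def render(self, *, render_template, render_field, render_text=str):
        parts = []
        for segment in self.segments:
            if isinstance(segment, str):
                parts.append(render_text(segment))
            else:
                parts.append(render_field(*segment))
        return render_template(parts)


def sql(template):
    """Hypothetical renderer returning (query, params) instead of a string."""
    params = []

    def render_field(value, _format_spec, _conversion_spec):
        params.append(value)   # the value travels out-of-band
        return "?"             # sqlite3 positional placeholder

    def render_template(parts):
        return "".join(parts), tuple(params)

    return template.render(render_template=render_template,
                           render_field=render_field)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "alice' OR '1'='1"
# Stand-in for sql(t"SELECT name FROM users WHERE name = {hostile}")
query, params = sql(
    DemoTemplate("SELECT name FROM users WHERE name = ", (hostile, "", ""))
)
rows = conn.execute(query, params).fetchall()
print(query, params, rows)
```

Only values are parameterised in this sketch; identifiers such as table or column names cannot be passed as placeholders and would need a separate quoting strategy.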
Custom renderers using the pattern matching style described in PEP 750 are also supported:
# Use the structural typing protocols rather than the concrete implementation types
from typing import InterpolationTemplate, TemplateText, TemplateField
def greet(template: InterpolationTemplate) -> str:
    """Render an interpolation template using structural pattern matching."""
    result = []
    for segment in template:
        match segment:
            case TemplateText() as text_segment:
                # The protocol only guarantees __str__, so convert explicitly
                result.append(str(text_segment))
            case TemplateField() as field_segment:
                result.append(str(field_segment).upper())
    return f"{''.join(result)}!"
Expression evaluation
As with f-strings, the subexpressions that are extracted from the interpolation
template are evaluated in the context where the template literal
appears. This means the expression has full access to local, nonlocal and global variables.
Any valid Python expression can be used inside {}
, including
function and method calls.
Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have also just written the same expression and used runtime field parsing:
>>> bar=10
>>> def foo(data):
...     return data + 20
...
>>> str(t'input={bar}, output={foo(bar)}')
'input=10, output=30'
Is essentially equivalent to:
>>> 'input={}, output={}'.format(bar, foo(bar))
'input=10, output=30'
Handling code injection attacks
The PEP 498 formatted string syntax makes it potentially attractive to write code like the following:
runquery(f"SELECT {column} FROM {table};")
runcommand(f"cat {filename}")
return_response(f"<html><body>{response.body}</body></html>")
These all represent potential vectors for code injection attacks, if any of the variables being interpolated happen to come from an untrusted source. The specific proposal in this PEP is designed to make it straightforward to write use case specific renderers that take care of quoting interpolated values appropriately for the relevant security context:
runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
runcommand(sh(t"cat {filename}"))
return_response(html(t"<html><body>{response.body}</body></html>"))
This PEP does not cover adding all such renderers to the standard library immediately (though one for shell escaping is proposed), but rather proposes to ensure that they can be readily provided by third party libraries, and potentially incorporated into the standard library at a later date.
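A renderer of the kind a third party library might provide for the HTML case could look roughly like the following. This is a sketch only: render_html is a hypothetical name, and the segment list (str items for trusted literal text, one-element tuples for untrusted values) is an assumed stand-in for a real template literal. The escaping itself uses the existing html.escape() function.

```python
import html as html_lib


def render_html(segments):
    """Hypothetical renderer: escape every interpolated value."""
    parts = []
    for segment in segments:
        if isinstance(segment, str):
            parts.append(segment)                      # trusted literal text
        else:
            (value,) = segment
            parts.append(html_lib.escape(str(value)))  # untrusted value
    return "".join(parts)


body = "<script>alert('pwned')</script>"
# Stand-in for html(t"<html><body>{body}</body></html>")
page = render_html(["<html><body>", (body,), "</body></html>"])
print(page)
```

The literal markup written by the template author is passed through untouched, while the interpolated value can no longer introduce markup of its own.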
Over time, it is expected that APIs processing potentially dangerous string inputs may be
updated to accept interpolation templates natively, allowing problematic code examples to
be fixed simply by replacing the f
string prefix with a t
:
runquery(t"SELECT {column} FROM {table};")
runcommand(t"cat {filename}")
return_response(t"<html><body>{response.body}</body></html>")
It is proposed that a renderer is included in the shlex
module, aiming to offer a
more POSIX shell style experience for accessing external programs, without the significant
risks posed by running os.system
or enabling the system shell when using the
subprocess
module APIs. This renderer will provide an interface for running external
programs inspired by that offered by the
Julia programming language,
only with the backtick based `cat $filename`
syntax replaced by t"cat {filename}"
style template literals. See more in the Renderer for shell escaping added to shlex section.
Error handling
Either compile time or run time errors can occur when processing interpolation expressions. Compile time errors are limited to those errors that can be detected when parsing a template string into its component tuples. These errors all raise SyntaxError.
Unmatched braces:
>>> t'x={x'
File "<stdin>", line 1
t'x={x'
^
SyntaxError: missing '}' in template literal expression
Invalid expressions:
>>> t'x={!x}'
File "<fstring>", line 1
!x
^
SyntaxError: invalid syntax
Run time errors occur when evaluating the expressions inside a template string before creating the template literal object. See PEP 498 for some examples.
Different renderers may also impose additional runtime constraints on acceptable interpolated expressions and other formatting details, which will be reported as runtime exceptions.
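The split between compile time and run time errors can be observed today with f-strings, which are used below as a stand-in on the assumption that template literals would be parsed the same way. The helper name compiles is hypothetical, and the exact error messages vary between Python versions, so only the exception types are relied upon.

```python
def compiles(source):
    """Return True if `source` parses as an expression, False on SyntaxError."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles("f'x={x}'"))    # well-formed field
print(compiles("f'x={x'"))     # unmatched brace
print(compiles("f'x={!x}'"))   # invalid expression

# Run time errors only appear once the expression is evaluated.
code = compile("f'x={undefined_name}'", "<demo>", "eval")
try:
    eval(code, {})
    runtime_error = None
except NameError as exc:
    runtime_error = exc
print("run time error:", runtime_error)
```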
Renderer for shell escaping added to shlex
As a reference implementation, a renderer for safe POSIX shell escaping can be added to
the shlex
module. This renderer would be called sh
and would be equivalent to
calling shlex.quote
on each field value in the template literal.
Thus:
os.system(shlex.sh(t'cat {myfile}'))
would have the same behavior as:
os.system('cat ' + shlex.quote(myfile))
The implementation would be:
def sh(template: TemplateLiteral):
    def render_field(value, format_spec, conversion_spec):
        field_text = format(value, format_spec, conversion_spec)
        return quote(field_text)
    return template.render(render_field=render_field)
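The effect of quoting each field can be tried with the existing shlex.quote() function. In the sketch below, demo_sh is a hypothetical stand-in for the proposed shlex.sh, and the segment list (str items for literal command text, (value, format_spec) tuples for fields) is an assumed substitute for a template literal.

```python
import shlex


def demo_sh(segments):
    """Hypothetical stand-in for the proposed shlex.sh()."""
    parts = []
    for segment in segments:
        if isinstance(segment, str):
            parts.append(segment)   # literal command text, left as written
        else:
            value, format_spec = segment
            # Format first, then quote the resulting text as one shell word.
            parts.append(shlex.quote(format(value, format_spec)))
    return "".join(parts)


myfile = "notes.txt; rm -rf ~"
# Stand-in for shlex.sh(t"cat {myfile}")
command = demo_sh(["cat ", (myfile, "")])
print(command)
print(shlex.split(command))
```

The hostile file name ends up as a single quoted argument to cat rather than as a second command.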
The addition of shlex.sh
will NOT change the existing admonishments in the
subprocess
documentation that passing shell=True
is best avoided, nor the
reference from the os.system()
documentation to the higher level subprocess
APIs.
Changes to subprocess module
With the additional renderer in the shlex module, and the addition of template literals,
the subprocess
module can be changed to handle accepting template literals
as an additional input type to Popen
, as it already accepts a sequence, or a string,
with different behavior for each.
With the addition of template literals, subprocess.Popen
(and in turn, all its
higher level functions such as subprocess.run()
) could accept strings in a safe way
(at least on POSIX systems).
For example:
subprocess.run(t'cat {myfile}', shell=True)
would automatically use the shlex.sh
renderer provided in this PEP. Therefore, using
shlex
inside a subprocess.run
call like so:
subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)
would be redundant, as run
would automatically render any template literals
through shlex.sh.
Alternatively, when subprocess.Popen
is run without shell=True
, it could still
provide subprocess with a more ergonomic syntax. For example:
subprocess.run(t'cat {myfile} --flag {value}')
would be equivalent to:
subprocess.run(['cat', myfile, '--flag', value])
or, more accurately:
subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))
It would do this by first using the shlex.sh
renderer, as above, then using
shlex.split
on the result.
The implementation inside subprocess.Popen._execute_child
would look like:
if hasattr(args, "raw_template"):
    import shlex
    if shell:
        args = [shlex.sh(args)]
    else:
        args = shlex.split(shlex.sh(args))
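The claim that quoting followed by splitting recovers the original arguments can be checked with the existing shlex functions. In this sketch, prepare_args is a hypothetical helper mirroring the two branches above, and the rendered command is built with an f-string and shlex.quote() to stand in for the output of the proposed shlex.sh.

```python
import shlex

myfile = "my report.txt"
value = "a;b"

# Stand-in for shlex.sh(t"cat {myfile} --flag {value}")
rendered = f"cat {shlex.quote(myfile)} --flag {shlex.quote(value)}"


def prepare_args(rendered_command, shell):
    """Hypothetical helper mirroring the two branches described above."""
    if shell:
        return [rendered_command]          # one string for the shell to parse
    return shlex.split(rendered_command)   # argv list, no shell involved


print(prepare_args(rendered, shell=True))
print(prepare_args(rendered, shell=False))
```

Without a shell, the space in the file name and the semicolon in the flag value both survive as ordinary characters inside their respective arguments.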
How to Teach This
This PEP intentionally includes two standard renderers that will always be available in
teaching environments: the format()
builtin and the new shlex.sh
POSIX shell
renderer.
Together, these two renderers can be used to build an initial understanding of delayed rendering on top of a student’s initial introduction to string formatting with f-strings. This initial understanding would have the goal of allowing students to use template literals effectively, in combination with pre-existing template rendering functions.
For example, f"{'some text'}"
, f"{value}"
, f"{value!r}"
, f"{callable()}"
could all be introduced.
Those same operations could then be rewritten as format(t"{'some text'}")
,
format(t"{value}")
, format(t"{value!r}")
, format(t"{callable()}")
to
illustrate the relationship between the eager rendering form and the delayed rendering
form.
The difference between “template definition time” (or “interpolation time”) and
“template rendering time” can then be investigated further by storing the template literals
as local variables and looking at their representations separately from the results of the
format
calls. At this point, the t"{callable!()}"
syntax can be introduced to
distinguish between field expressions that are called at template definition time and those
that are called at template rendering time.
Finally, the differences between the results of f"{'some text'}"
,
format(t"{'some text'}")
, and shlex.sh(t"{'some text'}")
could be explored to
illustrate the potential for differences between the default rendering function and custom
rendering functions.
Actually defining your own custom template rendering functions would then be a separate, more advanced topic (similar to the way students are routinely taught to use decorators and context managers well before they learn how to write their own custom ones).
PEP 750 includes further ideas for teaching aspects of the delayed rendering topic.
Discussion
Refer to PEP 498 for previous discussion, as several of the points there also apply to this PEP. PEP 750’s design discussions are also highly relevant, as that PEP inspired several aspects of the current design.
Support for binary interpolation
As f-strings don’t handle byte strings, neither will t-strings.
Interoperability with str-only interfaces
For interoperability with interfaces that only accept strings, interpolation
templates can still be prerendered with format()
, rather than delegating the
rendering to the called function.
This reflects the key difference from PEP 498, which always eagerly applies the default rendering, without any way to delegate the choice of renderer to another section of the code.
Preserving the raw template string
Earlier versions of this PEP failed to make the raw template string available on the template literal. Retaining it makes it possible to provide a more attractive template representation, as well as providing the ability to precisely reconstruct the original string, including both the expression text and the details of any eagerly rendered substitution fields in format specifiers.
Creating a rich object rather than a global name lookup
Earlier versions of this PEP used an __interpolate__
builtin, rather than
creating a new kind of object for later consumption by interpolation
functions. Creating a rich descriptive object with a useful default renderer
made it much easier to support customisation of the semantics of interpolation.
Building atop f-strings rather than replacing them
Earlier versions of this PEP attempted to serve as a complete substitute for PEP 498 (f-strings). With the acceptance of that PEP and the more recent PEP 701, this PEP can instead build a more flexible delayed rendering capability on top of the existing f-string eager rendering.
Assuming the presence of f-strings as a supporting capability simplified a number of aspects of the proposal in this PEP (such as how to handle substitution fields in format specifiers).
Defining repetition and concatenation semantics
This PEP explicitly defines repetition and concatenation semantics for TemplateLiteral
and TemplateLiteralText
. While not strictly necessary, defining these is expected
to make the types easier to work with in code that historically only supported regular
strings.
New conversion specifier for lazy field evaluation
The initially published version of PEP 750 defaulted to lazy evaluation for all interpolation fields. While it was subsequently updated to default to eager evaluation (as happens for f-strings and this PEP), the discussions around the topic prompted the idea of providing a way to indicate to rendering functions that the interpolated field value should be called at rendering time rather than being used without modification.
Since PEP 750 also deferred the processing of conversion specifiers until evaluation time,
the suggestion was put forward that invoking __call__
without arguments could be seen
as similar to the existing conversion specifiers that invoke __repr__
(!a
, !r
)
or __str__
(!s
).
Accordingly, this PEP was updated to also make conversion specifier processing the
responsibility of rendering functions, and to introduce !()
as a new conversion
specifier for lazy evaluation.
Adding operator.convert_field()
and updating the format()
builtin was then
a matter of providing appropriate support to rendering function implementations that
wanted to accept the default conversion specifiers.
Allowing arbitrary conversion specifiers in custom renderers
Accepting !()
as a new conversion specifier necessarily requires updating the syntax
that the parser accepts for conversion specifiers (they are currently restricted to
identifiers). This then raised the question of whether t-string compilation should enforce
the additional restriction that f-string compilation imposes: that the conversion specifier
be exactly one of !a
, !r
, or !s
.
With t-strings already being updated to allow !()
when compiled, it made sense to treat
conversion specifiers as relating to rendering functions in the same way that format
specifiers relate to the formatting of individual objects: aside from some characters that
are excluded for parsing reasons, they are otherwise free text fields with the meaning
decided by the consuming function or object. This reduces the temptation to introduce
renderer specific metaformatting into the template’s format specifiers (since any
renderer specific information can be placed in the conversion specifier instead).
Only reserving a single new string prefix
The primary difference between this PEP and PEP 750 is that the latter aims to enable
the use of arbitrary string prefixes, rather than requiring the creation of template
literal instances that are then passed to other APIs. For example, PEP 750 would allow
the sh
renderer described in this PEP to be used as sh"cat {somefile}"
rather than
requiring the template literal to be created explicitly and then passed to a regular
function call (as in sh(t"cat {somefile}")
).
The main reason the PEP authors prefer the second spelling is that it makes it clearer to a reader what is going on: a template literal instance is being created, and then passed to a callable that knows how to do something useful with interpolation template instances.
A draft proposal from one of the PEP 750 authors also suggests that static typecheckers will be able to infer the use of particular domain specific languages just as readily from the form that uses an explicit function call as they would be able to infer it from a directly tagged string.
With the tagged string syntax at least arguably reducing clarity for human readers without increasing the overall expressiveness of the construct, it seems reasonable to start with the smallest viable proposal (a single new string prefix), and then revisit the potential value of generalising to arbitrary prefixes in the future.
As a lesser, but still genuine, consideration, only using a single new string prefix for
this use case leaves open the possibility of defining alternate prefixes in the future that
still produce TemplateLiteral
objects, but use a different syntax within the string to
define the interpolation fields (see the i18n discussion below).
Deferring consideration of more concise delayed evaluation syntax
During the discussions of delayed evaluation, {-> expr}
was
suggested
as potential syntactic sugar for the already supported lambda
based syntax:
{(lambda: expr)}
(the parentheses are required in the existing syntax to avoid
misinterpretation of the :
character as indicating the start of the format specifier).
While adding such a spelling would complement the rendering time function call syntax
proposed in this PEP (that is, writing {-> expr!()}
to evaluate arbitrary expressions
at rendering time), it is a topic that the PEP authors consider to be better left to a
future PEP if this PEP or PEP 750 is accepted.
Deferring consideration of possible logging integration
One of the challenges with the logging module has been that we have previously
been unable to devise a reasonable migration strategy away from the use of
printf-style formatting. While the logging module does allow formatters to specify the
use of str.format()
or string.Template
style substitution, it can be awkward
to ensure that messages written that way are only ever processed by log record formatters
that are expecting that syntax.
The runtime parsing and interpolation overhead for logging messages also poses a problem for extensive logging of runtime events for monitoring purposes.
While beyond the scope of this initial PEP, template literal support could potentially be added to the logging module’s event reporting APIs, permitting relevant details to be captured using forms like:
logging.debug(t"Event: {event}; Details: {data}")
logging.critical(t"Error: {error}; Details: {data}")
Rather than the historical mod-formatting style:
logging.debug("Event: %s; Details: %s", event, data)
logging.critical("Error: %s; Details: %s", error, data)
As the template literal is passed in as an ordinary argument, other keyword arguments would also remain available:
logging.critical(t"Error: {error}; Details: {data}", exc_info=True)
The approach to standardising lazy field evaluation described in this PEP is primarily based on the anticipated needs of this hypothetical integration into the logging module:
logging.debug(t"Eager evaluation of {expensive_call()}")
logging.debug(t"Lazy evaluation of {expensive_call!()}")
logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")
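The practical difference between the eager and lazy spellings is when, and whether, the expensive call runs. The sketch below counts calls to make that visible. Here render_fields is a hypothetical renderer, and each field is modelled as a (value, conversion_spec) pair standing in for a real template field.

```python
calls = []


def expensive_call():
    calls.append("called")
    return "details"


def render_fields(fields, enabled):
    """Hypothetical renderer: skip all work when the log level is disabled."""
    if not enabled:
        return None
    parts = []
    for value, conversion_spec in fields:
        if conversion_spec == "()":
            value = value()   # lazy field: called only at rendering time
        parts.append(str(value))
    return "".join(parts)


# Stand-in for t"{expensive_call()}": the call runs at definition time.
eager_fields = [(expensive_call(), "")]
# Stand-in for t"{expensive_call!()}": only the callable is stored.
lazy_fields = [(expensive_call, "()")]

print(len(calls))                           # calls made by defining both
render_fields(lazy_fields, enabled=False)
print(len(calls))                           # a disabled render adds none
print(render_fields(lazy_fields, enabled=True), len(calls))
```

With the lazy form, a disabled logger never pays for the call, which is the property the hypothetical logging integration would rely on.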
It’s an open question whether the definition of logging formatters would be updated to support template strings, but if they were, the most likely way of defining fields which should be looked up on the log record instead of being interpreted eagerly is simply to escape them so they’re available as part of the literal text:
proc_id = get_process_id()
formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")
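The brace escaping relied upon here already works the same way in f-strings, so its effect can be previewed with an eagerly rendered string and the existing style="{" formatter option. This is only an approximation of the idea above: the proc_id value is a placeholder, a colon is added between the last two fields for readability, and logging.Formatter does not accept template literals today.

```python
import logging

proc_id = 4242  # placeholder standing in for get_process_id()

# Doubled braces survive as literal braces; {proc_id} is filled in eagerly.
fmt = f"{{asctime}}:{proc_id}:{{name}}:{{levelname}}:{{message}}"
print(fmt)

# The remaining fields are looked up on each log record by the formatter.
formatter = logging.Formatter(fmt, style="{")
record = logging.LogRecord("demo", logging.INFO, "demo.py", 1, "hello", (), None)
formatted = formatter.format(record)
print(formatted)
```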
Deferring consideration of possible use in i18n use cases
The initial motivating use case for this PEP was providing a cleaner syntax
for i18n (internationalization) translation, as that requires access to the original
unmodified template. As such, it focused on compatibility with the substitution syntax
used in Python’s string.Template
formatting and Mozilla’s l20n project.
However, subsequent discussion revealed there are significant additional considerations to be taken into account in the i18n use case, which don’t impact the simpler cases of handling interpolation into security sensitive contexts (like HTML, system shells, and database queries), or producing application debugging messages in the preferred language of the development team (rather than the native language of end users).
Due to that realisation, the PEP was switched to use the str.format()
substitution
syntax originally defined in PEP 3101 and subsequently used as the basis for PEP 498.
While it would theoretically be possible to update string.Template
to support
the creation of instances from native template literals, and to implement the structural
typing.InterpolationTemplate
protocol, the PEP authors have not identified any practical benefit
in doing so.
However, one significant benefit of the “only one string prefix” approach used in this PEP is that while it generalises the existing f-string interpolation syntax to support delayed rendering through t-strings, it doesn’t imply that that should be the only compiler supported interpolation syntax that Python should ever offer.
Most notably, it leaves the door open to an alternate “t$-string” syntax that would allow
TemplateLiteral
instances to be created using a PEP 292 based interpolation syntax
rather than a PEP 3101 based syntax:
template = t$"Substitute $words and ${other_values} at runtime"
The only runtime distinction between templates created that way and templates created from
regular t-strings would be in the contents of their raw_template
attributes.
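For comparison, the PEP 292 substitution syntax that such a prefix would reuse is already available at runtime through string.Template. The example below shows only the existing runtime behaviour, not the hypothetical compiler supported form.

```python
from string import Template

template = Template("Substitute $words and ${other_values} at runtime")
print(template.template)   # the raw template text is preserved
result = template.substitute(words="these", other_values="those")
print(result)
```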
Deferring escaped rendering support for non-POSIX shells
shlex.quote()
works by classifying the regex character set [\w@%+=:,./-]
to be
safe, deeming all other characters to be unsafe, and hence requiring quoting of the string
containing them. The quoting mechanism used is then specific to the way that string quoting
works in POSIX shells, so it cannot be trusted when running a shell that doesn’t follow
POSIX shell string quoting rules.
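The classification can be observed directly. The sketch below rebuilds the character set quoted above as a local regular expression (ASCII-only matching is an assumption made for this example) and compares its verdict with what shlex.quote() actually returns for a few ASCII inputs.

```python
import re
import shlex

# Character set quoted from the text above; anything outside it is unsafe.
# Assumption: \w is restricted to ASCII word characters here.
unsafe = re.compile(r"[^\w@%+=:,./-]", re.ASCII)

for candidate in ["pwd", "a/b-c.txt", "; pwd", "it's", "two words"]:
    needs_quoting = bool(unsafe.search(candidate))
    quoted = shlex.quote(candidate)
    print(f"{candidate!r}: unsafe={needs_quoting}, quoted={quoted}")
    # Safe strings pass through unchanged; unsafe ones are rewritten.
    assert (quoted != candidate) == needs_quoting
    # A POSIX-style split recovers the original argument exactly.
    assert shlex.split(quoted) == [candidate]
```

The round trip through shlex.split() holds only under POSIX quoting rules, which is the limitation this section is concerned with.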
For example, running subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True)
is
safe when using a shell that follows POSIX quoting rules:
$ cat > run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
$ python3 run_quoted.py pwd
pwd
$ python3 run_quoted.py '; pwd'
; pwd
$ python3 run_quoted.py "'pwd'"
'pwd'
but remains unsafe when the shell that Python invokes is cmd.exe
(or PowerShell):
S:\> echo import sys, shlex, subprocess > run_quoted.py
S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
S:\> type run_quoted.py
import sys, shlex, subprocess
subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
S:\> python3 run_quoted.py "echo OK"
'echo OK'
S:\> python3 run_quoted.py "'& echo Oh no!"
''"'"'
Oh no!'
Resolving this standard library limitation is beyond the scope of this PEP.
Acknowledgements
- Eric V. Smith for creating PEP 498 and demonstrating the feasibility of arbitrary expression substitution in string interpolation
- The authors of PEP 750 for the substantial design improvements that tagged strings inspired for this PEP, their general advocacy for the value of language level delayed template rendering support, and their efforts to ensure that any native interpolation template support lays a strong foundation for future efforts in providing robust syntax highlighting and static type checking support for domain specific languages
- Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to exploring the feasibility of using this model of delayed rendering in i18n use cases (even though the ultimate conclusion was that it was a poor fit, at least for current approaches to i18n in Python)
References
- %-formatting
- str.format
- string.Template documentation
- PEP 215: String Interpolation
- PEP 292: Simpler String Substitutions
- PEP 3101: Advanced String Formatting
- PEP 498: Literal string formatting
- PEP 675: Arbitrary Literal String Type
- PEP 701: Syntactic formalization of f-strings
- FormattableString and C# native string interpolation
- IFormattable interface in C# (see remarks for globalization notes)
- TemplateLiterals in Javascript
- Running external commands in Julia
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0501.rst
Last modified: 2024-10-19 14:00:43 GMT