
Python Enhancement Proposals

PEP 723 – Inline script metadata

Author:
Ofek Lev <ofekmeister at gmail.com>
Sponsor:
Adam Turner <python at quite.org.uk>
PEP-Delegate:
Brett Cannon <brett at python.org>
Discussions-To:
Discourse thread
Status:
Accepted
Type:
Standards Track
Topic:
Packaging
Created:
04-Aug-2023
Post-History:
04-Aug-2023, 06-Aug-2023, 23-Aug-2023, 06-Dec-2023
Replaces:
722
Resolution:
Discourse message


Note

This PEP will be declared final once it has been specified at https://packaging.python.org and implemented in a couple of tools, ideally including pip-run and pipx.

Abstract

This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts.

Motivation

Python is routinely used as a scripting language, with Python scripts as a (better) alternative to shell scripts, batch files, etc. When Python code is structured as a script, it is usually stored as a single file and does not expect the availability of any other local code that may be used for imports. As such, it is possible to share with others over arbitrary text-based means such as email, a URL to the script, or even a chat window. Code that is structured like this may live as a single file forever, never becoming a full-fledged project with its own directory and pyproject.toml file.

An issue that users encounter with this approach is that there is no standard mechanism to define metadata for tools whose job it is to execute such scripts. For example, a tool that runs a script may need to know which dependencies are required or the supported version(s) of Python.

There is currently no standard tool that addresses this issue, and this PEP does not attempt to define one. However, any tool that does address this issue will need to know what the runtime requirements of scripts are. By defining a standard format for storing such metadata, existing tools, as well as any future tools, will be able to obtain that information without requiring users to include tool-specific metadata in their scripts.

Rationale

This PEP defines a mechanism for embedding metadata within the script itself, and not in an external file.

The metadata format is designed to be similar to the layout of data in the pyproject.toml file of a Python project directory, to provide a familiar experience for users who have experience writing Python projects. By using a similar format, we avoid unnecessary inconsistency between packaging tools, a common frustration expressed by users in the recent packaging survey.

The following are some of the use cases that this PEP wishes to support:

  • A user-facing CLI that is capable of executing scripts. If we take Hatch as an example, the interface would be simply hatch run /path/to/script.py [args] and Hatch will manage the environment for that script. Such tools could be used as shebang lines on non-Windows systems, e.g. #!/usr/bin/env hatch run.
  • A script that desires to transition to a directory-type project. A user may be rapidly prototyping locally or in a remote REPL environment and then decide to transition to a more formal project layout if their idea works out. Being able to define dependencies in the script would also be very useful for producing fully reproducible bug reports.
  • Users that wish to avoid manual dependency management. For example, package managers that have commands to add/remove dependencies or dependency update automation in CI that triggers based on new versions or in response to CVEs [1].

Specification

This PEP defines a metadata comment block format loosely inspired [2] by reStructuredText Directives.

Any Python script may have top-level comment blocks that MUST start with the line # /// TYPE where TYPE determines how to process the content. That is: a single #, followed by a single space, followed by three forward slashes, followed by a single space, followed by the type of metadata. The block MUST end with the line # ///. That is: a single #, followed by a single space, followed by three forward slashes. The TYPE MUST only consist of ASCII letters, numbers and hyphens.

Every line between these two lines (# /// TYPE and # ///) MUST be a comment starting with #. If there are characters after the # then the first character MUST be a space. The embedded content is formed by taking away the first two characters of each line if the second character is a space, otherwise just the first character (which means the line consists of only a single #).

Precedence for an ending line # /// is given when the next line is not a valid embedded content line as described above. For example, the following is a single fully valid block:

# /// some-toml
# embedded-csharp = """
# /// <summary>
# /// text
# ///
# /// </summary>
# public class MyClass { }
# """
# ///

A starting line MUST NOT be placed between another starting line and its ending line. In such cases tools MAY produce an error. Unclosed blocks MUST be ignored.

When there are multiple comment blocks of the same TYPE defined, tools MUST produce an error.
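Because the text specification takes precedence over the regular expression, it can help to see the rules above read as a line-oriented scanner. The sketch below is one illustrative reading, not part of this specification: `parse_blocks` and `is_content` are hypothetical names, a # /// line is treated as the ending line only when the line after it is not a valid content line, unclosed blocks are skipped, and a repeated TYPE raises `ValueError`. It does not report nested starting lines, which tools only MAY do.

```python
import re

# A starting line: '# /// ' followed by a TYPE of ASCII letters, digits and hyphens
START = re.compile(r"# /// (?P<type>[a-zA-Z0-9-]+)")
END = "# ///"


def is_content(line: str) -> bool:
    # A valid embedded content line is a bare '#' or '#' followed by a space
    return line == "#" or line.startswith("# ")


def parse_blocks(script: str) -> dict[str, str]:
    """Return a mapping of TYPE to embedded content for every closed block."""
    lines = script.splitlines()
    blocks: dict[str, str] = {}
    i = 0
    while i < len(lines):
        start = START.fullmatch(lines[i])
        if start is None:
            i += 1
            continue
        body: list[str] = []
        closed = False
        j = i + 1
        while j < len(lines) and is_content(lines[j]):
            following = lines[j + 1] if j + 1 < len(lines) else None
            # '# ///' ends the block only if the next line is not content
            if lines[j] == END and (following is None or not is_content(following)):
                closed = True
                break
            # Drop '# ' (or just '#' for a bare line) to recover the content
            body.append(lines[j][2:] if lines[j].startswith("# ") else lines[j][1:])
            j += 1
        if not closed:
            # Unclosed blocks are ignored; resume scanning after the start line
            i += 1
            continue
        kind = start.group("type")
        if kind in blocks:
            raise ValueError(f"Multiple {kind!r} blocks found")
        blocks[kind] = "".join(line + "\n" for line in body)
        i = j + 1
    return blocks
```

Applied to the some-toml example above, this yields a single block whose content begins with embedded-csharp = """, because each inner # /// line is followed by another comment line.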

Tools reading embedded metadata MAY respect the standard Python encoding declaration. If they choose not to do so, they MUST process the file as UTF-8.
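For tools that choose to respect the encoding declaration, the standard library already implements the detection logic. In this minimal sketch the helper name `read_script` is hypothetical; `tokenize.open()` is the standard-library function that honours a UTF-8 BOM or a PEP 263 coding declaration and otherwise falls back to UTF-8.

```python
import tokenize


def read_script(path: str) -> str:
    # tokenize.open() detects the encoding from a BOM or a coding declaration
    # on the first two lines, and defaults to UTF-8 when neither is present
    with tokenize.open(path) as f:
        return f.read()
```

A tool that does not respect the declaration would instead read the file with open(path, encoding="utf-8").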

This is the canonical regular expression that MAY be used to parse the metadata:

(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$

In circumstances where there is a discrepancy between the text specification and the regular expression, the text specification takes precedence.
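To illustrate what the canonical expression captures, the sketch below applies it to the some-toml example earlier in this section. The greedy content group backtracks to the last # /// line, so only one block is found, and the content group still carries the comment prefixes that a reader must strip.

```python
import re

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

example = '''\
# /// some-toml
# embedded-csharp = """
# /// <summary>
# /// text
# ///
# /// </summary>
# public class MyClass { }
# """
# ///
'''

matches = list(re.finditer(REGEX, example))
print(len(matches))                                  # 1
print(matches[0].group("type"))                      # some-toml
print(matches[0].group("content").splitlines()[0])   # # embedded-csharp = """
```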

Tools MUST NOT read from metadata blocks with types that have not been standardized by this PEP or future ones.

script type

The first type of metadata block is named script, which contains script metadata (dependency data and tool configuration).

This document MAY include top-level fields dependencies and requires-python, and MAY optionally include a [tool] table.

The [tool] table MAY be used by any tool, script runner or otherwise, to configure behavior. It has the same semantics as the tool table in pyproject.toml.

The top-level fields are:

  • dependencies: A list of strings that specifies the runtime dependencies of the script. Each entry MUST be a valid PEP 508 dependency.
  • requires-python: A string that specifies the Python version(s) with which the script is compatible. The value of this field MUST be a valid version specifier.

Script runners MUST error if the specified dependencies cannot be provided. Script runners SHOULD error if no version of Python that satisfies the specified requires-python can be provided.
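A script runner might validate parsed metadata before building an environment. This is a minimal sketch assuming the third-party packaging library, whose Requirement and SpecifierSet classes implement PEP 508 requirements and version specifiers. The helper check_metadata is hypothetical, and it only compares requires-python against the running interpreter rather than searching for other installed Pythons.

```python
import sys

from packaging.requirements import InvalidRequirement, Requirement
from packaging.specifiers import InvalidSpecifier, SpecifierSet


def check_metadata(metadata: dict) -> list[Requirement]:
    """Validate parsed script metadata and return the parsed dependencies."""
    deps = metadata.get("dependencies", [])
    if not isinstance(deps, list) or not all(isinstance(d, str) for d in deps):
        raise ValueError("dependencies must be a list of strings")
    try:
        requirements = [Requirement(d) for d in deps]
    except InvalidRequirement as exc:
        raise ValueError(f"invalid PEP 508 dependency: {exc}") from exc

    spec = metadata.get("requires-python")
    if spec is not None:
        if not isinstance(spec, str):
            raise ValueError("requires-python must be a string")
        try:
            specifier = SpecifierSet(spec)
        except InvalidSpecifier as exc:
            raise ValueError(f"invalid version specifier: {exc}") from exc
        # Build a plain X.Y.Z version string for the running interpreter
        current = ".".join(str(part) for part in sys.version_info[:3])
        if not specifier.contains(current):
            raise RuntimeError(f"Python {current} does not satisfy {spec!r}")
    return requirements
```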

Example

The following is an example of a script with embedded metadata:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

import requests
from rich.pretty import pprint

resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])

Reference Implementation

The following is an example of how to read the metadata on Python 3.11 or higher.

import re
import tomllib

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def read(script: str) -> dict | None:
    name = 'script'
    matches = list(
        filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
    )
    if len(matches) > 1:
        raise ValueError(f'Multiple {name} blocks found')
    elif len(matches) == 1:
        content = ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in matches[0].group('content').splitlines(keepends=True)
        )
        return tomllib.loads(content)
    else:
        return None

Tools such as package managers or dependency update automation in CI will often edit dependencies. The following is a crude example of modifying the content using the tomlkit library.

import re

import tomlkit

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def add(script: str, dependency: str) -> str:
    # Select the script block specifically, not just the first block of any type
    match = next(
        (m for m in re.finditer(REGEX, script) if m.group('type') == 'script'),
        None,
    )
    if match is None:
        raise ValueError('No script block found')
    content = ''.join(
        line[2:] if line.startswith('# ') else line[1:]
        for line in match.group('content').splitlines(keepends=True)
    )

    config = tomlkit.parse(content)
    config['dependencies'].append(dependency)
    # Re-comment every line; empty lines become a bare '#'
    new_content = ''.join(
        f'# {line}' if line.strip() else f'#{line}'
        for line in tomlkit.dumps(config).splitlines(keepends=True)
    )

    start, end = match.span('content')
    return script[:start] + new_content + script[end:]

Note that this example used a library that preserves TOML formatting. This is not a requirement for editing by any means but rather is a “nice to have” feature.

The following is an example of how to read a stream of arbitrary metadata blocks.

import re
from typing import Iterator

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def stream(script: str) -> Iterator[tuple[str, str]]:
    for match in re.finditer(REGEX, script):
        yield match.group('type'), ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in match.group('content').splitlines(keepends=True)
        )

Backwards Compatibility

At the time of writing, the # /// script block comment starter does not appear in any Python files on GitHub. Therefore, there is little risk of existing scripts being broken by this PEP.

Security Implications

If a script containing embedded metadata is run using a tool that automatically installs dependencies, this could cause arbitrary code to be downloaded and installed in the user’s environment.

The risk here is part of the functionality of the tool being used to run the script, and as such should already be addressed by the tool itself. The only additional risk introduced by this PEP is if an untrusted script with embedded metadata is run, when a potentially malicious dependency or transitive dependency might be installed.

This risk is addressed by the normal good practice of reviewing code before running it. Additionally, tools may be able to provide locking functionality to ameliorate this risk.

How to Teach This

To embed metadata in a script, define a comment block that starts with the line # /// script and ends with the line # ///. Every line between those two lines must be a comment, and the full content is derived by removing the leading # and the space that follows it (a line consisting of only # becomes an empty line).

# /// script
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# requires-python = ">=3.11"
# ///

The allowed fields are described in the following table:

Field | Description | Tool behavior
dependencies | A list of strings that specifies the runtime dependencies of the script. Each entry must be a valid PEP 508 dependency. | Tools will error if the specified dependencies cannot be provided.
requires-python | A string that specifies the Python version(s) with which the script is compatible. The value of this field must be a valid version specifier. | Tools might error if no version of Python that satisfies the constraint can be executed.

In addition, a [tool] table is allowed. Details of what is permitted are similar to what is permitted in pyproject.toml, but precise information must be included in the documentation of the relevant tool.

It is up to individual tools whether or not their behavior is altered based on the embedded metadata. For example, not every script runner may be able to provide an environment for specific Python versions as defined by the requires-python field.

The tool table may be used by any tool, script runner or otherwise, to configure behavior.

Recommendations

Tools that support managing different versions of Python should attempt to use the highest available version of Python that is compatible with the script’s requires-python metadata, if defined.

Tooling buy-in

The following is a list of tools that have expressed support for this PEP or have committed to implementing support should it be accepted:

  • Pantsbuild and Pex: expressed support for any way to define dependencies, and also for features that this PEP considers valid use cases, such as building packages from scripts and embedding tool configuration
  • Mypy and Ruff: strongly expressed support for embedding tool configuration as it would solve existing pain points for users
  • Hatch: (author of this PEP) expressed support for all aspects of this PEP, and will be one of the first tools to support running scripts with specifically configured Python versions

Rejected Ideas

Why not use a comment block resembling requirements.txt?

This PEP considers there to be different types of users for whom Python code would live as single-file scripts:

  • Non-programmers who are just using Python as a scripting language to achieve a specific task. These users are unlikely to be familiar with concepts of operating systems like shebang lines or the PATH environment variable. Some examples:
    • The average person, perhaps at a workplace, who wants to write a script to automate something for efficiency or to reduce tedium
    • Someone doing data science or machine learning in industry or academia who wants to write a script to analyze some data or for research purposes. These users are special in that, although they have limited programming knowledge, they learn from sources like Stack Overflow and blogs that have a programming bent and are increasingly likely to be part of communities that share knowledge and code. Therefore, a non-trivial number of these users will have some familiarity with things like Git(Hub), Jupyter, HuggingFace, etc.
  • Non-programmers who manage operating systems e.g. a sysadmin. These users are able to set up PATH, for example, but are unlikely to be familiar with Python concepts like virtual environments. These users often operate in isolation and have limited need to gain exposure to tools intended for sharing like Git.
  • Programmers who manage operating systems/infrastructure e.g. SREs. These users are not very likely to be familiar with Python concepts like virtual environments, but are likely to be familiar with Git and most often use it to version control everything required to manage infrastructure like Python scripts and Kubernetes config.
  • Programmers who write scripts primarily for themselves. These users over time accumulate a great number of scripts in various languages that they use to automate their workflow and often store them in a single directory that is potentially version-controlled for persistence. Non-Windows users may set up each Python script with a shebang line pointing to the desired Python executable or script runner.

This PEP argues that the proposed TOML-based metadata format is the best for each category of user and that the requirements-like block comment is only approachable for those who have familiarity with requirements.txt, which represents a small subset of users.

  • For the average person automating a task or the data scientist, they are already starting with zero context and are unlikely to be familiar with TOML or requirements.txt. These users will very likely rely on snippets found online via a search engine or utilize AI in the form of a chat bot or direct code completion software. The similarity with dependency information stored in pyproject.toml will provide useful search results relatively quickly, and while the pyproject.toml format and the script metadata format are not the same, any resulting discrepancies are unlikely to be difficult for the intended users to resolve.

    Additionally, these users are most susceptible to formatting quirks and syntax errors. TOML is a well-defined format with existing online validators; its assignment syntax is compatible with Python expressions and it has no strict indentation rules. The block comment format, on the other hand, could easily be malformed by forgetting the colon, for example, and debugging why it’s not working with a search engine would be a difficult task for such a user.

  • For the sysadmin types, they are just as unlikely as the previously described users to be familiar with TOML or requirements.txt. For either format they would have to read documentation. They would likely be more comfortable with TOML since they are used to structured data formats and there would be less perceived magic in their systems.

    Additionally, for maintenance of their systems /// script would be much easier to search for from a shell than a block comment with potentially numerous extensions over time.

  • For the SRE types, they are likely to be familiar with TOML already from other projects that they might have to work with like configuring the GitLab Runner or Cloud Native Buildpacks.

    These users are responsible for the security of their systems and most likely have security scanners set up to automatically open PRs to update versions of dependencies. Automated tools such as Dependabot would have a much easier time using existing TOML libraries than writing their own custom parser for a block comment format.

  • For the programmer types, they are more likely to be familiar with TOML than to have ever seen a requirements.txt file, unless they are a Python programmer who has had previous experience with writing applications. Experience with the requirements format necessarily means that they are at least somewhat familiar with the ecosystem, and therefore it is safe to assume they know what TOML is.

    Another benefit of this PEP to these users is that IDEs like Visual Studio Code would be able to provide TOML syntax highlighting much more easily than if each had to write custom logic for this feature.

Additionally, the original block comment alternative format (double #) went against the recommendation of PEP 8, so linters and IDE auto-formatters that respected the recommendation would fail by default. As a result, the final proposal uses standard comments starting with a single # character, without any obvious start or end sequence.

The concept of regular comments that do not appear to be intended for machines (unlike, for example, encoding declarations) affecting behavior would not be customary to users of Python and goes directly against the “explicit is better than implicit” foundational principle.

Users typing what to them looks like prose could alter runtime behavior. This PEP takes the view that the possibility of that happening, even when a tool has been set up as such (maybe by a sysadmin), is unfriendly to users.

Finally, and critically, the alternatives to this PEP like PEP 722 do not satisfy the use cases enumerated herein, such as setting the supported Python versions, the eventual building of scripts into packages, and the ability to have machines edit metadata on behalf of users. It is very likely that the requests for such features persist and conceivable that another PEP in the future would allow for the embedding of such metadata. At that point there would be multiple ways to achieve the same thing which goes against our foundational principle of “there should be one - and preferably only one - obvious way to do it”.

Why not use a multi-line string?

A previous version of this PEP proposed that the metadata be stored as follows:

__pyproject__ = """
...
"""

The most significant problem with this proposal is that the embedded TOML would be limited in the following ways:

  • It would not be possible to use multi-line double-quoted strings in the TOML as that would conflict with the Python string containing the document. Many TOML writers do not preserve style and may potentially produce output that would be malformed.
  • The way in which character escaping works in Python strings is not quite the way it works in TOML strings. It would be possible to preserve a one-to-one character mapping by enforcing raw strings, but this r prefix requirement may be potentially confusing to users.

Why not reuse core metadata fields?

A previous version of this PEP proposed to reuse the existing metadata standard that is used to describe projects.

There are two significant problems with this proposal:

Why not limit to specific metadata fields?

By limiting the metadata to just dependencies, we would prevent the known use case of tools that support managing Python installations, which would allow users to target specific versions of Python for new syntax or standard library functionality.

Why not limit tool configuration?

By not allowing the [tool] table, we would prevent known functionality that would benefit users. For example:

  • A script runner may support injecting dependency resolution data for an embedded lock file (this is what Go’s gorun can do).
  • A script runner may support configuration instructing it to run scripts in containers for situations in which there is no cross-platform support for a dependency, or in which the setup is too complex for the average user, such as when NVIDIA drivers are required. Support like this would allow users to proceed with what they want to do, whereas otherwise they may stop at that point altogether.
  • Tools may wish to experiment with features to ease development burden for users such as the building of single-file scripts into packages. We received feedback stating that there are already tools that exist in the wild that build wheels and source distributions from single files.

    The author of the Rust RFC for embedding metadata mentioned to us that they are actively looking into that as well based on user feedback saying that there is unnecessary friction with managing small projects.

    There has been a commitment to support this by at least one major build system.

Why not limit tool behavior?

A previous version of this PEP proposed that non-script running tools SHOULD NOT modify their behavior when the script is not the sole input to the tool. For example, if a linter is invoked with the path to a directory, it SHOULD behave the same as if zero files had embedded metadata.

This was done as a precaution to avoid tool behavior confusion and generating various feature requests for tools to support this PEP. However, during discussion we received feedback from maintainers of tools that this would be undesirable and potentially confusing to users. Additionally, this may allow for a universally easier way to configure tools in certain circumstances and solve existing issues.

Why not just set up a Python project with a pyproject.toml?

Again, a key issue here is that the target audience for this proposal is people writing scripts which aren’t intended for distribution. Sometimes scripts will be “shared”, but this is far more informal than “distribution” - it typically involves sending a script via an email with some written instructions on how to run it, or passing someone a link to a GitHub gist.

Expecting such users to learn the complexities of Python packaging is a significant step up in complexity, and would almost certainly give the impression that “Python is too hard for scripts”.

In addition, if the expectation here is that the pyproject.toml will somehow be designed for running scripts in place, that’s a new feature of the standard that doesn’t currently exist. At a minimum, this isn’t a reasonable suggestion until the current discussion on Discourse about using pyproject.toml for projects that won’t be distributed as wheels is resolved. And even then, it doesn’t address the “sending someone a script in a gist or email” use case.

Why not infer the requirements from import statements?

The idea would be to automatically recognize import statements in the source file and turn them into a list of requirements.

However, this is infeasible for several reasons. First, the points about the need to keep the syntax easily parsable, for all Python versions and by tools written in other languages, apply equally here.

Second, PyPI and other package repositories conforming to the Simple Repository API do not provide a mechanism to resolve package names from the module names that are imported (see also this related discussion).

Third, even if repositories did offer this information, the same import name may correspond to several packages on PyPI. One might object that disambiguating which package is wanted would only be needed if there are several projects providing the same import name. However, this would make it easy for anyone to unintentionally or malevolently break working scripts, by uploading a package to PyPI providing an import name that is the same as an existing project. The alternative where, among the candidates, the first package to have been registered on the index is chosen, would be confusing in case a popular package is developed with the same import name as an existing obscure package, and even harmful if the existing package is malware intentionally uploaded with a sufficiently generic import name that has a high probability of being reused.

A related idea would be to attach the requirements as comments to the import statements instead of gathering them in a block, with a syntax such as:

import numpy as np # requires: numpy
import rich # requires: rich

This still suffers from parsing difficulties. Also, where to place the comment in the case of multiline imports is ambiguous and may look ugly:

from PyQt5.QtWidgets import (
    QCheckBox, QComboBox, QDialog, QDialogButtonBox,
    QGridLayout, QLabel, QSpinBox, QTextEdit
) # requires: PyQt5

Furthermore, this syntax cannot behave as might be intuitively expected in all situations. Consider:

import platform
if platform.system() == "Windows":
    import pywin32 # requires: pywin32

Here, the user’s intent is that the package is only required on Windows, but this cannot be understood by the script runner (the correct way to write it would be requires: pywin32 ; sys_platform == 'win32').

(Thanks to Jean Abou-Samra for the clear discussion of this point)
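With a PEP 508 environment marker, the condition becomes something a tool can evaluate without running the script. A brief sketch, assuming the third-party packaging library: the dictionaries passed to evaluate() override the values detected for the current interpreter, so both outcomes can be shown on any platform.

```python
from packaging.requirements import Requirement

req = Requirement("pywin32 ; sys_platform == 'win32'")
print(req.name)                                        # pywin32
print(req.marker.evaluate({"sys_platform": "win32"}))  # True
print(req.marker.evaluate({"sys_platform": "linux"}))  # False
```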

Why not use a requirements file for dependencies?

Putting your requirements in a requirements file doesn’t require a PEP. You can do that right now, and in fact it’s quite likely that many ad hoc solutions do this. However, without a standard, there’s no way of knowing how to locate a script’s dependency data. Furthermore, the requirements file format is pip-specific, so tools relying on it are depending on a pip implementation detail.

So in order to make a standard, two things would be required:

  1. A standardised replacement for the requirements file format.
  2. A standard for how to locate the requirements file for a given script.

The first item is a significant undertaking. It has been discussed on a number of occasions, but so far no-one has attempted to actually do it. The most likely approach would be for standards to be developed for individual use cases currently addressed with requirements files. One option here would be for this PEP to define a new file format that is simply a text file containing PEP 508 requirements, one per line. That would leave only the question of how to locate that file.

The “obvious” solution here would be to do something like name the file the same as the script, but with a .reqs extension (or something similar). However, this still requires two files, where currently only a single file is needed, and as such, does not match the “better batch file” model (shell scripts and batch files are typically self-contained). It requires the developer to remember to keep the two files together, and this may not always be possible. For example, system administration policies may require that all files in a certain directory are executable (the Linux filesystem standards require this of /usr/bin, for example). And some methods of sharing a script (for example, publishing it on a text file sharing service like GitHub’s Gist, or a corporate intranet) may not allow for deriving the location of an associated requirements file from the script’s location (tools like pipx support running a script directly from a URL, so “download and unpack a zip of the script and its dependencies” may not be an appropriate requirement).

Essentially, though, the issue here is that there is an explicitly stated requirement that the format supports storing dependency data in the script file itself. Solutions that don’t do that are simply ignoring that requirement.

Why not use (possibly restricted) Python syntax?

This would typically involve storing metadata as multiple special variables, such as the following.

__requires_python__ = ">=3.11"
__dependencies__ = [
    "requests",
    "click",
]

The most significant problem with this proposal is that it requires all consumers of the dependency data to implement a Python parser. Even if the syntax is restricted, the rest of the script will use the full Python syntax, and trying to define a syntax which can be successfully parsed in isolation from the surrounding code is likely to be extremely difficult and error-prone.

Furthermore, Python’s syntax changes in every release. If extracting dependency data needs a Python parser, the parser will need to know which version of Python the script is written for, and the overhead for a generic tool of having a parser that can handle multiple versions of Python is unsustainable.

With this approach there is the potential to clutter scripts with many variables as new extensions get added. Additionally, intuiting which metadata fields correspond to which variable names would cause confusion for users.

It is worth noting, though, that the pip-run utility does implement (an extended form of) this approach. Further discussion of the pip-run design is available on the project’s issue tracker.

What about local dependencies?

These can be handled without needing special metadata and tooling, simply by adding the location of the dependencies to sys.path. This PEP simply isn’t needed for this case. If, on the other hand, the “local dependencies” are actual distributions which are published locally, they can be specified as usual with a PEP 508 requirement, and the local package index specified when running a tool by using the tool’s UI for that.

Open Issues

None at this point.

Footnotes


Source: https://github.com/python/peps/blob/main/peps/pep-0723.rst

Last modified: 2024-01-10 01:00:01 GMT