PEP: 723 Title: Inline script metadata Author: Ofek Lev
<ofekmeister@gmail.com> Sponsor: Adam Turner <python@quite.org.uk>
PEP-Delegate: Brett Cannon <brett@python.org> Discussions-To:
https://discuss.python.org/t/31151 Status: Final Type: Standards Track
Topic: Packaging Created: 04-Aug-2023 Post-History: 04-Aug-2023,
06-Aug-2023, 23-Aug-2023, 06-Dec-2023, Replaces: 722 Resolution:
08-Jan-2024

packaging:inline-script-metadata

Abstract

This PEP specifies a metadata format that can be embedded in single-file
Python scripts to assist launchers, IDEs and other external tools which
may need to interact with such scripts.

Motivation

Python is routinely used as a scripting language, with Python scripts as
a (better) alternative to shell scripts, batch files, etc. When Python
code is structured as a script, it is usually stored as a single file
and does not expect the availability of any other local code that may be
used for imports. As such, it is possible to share with others over
arbitrary text-based means such as email, a URL to the script, or even a
chat window. Code that is structured like this may live as a single file
forever, never becoming a full-fledged project with its own directory
and pyproject.toml file.

An issue that users encounter with this approach is that there is no
standard mechanism to define metadata for tools whose job it is to
execute such scripts. For example, a tool that runs a script may need to
know which dependencies are required or the supported version(s) of
Python.

There is currently no standard tool that addresses this issue, and this
PEP does not attempt to define one. However, any tool that does address
this issue will need to know what the runtime requirements of scripts
are. By defining a standard format for storing such metadata, existing
tools, as well as any future tools, will be able to obtain that
information without requiring users to include tool-specific metadata in
their scripts.

Rationale

This PEP defines a mechanism for embedding metadata within the script
itself, and not in an external file.

The metadata format is designed to be similar to the layout of data in
the pyproject.toml file of a Python project directory, to provide a
familiar experience for users who have experience writing Python
projects. By using a similar format, we avoid unnecessary inconsistency
between packaging tools, a common frustration expressed by users in the
recent packaging survey.

The following are some of the use cases that this PEP wishes to support:

-   A user facing CLI that is capable of executing scripts. If we take
    Hatch as an example, the interface would be simply
    hatch run /path/to/script.py [args] and Hatch will manage the
    environment for that script. Such tools could be used as shebang
    lines on non-Windows systems e.g. #!/usr/bin/env hatch run.
-   A script that desires to transition to a directory-type project. A
    user may be rapidly prototyping locally or in a remote REPL
    environment and then decide to transition to a more formal project
    layout if their idea works out. Being able to define dependencies in
    the script would be very useful to have fully reproducible bug
    reports.
-   Users that wish to avoid manual dependency management. For example,
    package managers that have commands to add/remove dependencies or
    dependency update automation in CI that triggers based on new
    versions or in response to CVEs[1].

Specification

This PEP defines a metadata comment block format loosely inspired[2] by
reStructuredText Directives.

Any Python script may have top-level comment blocks that MUST start with
the line # /// TYPE where TYPE determines how to process the content.
That is: a single #, followed by a single space, followed by three
forward slashes, followed by a single space, followed by the type of
metadata. Block MUST end with the line # ///. That is: a single #,
followed by a single space, followed by three forward slashes. The TYPE
MUST only consist of ASCII letters, numbers and hyphens.

Every line between these two lines (# /// TYPE and # ///) MUST be a
comment starting with #. If there are characters after the # then the
first character MUST be a space. The embedded content is formed by
taking away the first two characters of each line if the second
character is a space, otherwise just the first character (which means
the line consists of only a single #).

Precedence for an ending line # /// is given when the next line is not a
valid embedded content line as described above. For example, the
following is a single fully valid block:

    # /// some-toml
    # embedded-csharp = """
    # /// <summary>
    # /// text
    # ///
    # /// </summary>
    # public class MyClass { }
    # """
    # ///

A starting line MUST NOT be placed between another starting line and its
ending line. In such cases tools MAY produce an error. Unclosed blocks
MUST be ignored.

When there are multiple comment blocks of the same TYPE defined, tools
MUST produce an error.

Tools reading embedded metadata MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as
UTF-8.

This is the canonical regular expression that MAY be used to parse the
metadata:

    (?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$

In circumstances where there is a discrepancy between the text
specification and the regular expression, the text specification takes
precedence.

Tools MUST NOT read from metadata blocks with types that have not been
standardized by this PEP or future ones.

script type

The first type of metadata block is named script which contains script
metadata (dependency data and tool configuration).

This document MAY include top-level fields dependencies and
requires-python, and MAY optionally include a [tool] table.

The [tool] table MAY be used by any tool, script runner or otherwise, to
configure behavior. It has the same semantics as the
tool table <518#tool-table> in pyproject.toml.

The top-level fields are:

-   dependencies: A list of strings that specifies the runtime
    dependencies of the script. Each entry MUST be a valid PEP 508
    dependency.
-   requires-python: A string that specifies the Python version(s) with
    which the script is compatible. The value of this field MUST be a
    valid version specifier <440#version-specifiers>.

Script runners MUST error if the specified dependencies cannot be
provided. Script runners SHOULD error if no version of Python that
satisfies the specified requires-python can be provided.

Example

The following is an example of a script with embedded metadata:

    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # ///

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

Reference Implementation

The following is an example of how to read the metadata on Python 3.11
or higher.

    import re
    import tomllib

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

    def read(script: str) -> dict | None:
        name = 'script'
        matches = list(
            filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
        )
        if len(matches) > 1:
            raise ValueError(f'Multiple {name} blocks found')
        elif len(matches) == 1:
            content = ''.join(
                line[2:] if line.startswith('# ') else line[1:]
                for line in matches[0].group('content').splitlines(keepends=True)
            )
            return tomllib.loads(content)
        else:
            return None

Often tools will edit dependencies like package managers or dependency
update automation in CI. The following is a crude example of modifying
the content using the tomlkit library.

    import re

    import tomlkit

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

    def add(script: str, dependency: str) -> str:
        match = re.search(REGEX, script)
        content = ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in match.group('content').splitlines(keepends=True)
        )

        config = tomlkit.parse(content)
        config['dependencies'].append(dependency)
        new_content = ''.join(
            f'# {line}' if line.strip() else f'#{line}'
            for line in tomlkit.dumps(config).splitlines(keepends=True)
        )

        start, end = match.span('content')
        return script[:start] + new_content + script[end:]

Note that this example used a library that preserves TOML formatting.
This is not a requirement for editing by any means but rather is a "nice
to have" feature.

The following is an example of how to read a stream of arbitrary
metadata blocks.

    import re
    from typing import Iterator

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

    def stream(script: str) -> Iterator[tuple[str, str]]:
        for match in re.finditer(REGEX, script):
            yield match.group('type'), ''.join(
                line[2:] if line.startswith('# ') else line[1:]
                for line in match.group('content').splitlines(keepends=True)
            )

Backwards Compatibility

At the time of writing, the # /// script block comment starter does not
appear in any Python files on GitHub. Therefore, there is little risk of
existing scripts being broken by this PEP.

Security Implications

If a script containing embedded metadata is run using a tool that
automatically installs dependencies, this could cause arbitrary code to
be downloaded and installed in the user's environment.

The risk here is part of the functionality of the tool being used to run
the script, and as such should already be addressed by the tool itself.
The only additional risk introduced by this PEP is if an untrusted
script with embedded metadata is run, when a potentially malicious
dependency or transitive dependency might be installed.

This risk is addressed by the normal good practice of reviewing code
before running it. Additionally, tools may be able to provide locking
functionality to ameliorate this risk.

How to Teach This

To embed metadata in a script, define a comment block that starts with
the line # /// script and ends with the line # ///. Every line between
those two lines must be a comment and the full content is derived by
removing the first two characters.

    # /// script
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # requires-python = ">=3.11"
    # ///

The allowed fields are described in the following table:

  ----------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------
  Field             Description                                                                                                                                                              Tool behavior
  dependencies      A list of strings that specifies the runtime dependencies of the script. Each entry must be a valid PEP 508 dependency.                                                  Tools will error if the specified dependencies cannot be provided.
  requires-python   A string that specifies the Python version(s) with which the script is compatible. The value of this field must be a valid version specifier <440#version-specifiers>.   Tools might error if no version of Python that satisfies the constraint can be executed.
  ----------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------

In addition, a [tool] table is allowed. Details of what is permitted are
similar to what is permitted in pyproject.toml, but precise information
must be included in the documentation of the relevant tool.

It is up to individual tools whether or not their behavior is altered
based on the embedded metadata. For example, every script runner may not
be able to provide an environment for specific Python versions as
defined by the requires-python field.

The tool table <518#tool-table> may be used by any tool, script runner
or otherwise, to configure behavior.

Recommendations

Tools that support managing different versions of Python should attempt
to use the highest available version of Python that is compatible with
the script's requires-python metadata, if defined.

Tooling buy-in

The following is a list of tools that have expressed support for this
PEP or have committed to implementing support should it be accepted:

-   Pantsbuild and Pex: expressed support for any way to define
    dependencies and also features that this PEP considers as valid use
    cases such as building packages from scripts and embedding tool
    configuration
-   Mypy and Ruff: strongly expressed support for embedding tool
    configuration as it would solve existing pain points for users
-   Hatch: (author of this PEP) expressed support for all aspects of
    this PEP, and will be one of the first tools to support running
    scripts with specifically configured Python versions

Rejected Ideas

Why not use a comment block resembling requirements.txt?

This PEP considers there to be different types of users for whom Python
code would live as single-file scripts:

-   Non-programmers who are just using Python as a scripting language to
    achieve a specific task. These users are unlikely to be familiar
    with concepts of operating systems like shebang lines or the PATH
    environment variable. Some examples:
    -   The average person, perhaps at a workplace, who wants to write a
        script to automate something for efficiency or to reduce tedium
    -   Someone doing data science or machine learning in industry or
        academia who wants to write a script to analyze some data or for
        research purposes. These users are special in that, although
        they have limited programming knowledge, they learn from sources
        like StackOverflow and blogs that have a programming bent and
        are increasingly likely to be part of communities that share
        knowledge and code. Therefore, a non-trivial number of these
        users will have some familiarity with things like Git(Hub),
        Jupyter, HuggingFace, etc.
-   Non-programmers who manage operating systems e.g. a sysadmin. These
    users are able to set up PATH, for example, but are unlikely to be
    familiar with Python concepts like virtual environments. These users
    often operate in isolation and have limited need to gain exposure to
    tools intended for sharing like Git.
-   Programmers who manage operating systems/infrastructure e.g. SREs.
    These users are not very likely to be familiar with Python concepts
    like virtual environments, but are likely to be familiar with Git
    and most often use it to version control everything required to
    manage infrastructure like Python scripts and Kubernetes config.
-   Programmers who write scripts primarily for themselves. These users
    over time accumulate a great number of scripts in various languages
    that they use to automate their workflow and often store them in a
    single directory, that is potentially version controlled for
    persistence. Non-Windows users may set up each Python script with a
    shebang line pointing to the desired Python executable or script
    runner.

This PEP argues that the proposed TOML-based metadata format is the best
for each category of user and that the requirements-like block comment
is only approachable for those who have familiarity with
requirements.txt, which represents a small subset of users.

-   For the average person automating a task or the data scientist, they
    are already starting with zero context and are unlikely to be
    familiar with TOML nor requirements.txt. These users will very
    likely rely on snippets found online via a search engine or utilize
    AI in the form of a chat bot or direct code completion software. The
    similarity with dependency information stored in pyproject.toml will
    provide useful search results relatively quickly, and while the
    pyproject.toml format and the script metadata format are not the
    same, any resulting discrepancies are unlikely to be difficult for
    the intended users to resolve.

    Additionally, these users are most susceptible to formatting quirks
    and syntax errors. TOML is a well-defined format with existing
    online validators that features assignment that is compatible with
    Python expressions and has no strict indenting rules. The block
    comment format on the other hand could be easily malformed by
    forgetting the colon, for example, and debugging why it's not
    working with a search engine would be a difficult task for such a
    user.

-   For the sysadmin types, they are equally unlikely as the previously
    described users to be familiar with TOML or requirements.txt. For
    either format they would have to read documentation. They would
    likely be more comfortable with TOML since they are used to
    structured data formats and there would be less perceived magic in
    their systems.

    Additionally, for maintenance of their systems /// script would be
    much easier to search for from a shell than a block comment with
    potentially numerous extensions over time.

-   For the SRE types, they are likely to be familiar with TOML already
    from other projects that they might have to work with like
    configuring the GitLab Runner__ or Cloud Native Buildpacks__.

    __
    https://docs.gitlab.com/runner/configuration/advanced-configuration.html
    __ https://buildpacks.io/docs/reference/config/

    These users are responsible for the security of their systems and
    most likely have security scanners set up to automatically open PRs
    to update versions of dependencies. Such automated tools like
    Dependabot would have a much easier time using existing TOML
    libraries than writing their own custom parser for a block comment
    format.

-   For the programmer types, they are more likely to be familiar with
    TOML than they have ever seen a requirements.txt file, unless they
    are a Python programmer who has had previous experience with writing
    applications. In the case of experience with the requirements
    format, it necessarily means that they are at least somewhat
    familiar with the ecosystem and therefore it is safe to assume they
    know what TOML is.

    Another benefit of this PEP to these users is that their IDEs like
    Visual Studio Code would be able to provide TOML syntax highlighting
    much more easily than each writing custom logic for this feature.

Additionally, since the original block comment alternative format
(double #) went against the recommendation of PEP 8 and as a result
linters and IDE auto-formatters that respected the recommendation would
fail by default, the final proposal uses standard comments starting with
a single # character without any obvious start nor end sequence.

The concept of regular comments that do not appear to be intended for
machines (i.e. encoding declarations__) affecting behavior would not be
customary to users of Python and goes directly against the "explicit is
better than implicit" foundational principle.

Users typing what to them looks like prose could alter runtime behavior.
This PEP takes the view that the possibility of that happening, even
when a tool has been set up as such (maybe by a sysadmin), is unfriendly
to users.

Finally, and critically, the alternatives to this PEP like PEP 722 do
not satisfy the use cases enumerated herein, such as setting the
supported Python versions, the eventual building of scripts into
packages, and the ability to have machines edit metadata on behalf of
users. It is very likely that the requests for such features persist and
conceivable that another PEP in the future would allow for the embedding
of such metadata. At that point there would be multiple ways to achieve
the same thing which goes against our foundational principle of "there
should be one - and preferably only one -obvious way to do it".

Why not use a multi-line string?

A previous version of this PEP proposed that the metadata be stored as
follows:

    __pyproject__ = """
    ...
    """

The most significant problem with this proposal is that the embedded
TOML would be limited in the following ways:

-   It would not be possible to use multi-line double-quoted strings in
    the TOML as that would conflict with the Python string containing
    the document. Many TOML writers do not preserve style and may
    potentially produce output that would be malformed.
-   The way in which character escaping works in Python strings is not
    quite the way it works in TOML strings. It would be possible to
    preserve a one-to-one character mapping by enforcing raw strings,
    but this r prefix requirement may be potentially confusing to users.

Why not reuse core metadata fields?

A previous version of this PEP proposed to reuse the existing metadata
standard that is used to describe projects.

There are two significant problems with this proposal:

-   The name and version fields are required and changing that would
    require its own PEP

-   Reusing the data is fundamentally a misuse of it__

    __
    https://snarky.ca/differentiating-between-writing-down-dependencies-to-use-packages-and-for-packages-themselves/

Why not limit to specific metadata fields?

By limiting the metadata to just dependencies, we would prevent the
known use case of tools that support managing Python installations,
which would allows users to target specific versions of Python for new
syntax or standard library functionality.

Why not limit tool configuration?

By not allowing the [tool] table, we would prevent known functionality
that would benefit users. For example:

-   A script runner may support injecting of dependency resolution data
    for an embedded lock file (this is what Go's gorun can do).

-   A script runner may support configuration instructing to run scripts
    in containers for situations in which there is no cross-platform
    support for a dependency or if the setup is too complex for the
    average user like when requiring Nvidia drivers. Situations like
    this would allow users to proceed with what they want to do whereas
    otherwise they may stop at that point altogether.

-   Tools may wish to experiment with features to ease development
    burden for users such as the building of single-file scripts into
    packages. We received feedback stating that there are already tools
    that exist in the wild that build wheels and source distributions
    from single files.

    The author of the Rust RFC for embedding metadata mentioned to us
    that they are actively looking into that as well based on user
    feedback saying that there is unnecessary friction with managing
    small projects.

    There has been a commitment to support this by at least one major
    build system.

Why not limit tool behavior?

A previous version of this PEP proposed that non-script running tools
SHOULD NOT modify their behavior when the script is not the sole input
to the tool. For example, if a linter is invoked with the path to a
directory, it SHOULD behave the same as if zero files had embedded
metadata.

This was done as a precaution to avoid tool behavior confusion and
generating various feature requests for tools to support this PEP.
However, during discussion we received feedback from maintainers of
tools that this would be undesirable and potentially confusing to users.
Additionally, this may allow for a universally easier way to configure
tools in certain circumstances and solve existing issues.

Why not just set up a Python project with a pyproject.toml?

Again, a key issue here is that the target audience for this proposal is
people writing scripts which aren't intended for distribution. Sometimes
scripts will be "shared", but this is far more informal than
"distribution" - it typically involves sending a script via an email
with some written instructions on how to run it, or passing someone a
link to a GitHub gist.

Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".

In addition, if the expectation here is that the pyproject.toml will
somehow be designed for running scripts in place, that's a new feature
of the standard that doesn't currently exist. At a minimum, this isn't a
reasonable suggestion until the current discussion on Discourse about
using pyproject.toml for projects that won't be distributed as wheels is
resolved. And even then, it doesn't address the "sending someone a
script in a gist or email" use case.

Why not infer the requirements from import statements?

The idea would be to automatically recognize import statements in the
source file and turn them into a list of requirements.

However, this is infeasible for several reasons. First, the points above
about the necessity to keep the syntax easily parsable, for all Python
versions, also by tools written in other languages, apply equally here.

Second, PyPI and other package repositories conforming to the Simple
Repository API do not provide a mechanism to resolve package names from
the module names that are imported (see also this related discussion__).

Third, even if repositories did offer this information, the same import
name may correspond to several packages on PyPI. One might object that
disambiguating which package is wanted would only be needed if there are
several projects providing the same import name. However, this would
make it easy for anyone to unintentionally or malevolently break working
scripts, by uploading a package to PyPI providing an import name that is
the same as an existing project. The alternative where, among the
candidates, the first package to have been registered on the index is
chosen, would be confusing in case a popular package is developed with
the same import name as an existing obscure package, and even harmful if
the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being
reused.

A related idea would be to attach the requirements as comments to the
import statements instead of gathering them in a block, with a syntax
such as:

    import numpy as np # requires: numpy
    import rich # requires: rich

This still suffers from parsing difficulties. Also, where to place the
comment in the case of multiline imports is ambiguous and may look ugly:

    from PyQt5.QtWidgets import (
        QCheckBox, QComboBox, QDialog, QDialogButtonBox,
        QGridLayout, QLabel, QSpinBox, QTextEdit
    ) # requires: PyQt5

Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider:

    import platform
    if platform.system() == "Windows":
        import pywin32 # requires: pywin32

Here, the user's intent is that the package is only required on Windows,
but this cannot be understood by the script runner (the correct way to
write it would be requires: pywin32 ; sys_platform == 'win32').

(Thanks to Jean Abou-Samra for the clear discussion of this point)

Why not use a requirements file for dependencies?

Putting your requirements in a requirements file, doesn't require a PEP.
You can do that right now, and in fact it's quite likely that many adhoc
solutions do this. However, without a standard, there's no way of
knowing how to locate a script's dependency data. And furthermore, the
requirements file format is pip-specific, so tools relying on it are
depending on a pip implementation detail.

So in order to make a standard, two things would be required:

1.  A standardised replacement for the requirements file format.
2.  A standard for how to locate the requirements file for a given
    script.

The first item is a significant undertaking. It has been discussed on a
number of occasions, but so far no-one has attempted to actually do it.
The most likely approach would be for standards to be developed for
individual use cases currently addressed with requirements files. One
option here would be for this PEP to simply define a new file format
which is simply a text file containing PEP 508 requirements, one per
line. That would just leave the question of how to locate that file.

The "obvious" solution here would be to do something like name the file
the same as the script, but with a .reqs extension (or something
similar). However, this still requires two files, where currently only a
single file is needed, and as such, does not match the "better batch
file" model (shell scripts and batch files are typically
self-contained). It requires the developer to remember to keep the two
files together, and this may not always be possible. For example, system
administration policies may require that all files in a certain
directory are executable (the Linux filesystem standards require this of
/usr/bin, for example). And some methods of sharing a script (for
example, publishing it on a text file sharing service like Github's
gist, or a corporate intranet) may not allow for deriving the location
of an associated requirements file from the script's location (tools
like pipx support running a script directly from a URL, so "download and
unpack a zip of the script and its dependencies" may not be an
appropriate requirement).

Essentially, though, the issue here is that there is an explicitly
stated requirement that the format supports storing dependency data in
the script file itself. Solutions that don't do that are simply ignoring
that requirement.

Why not use (possibly restricted) Python syntax?

This would typically involve storing metadata as multiple special
variables, such as the following.

    __requires_python__ = ">=3.11"
    __dependencies__ = [
        "requests",
        "click",
    ]

The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if
the syntax is restricted, the rest of the script will use the full
Python syntax, and trying to define a syntax which can be successfully
parsed in isolation from the surrounding code is likely to be extremely
difficult and error-prone.

Furthermore, Python's syntax changes in every release. If extracting
dependency data needs a Python parser, the parser will need to know
which version of Python the script is written for, and the overhead for
a generic tool of having a parser that can handle multiple versions of
Python is unsustainable.

With this approach there is the potential to clutter scripts with many
variables as new extensions get added. Additionally, intuiting which
metadata fields correspond to which variable names would cause confusion
for users.

It is worth noting, though, that the pip-run utility does implement (an
extended form of) this approach. Further discussion of the pip-run
design is available on the project's issue tracker.

What about local dependencies?

These can be handled without needing special metadata and tooling,
simply by adding the location of the dependencies to sys.path. This PEP
simply isn't needed for this case. If, on the other hand, the "local
dependencies" are actual distributions which are published locally, they
can be specified as usual with a PEP 508 requirement, and the local
package index specified when running a tool by using the tool's UI for
that.

Open Issues

None at this point.

References

Footnotes

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

[1] A large number of users use scripts that are version controlled. For
example, the SREs that were mentioned or projects that require special
maintenance like the AWS CLI or Calibre.

[2] The syntax is taken directly from the final resolution of the Blocks
extension to Python Markdown.

__ https://github.com/facelessuser/pymdown-extensions/discussions/1973
__ https://github.com/Python-Markdown/markdown