PEP: 441 Title: Improving Python ZIP Application Support Author: Daniel
Holth <dholth@gmail.com>, Paul Moore <p.f.moore@gmail.com>
Discussions-To:
https://mail.python.org/pipermail/python-dev/2015-February/138277.html
Status: Final Type: Standards Track Content-Type: text/x-rst Created:
30-Mar-2013 Python-Version: 3.5 Post-History: 30-Mar-2013, 01-Apr-2013,
16-Feb-2015 Resolution:
https://mail.python.org/pipermail/python-dev/2015-February/138578.html

Improving Python ZIP Application Support

Python has had the ability to execute directories or ZIP-format archives
as scripts since version 2.6[1]. When invoked with a zip file or
directory as its first argument the interpreter adds that directory to
sys.path and executes the __main__ module. These archives provide a
great way to publish software that needs to be distributed as a single
file script but is complex enough to need to be written as a collection
of modules.

This feature is not as popular as it should be mainly because it was not
promoted as part of Python 2.6[2], so that it is relatively unknown, but
also because the Windows installer does not register a file extension
(other than .py) for this format of file, to associate with the
launcher.

This PEP proposes to fix these problems by re-publicising the feature,
defining the .pyz and .pyzw extensions as "Python ZIP Applications" and
"Windowed Python ZIP Applications", and providing some simple tooling to
manage the format.

A New Python ZIP Application Extension

The terminology "Python Zip Application" will be the formal term used
for a zip-format archive that contains Python code in a form that can be
directly executed by Python (specifically, it must have a __main__.py
file in the root directory of the archive). The extension .pyz will be
formally associated with such files.

The Python 3.5 installer will associate .pyz and .pyzw "Python Zip
Applications" with the platform launcher so they can be executed. A .pyz
archive is a console application and a .pyzw archive is a windowed
application, indicating whether the console should appear when running
the app.

On Unix, it would be ideal if the .pyz extension and the name "Python
Zip Application" were registered (in the mime types database?). However,
such an association is out of scope for this PEP.

Python Zip applications can be prefixed with a #! line pointing to the
correct Python interpreter and an optional explanation:

    #!/usr/bin/env python3
    #  Python application packed with zipapp module
    (binary contents of archive)

On Unix, this allows the OS to run the file with the correct
interpreter, via the standard "shebang" support. On Windows, the Python
launcher implements shebang support.

However, it is always possible to execute a .pyz application by
supplying the filename to the Python interpreter directly.

As background, ZIP archives are defined with a footer containing
relative offsets from the end of the file. They remain valid when
concatenated to the end of any other file. This feature is completely
standard and is how self-extracting ZIP archives and the bdist_wininst
installer format work.

Minimal Tooling: The zipapp Module

This PEP also proposes including a module for working with these
archives. The module will contain functions for working with Python zip
application archives, and a command line interface (via
python -m zipapp) for their creation and manipulation.

More complete tools for managing Python Zip Applications are encouraged
as 3rd party applications on PyPI. Currently, pyzzer[3] and pex[4] are
two such tools.

Module Interface

The zipapp module will provide the following functions:

create_archive(source, target=None, interpreter=None, main=None)

Create an application archive from source. The source can be any of the
following:

-   The name of a directory, in which case a new application archive
    will be created from the content of that directory.
-   The name of an existing application archive file, in which case the
    file is copied to the target. The file name should include the .pyz
    or .pyzw extension, if required.
-   A file object open for reading in bytes mode. The content of the
    file should be an application archive, and the file object is
    assumed to be positioned at the start of the archive.

The target argument determines where the resulting archive will be
written:

-   If it is the name of a file, the archive will be written to that
    file.
-   If it is an open file object, the archive will be written to that
    file object, which must be open for writing in bytes mode.
-   If the target is omitted (or None), the source must be a directory
    and the target will be a file with the same name as the source, with
    a .pyz extension added.

The interpreter argument specifies the name of the Python interpreter
with which the archive will be executed. It is written as a "shebang"
line at the start of the archive. On Unix, this will be interpreted by
the OS, and on Windows it will be handled by the Python launcher.
Omitting the interpreter results in no shebang line being written. If an
interpreter is specified, and the target is a filename, the executable
bit of the target file will be set.

The main argument specifies the name of a callable which will be used as
the main program for the archive. It can only be specified if the source
is a directory, and the source does not already contain a __main__.py
file. The main argument should take the form "pkg.module:callable" and
the archive will be run by importing "pkg.module" and executing the
given callable with no arguments. It is an error to omit main if the
source is a directory and does not contain a __main__.py file, as
otherwise the resulting archive would not be executable.

If a file object is specified for source or target, it is the caller's
responsibility to close it after calling create_archive.

When copying an existing archive, file objects supplied only need read
and readline, or write methods. When creating an archive from a
directory, if the target is a file object it will be passed to the
zipfile.ZipFile class, and must supply the methods needed by that class.

get_interpreter(archive)

Returns the interpreter specified in the shebang line of the archive. If
there is no shebang, the function returns None. The archive argument can
be a filename or a file-like object open for reading in bytes mode.

Command Line Usage

The zipapp module can be run with the python -m flag. The command line
interface is as follows:

    python -m zipapp directory [options]

        Create an archive from the given directory.  An archive will
        be created from the contents of that directory.  The archive
        will have the same name as the source directory with a .pyz
        extension.

        The following options can be specified:

        -o archive / --output archive

            The destination archive will have the specified name.  The
            given name will be used as written, so should include the
            ".pyz" or ".pyzw" extension.

        -p interpreter / --python interpreter

            The given interpreter will be written to the shebang line
            of the archive.  If this option is not given, the archive
            will have no shebang line.

        -m pkg.mod:fn / --main pkg.mod:fn

            The source directory must not have a __main__.py file. The
            archiver will write a __main__.py file into the target
            which calls fn from the module pkg.mod.

The behaviour of the command line interface matches that of
zipapp.create_archive().

In addition, it is possible to use the command line interface to work
with an existing archive:

    python -m zipapp app.pyz --show

        Displays the shebang line of an archive.  Output is of the
        form

            Interpreter: /usr/bin/env
        or
            Interpreter: <none>

        and is intended for diagnostic use, not for scripts.

    python -m zipapp app.pyz -o newapp.pyz [-p interpreter]

        Copy app.pyz to newapp.pyz, modifying the shebang line based
        on the -p option (as for creating an archive, no -p option
        means remove the shebang line).  Specifying a destination is
        mandatory.

        In-place modification of an archive is *not* supported, as the
        risk of damaging archives is too great for a simple tool.

As noted, the archives are standard zip files, and so can be unpacked
using any standard ZIP utility or Python's zipfile module. For this
reason, no interfaces to list the contents of an archive, or unpack
them, are provided or needed.

FAQ

Are you sure a standard ZIP utility can handle #! at the beginning?

    Absolutely. The zipfile specification allows for arbitrary data to
    be prepended to a zipfile. This feature is commonly used by
    "self-extracting zip" programs. If your archive program can't handle
    this, it is a bug in your archive program.

Isn't zipapp just a very thin wrapper over the zipfile module?

    Yes. If you prefer to build your own Python zip application archives
    using other tools, they will work just as well. The zipapp module is
    a convenience, nothing more.

Why not use just use a .zip or .py extension?

    Users expect a .zip file to be opened with an archive tool, and
    expect a .py file to contain readable text. Both would be confusing
    for this use case.

How does this compete with existing package formats?

    The sdist, bdist and wheel formats are designed for packaging of
    modules to be installed into an existing Python installation. They
    are not intended to be used without installing. The executable zip
    format is specifically designed for standalone use, without needing
    to be installed. They are in effect a multi-file version of a
    standalone Python script.

Rejected Proposals

Convenience Values for Shebang Lines

Is it worth having "convenience" forms for any of the common interpreter
values? For example, -p 3 meaning the same as -p "/usr/bin/env python3".
It would save a lot of typing for the common cases, as well as giving
cross-platform options for people who don't want or need to understand
the intricacies of shebang handling on "other" platforms.

Downsides are that it's not obvious how to translate the abbreviations.
For example, should "3" mean "/usr/bin/env python3", "/usr/bin/python3",
"python3", or something else? Also, there is no obvious short form for
the key case of "/usr/bin/env python" (any available version of Python),
which could easily result in scripts being written with
overly-restrictive shebang lines.

Overall, this seems like there are more problems than benefits, and as a
result has been dropped from consideration.

Registering .pyz as a Media Type

It was suggested[5] that the .pyz extension should be registered in the
Unix database of extensions. While it makes sense to do this as an
equivalent of the Windows installer registering the extension, the .py
extension is not listed in the media types database[6]. It doesn't seem
reasonable to register .pyz without .py, so this idea has been omitted
from this PEP. An interested party could arrange for both .py and .pyz
to be registered at a future date.

Default Interpreter

The initial draft of this PEP proposed using /usr/bin/env python as the
default interpreter. Unix users have problems with this behaviour, as
the default for the python command on many distributions is Python 2,
and it is felt that this PEP should prefer Python 3 by default. However,
using a command of python3 can result in unexpected behaviour for
Windows users, where the default behaviour of the launcher for the
command python is commonly customised by users, but the behaviour of
python3 may not be modified to match.

As a result, the principle "in the face of ambiguity, refuse to guess"
has been invoked, and archives have no shebang line unless explicitly
requested. On Windows, the archives will still be run (with the default
Python) by the launcher, and on Unix, the archives can be run by
explicitly invoking the desired Python interpreter.

Command Line Tool to Manage Shebang Lines

It is conceivable that users would want to modify the shebang line for
an existing archive, or even just display the current shebang line. This
is tricky to do so with existing tools (zip programs typically ignore
prepended data totally, and text editors can have trouble editing files
containing binary data).

The zipapp module provides functions to handle the shebang line, but
does not include a command line interface to that functionality. This is
because it is not clear how to provide one without the resulting
interface being over-complex and potentially confusing. Changing the
shebang line is expected to be an uncommon requirement.

Reference Implementation

A reference implementation is at http://bugs.python.org/issue23491.

References

The discussion of this PEP took place on the python-dev mailing list, in
the thread starting at
https://mail.python.org/pipermail/python-dev/2015-February/138277.html

Copyright

This document has been placed into the public domain.

[1] Allow interpreter to execute a zip file
(http://bugs.python.org/issue1739468)

[2] Feature is not documented (http://bugs.python.org/issue17359)

[3] pyzzer - A tool for creating Python-executable archives
(https://pypi.python.org/pypi/pyzzer)

[4] pex - The PEX packaging toolchain (https://pypi.python.org/pypi/pex)

[5] Discussion of adding a .pyz mime type on python-dev
(https://mail.python.org/pipermail/python-dev/2015-February/138338.html)

[6] Register of media types
(http://www.iana.org/assignments/media-types/media-types.xhtml)