PEP: 324 Title: subprocess - New process module Version: $Revision$
Last-Modified: $Date$ Author: Peter Astrand <astrand@lysator.liu.se>
Status: Final Type: Standards Track Content-Type: text/x-rst Created:
19-Nov-2003 Python-Version: 2.4 Post-History:

Abstract

This PEP describes a new module for starting and communicating with
processes.

Motivation

Starting new processes is a common task in any programming language, and
very common in a high-level language like Python. Good support for this
task is needed, because:

-   Inappropriate functions for starting processes could mean a security
    risk: If the program is started through the shell, and the arguments
    contain shell meta characters, the result can be disastrous.[1]
-   It makes Python an even better replacement language for
    over-complicated shell scripts.

Currently, Python has a large number of different functions for process
creation. This makes it hard for developers to choose.

The subprocess module provides the following enhancements over previous
functions:

-   One "unified" module provides all functionality from previous
    functions.
-   Cross-process exceptions: Exceptions happening in the child before
    the new process has started to execute are re-raised in the parent.
    This means that it's easy to handle exec() failures, for example.
    With popen2, for example, it's impossible to detect if the execution
    failed.
-   A hook for executing custom code between fork and exec. This can be
    used for, for example, changing uid.
-   No implicit call of /bin/sh. This means that there is no need for
    escaping dangerous shell meta characters.
-   All combinations of file descriptor redirection is possible. For
    example, the "python-dialog"[2] needs to spawn a process and
    redirect stderr, but not stdout. This is not possible with current
    functions, without using temporary files.
-   With the subprocess module, it's possible to control if all open
    file descriptors should be closed before the new program is
    executed.
-   Support for connecting several subprocesses (shell "pipe").
-   Universal newline support.
-   A communicate() method, which makes it easy to send stdin data and
    read stdout and stderr data, without risking deadlocks. Most people
    are aware of the flow control issues involved with child process
    communication, but not all have the patience or skills to write a
    fully correct and deadlock-free select loop. This means that many
    Python applications contain race conditions. A communicate() method
    in the standard library solves this problem.

Rationale

The following points summarizes the design:

-   subprocess was based on popen2, which is tried-and-tested.

-   The factory functions in popen2 have been removed, because I
    consider the class constructor equally easy to work with.

-   popen2 contains several factory functions and classes for different
    combinations of redirection. subprocess, however, contains one
    single class. Since the subprocess module supports 12 different
    combinations of redirection, providing a class or function for each
    of them would be cumbersome and not very intuitive. Even with
    popen2, this is a readability problem. For example, many people
    cannot tell the difference between popen2.popen2 and popen2.popen4
    without using the documentation.

-   One small utility function is provided: subprocess.call(). It aims
    to be an enhancement over os.system(), while still very easy to use:

    -   It does not use the Standard C function system(), which has
        limitations.
    -   It does not call the shell implicitly.
    -   No need for quoting; using an argument list.
    -   The return value is easier to work with.

    The call() utility function accepts an 'args' argument, just like
    the Popen class constructor. It waits for the command to complete,
    then returns the returncode attribute. The implementation is very
    simple:

        def call(*args, **kwargs):
            return Popen(*args, **kwargs).wait()

    The motivation behind the call() function is simple: Starting a
    process and wait for it to finish is a common task.

    While Popen supports a wide range of options, many users have simple
    needs. Many people are using os.system() today, mainly because it
    provides a simple interface. Consider this example:

        os.system("stty sane -F " + device)

    With subprocess.call(), this would look like:

        subprocess.call(["stty", "sane", "-F", device])

    or, if executing through the shell:

        subprocess.call("stty sane -F " + device, shell=True)

-   The "preexec" functionality makes it possible to run arbitrary code
    between fork and exec. One might ask why there are special arguments
    for setting the environment and current directory, but not for, for
    example, setting the uid. The answer is:

    -   Changing environment and working directory is considered fairly
        common.
    -   Old functions like spawn() has support for an "env"-argument.
    -   env and cwd are considered quite cross-platform: They make sense
        even on Windows.

-   On POSIX platforms, no extension module is required: the module uses
    os.fork(), os.execvp() etc.

-   On Windows platforms, the module requires either Mark Hammond's
    Windows extensions[3], or a small extension module called
    _subprocess.

Specification

This module defines one class called Popen:

    class Popen(args, bufsize=0, executable=None,
                stdin=None, stdout=None, stderr=None,
                preexec_fn=None, close_fds=False, shell=False,
                cwd=None, env=None, universal_newlines=False,
                startupinfo=None, creationflags=0):

Arguments are:

-   args should be a string, or a sequence of program arguments. The
    program to execute is normally the first item in the args sequence
    or string, but can be explicitly set by using the executable
    argument.

    On UNIX, with shell=False (default): In this case, the Popen class
    uses os.execvp() to execute the child program. args should normally
    be a sequence. A string will be treated as a sequence with the
    string as the only item (the program to execute).

    On UNIX, with shell=True: If args is a string, it specifies the
    command string to execute through the shell. If args is a sequence,
    the first item specifies the command string, and any additional
    items will be treated as additional shell arguments.

    On Windows: the Popen class uses CreateProcess() to execute the
    child program, which operates on strings. If args is a sequence, it
    will be converted to a string using the list2cmdline method. Please
    note that not all MS Windows applications interpret the command line
    the same way: The list2cmdline is designed for applications using
    the same rules as the MS C runtime.

-   bufsize, if given, has the same meaning as the corresponding
    argument to the built-in open() function: 0 means unbuffered, 1
    means line buffered, any other positive value means use a buffer of
    (approximately) that size. A negative bufsize means to use the
    system default, which usually means fully buffered. The default
    value for bufsize is 0 (unbuffered).

-   stdin, stdout and stderr specify the executed programs' standard
    input, standard output and standard error file handles,
    respectively. Valid values are PIPE, an existing file descriptor (a
    positive integer), an existing file object, and None. PIPE indicates
    that a new pipe to the child should be created. With None, no
    redirection will occur; the child's file handles will be inherited
    from the parent. Additionally, stderr can be STDOUT, which indicates
    that the stderr data from the applications should be captured into
    the same file handle as for stdout.

-   If preexec_fn is set to a callable object, this object will be
    called in the child process just before the child is executed.

-   If close_fds is true, all file descriptors except 0, 1 and 2 will be
    closed before the child process is executed.

-   If shell is true, the specified command will be executed through the
    shell.

-   If cwd is not None, the current directory will be changed to cwd
    before the child is executed.

-   If env is not None, it defines the environment variables for the new
    process.

-   If universal_newlines is true, the file objects stdout and stderr
    are opened as a text file, but lines may be terminated by any of \n,
    the Unix end-of-line convention, \r, the Macintosh convention or
    \r\n, the Windows convention. All of these external representations
    are seen as \n by the Python program. Note: This feature is only
    available if Python is built with universal newline support (the
    default). Also, the newlines attribute of the file objects stdout,
    stdin and stderr are not updated by the communicate() method.

-   The startupinfo and creationflags, if given, will be passed to the
    underlying CreateProcess() function. They can specify things such as
    appearance of the main window and priority for the new process.
    (Windows only)

This module also defines two shortcut functions:

-   

    call(*args, **kwargs):

        Run command with arguments. Wait for command to complete, then
        return the returncode attribute.

        The arguments are the same as for the Popen constructor.
        Example:

            retcode = call(["ls", "-l"])

Exceptions

Exceptions raised in the child process, before the new program has
started to execute, will be re-raised in the parent. Additionally, the
exception object will have one extra attribute called 'child_traceback',
which is a string containing traceback information from the child's
point of view.

The most common exception raised is OSError. This occurs, for example,
when trying to execute a non-existent file. Applications should prepare
for OSErrors.

A ValueError will be raised if Popen is called with invalid arguments.

Security

Unlike some other popen functions, this implementation will never call
/bin/sh implicitly. This means that all characters, including shell
meta-characters, can safely be passed to child processes.

Popen objects

Instances of the Popen class have the following methods:

poll()

    Check if child process has terminated. Returns returncode attribute.

wait()

    Wait for child process to terminate. Returns returncode attribute.

communicate(input=None)

    Interact with process: Send data to stdin. Read data from stdout and
    stderr, until end-of-file is reached. Wait for process to terminate.
    The optional stdin argument should be a string to be sent to the
    child process, or None, if no data should be sent to the child.

    communicate() returns a tuple (stdout, stderr).

    Note: The data read is buffered in memory, so do not use this method
    if the data size is large or unlimited.

The following attributes are also available:

stdin

    If the stdin argument is PIPE, this attribute is a file object that
    provides input to the child process. Otherwise, it is None.

stdout

    If the stdout argument is PIPE, this attribute is a file object that
    provides output from the child process. Otherwise, it is None.

stderr

    If the stderr argument is PIPE, this attribute is file object that
    provides error output from the child process. Otherwise, it is None.

pid

    The process ID of the child process.

returncode

    The child return code. A None value indicates that the process
    hasn't terminated yet. A negative value -N indicates that the child
    was terminated by signal N (UNIX only).

Replacing older functions with the subprocess module

In this section, "a ==> b" means that b can be used as a replacement for
a.

Note: All functions in this section fail (more or less) silently if the
executed program cannot be found; this module raises an OSError
exception.

In the following examples, we assume that the subprocess module is
imported with from subprocess import *.

Replacing /bin/sh shell backquote

    output=`mycmd myarg`
    ==>
    output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]

Replacing shell pipe line

    output=`dmesg | grep hda`
    ==>
    p1 = Popen(["dmesg"], stdout=PIPE)
    p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
    output = p2.communicate()[0]

Replacing os.system()

    sts = os.system("mycmd" + " myarg")
    ==>
    p = Popen("mycmd" + " myarg", shell=True)
    sts = os.waitpid(p.pid, 0)

Note:

-   Calling the program through the shell is usually not required.
-   It's easier to look at the returncode attribute than the exit
    status.

A more real-world example would look like this:

    try:
        retcode = call("mycmd" + " myarg", shell=True)
        if retcode < 0:
            print >>sys.stderr, "Child was terminated by signal", -retcode
        else:
            print >>sys.stderr, "Child returned", retcode
    except OSError, e:
        print >>sys.stderr, "Execution failed:", e

Replacing os.spawn*

P_NOWAIT example:

    pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
    ==>
    pid = Popen(["/bin/mycmd", "myarg"]).pid

P_WAIT example:

    retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg")
    ==>
    retcode = call(["/bin/mycmd", "myarg"])

Vector example:

    os.spawnvp(os.P_NOWAIT, path, args)
    ==>
    Popen([path] + args[1:])

Environment example:

    os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env)
    ==>
    Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"})

Replacing os.popen*

    pipe = os.popen(cmd, mode='r', bufsize)
    ==>
    pipe = Popen(cmd, shell=True, bufsize=bufsize, stdout=PIPE).stdout

    pipe = os.popen(cmd, mode='w', bufsize)
    ==>
    pipe = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE).stdin


    (child_stdin, child_stdout) = os.popen2(cmd, mode, bufsize)
    ==>
    p = Popen(cmd, shell=True, bufsize=bufsize,
              stdin=PIPE, stdout=PIPE, close_fds=True)
    (child_stdin, child_stdout) = (p.stdin, p.stdout)


    (child_stdin,
     child_stdout,
     child_stderr) = os.popen3(cmd, mode, bufsize)
    ==>
    p = Popen(cmd, shell=True, bufsize=bufsize,
              stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
    (child_stdin,
     child_stdout,
     child_stderr) = (p.stdin, p.stdout, p.stderr)


    (child_stdin, child_stdout_and_stderr) = os.popen4(cmd, mode, bufsize)
    ==>
    p = Popen(cmd, shell=True, bufsize=bufsize,
              stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
    (child_stdin, child_stdout_and_stderr) = (p.stdin, p.stdout)

Replacing popen2.*

Note: If the cmd argument to popen2 functions is a string, the command
is executed through /bin/sh. If it is a list, the command is directly
executed.

    (child_stdout, child_stdin) = popen2.popen2("somestring", bufsize, mode)
    ==>
    p = Popen(["somestring"], shell=True, bufsize=bufsize
              stdin=PIPE, stdout=PIPE, close_fds=True)
    (child_stdout, child_stdin) = (p.stdout, p.stdin)


    (child_stdout, child_stdin) = popen2.popen2(["mycmd", "myarg"], bufsize, mode)
    ==>
    p = Popen(["mycmd", "myarg"], bufsize=bufsize,
              stdin=PIPE, stdout=PIPE, close_fds=True)
    (child_stdout, child_stdin) = (p.stdout, p.stdin)

The popen2.Popen3 and popen3.Popen4 basically works as subprocess.Popen,
except that:

-   subprocess.Popen raises an exception if the execution fails
-   the capturestderr argument is replaced with the stderr argument.
-   stdin=PIPE and stdout=PIPE must be specified.
-   popen2 closes all file descriptors by default, but you have to
    specify close_fds=True with subprocess.Popen.

Open Issues

Some features have been requested but is not yet implemented. This
includes:

-   Support for managing a whole flock of subprocesses
-   Support for managing "daemon" processes
-   Built-in method for killing subprocesses

While these are useful features, it's expected that these can be added
later without problems.

-   expect-like functionality, including pty support.

pty support is highly platform-dependent, which is a problem. Also,
there are already other modules that provide this kind of
functionality[4].

Backwards Compatibility

Since this is a new module, no major backward compatible issues are
expected. The module name "subprocess" might collide with other,
previous modules[5] with the same name, but the name "subprocess" seems
to be the best suggested name so far. The first name of this module was
"popen5", but this name was considered too unintuitive. For a while, the
module was called "process", but this name is already used by Trent
Mick's module[6].

The functions and modules that this new module is trying to replace
(os.system, os.spawn*, os.popen*, popen2.*, commands.*) are expected to
be available in future Python versions for a long time, to preserve
backwards compatibility.

Reference Implementation

A reference implementation is available from
http://www.lysator.liu.se/~astrand/popen5/.

References

Copyright

This document has been placed in the public domain.

[1] Secure Programming for Linux and Unix HOWTO, section 8.3.
http://www.dwheeler.com/secure-programs/

[2] Python Dialog http://pythondialog.sourceforge.net/

[3] http://starship.python.net/crew/mhammond/win32/

[4] http://www.lysator.liu.se/~ceder/pcl-expect/

[5] http://www.iol.ie/~padraiga/libs/subProcess.py

[6] http://starship.python.net/crew/tmick/