PEP: 432
Title: Restructuring the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>,
        Victor Stinner <vstinner@python.org>,
        Eric Snow <ericsnowcurrently@gmail.com>
Discussions-To: capi-sig@python.org
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Requires: 587
Created: 28-Dec-2012
Post-History: 28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020

PEP Withdrawal

From late 2012 to mid 2020, this PEP provided general background and
specific concrete proposals for making the CPython startup sequence
easier to maintain and the CPython runtime easier to embed as part of a
larger application.

For most of that time, the changes were maintained either in a separate
feature branch, or else as underscore-prefixed private APIs in the main
CPython repo.

In 2019, PEP 587 migrated a subset of those API changes to the public
CPython API for Python 3.8+ (specifically, the PEP updated the
interpreter runtime to offer an explicitly multi-stage struct-based
configuration interface).

In June 2020, in response to a query from the Steering Council, the PEP
authors decided that it made sense to withdraw the original PEP. Enough
had changed since PEP 432 was first written that any further changes to
the startup sequence and embedding API would be best formulated as a new
PEP (or PEPs). Such a PEP could take into account not only the
not-yet-implemented ideas from PEP 432 that weren't considered
sufficiently well validated to make their way into PEP 587, but also any
feedback on the public PEP 587 API, and any other lessons learned while
adjusting the CPython implementation to be more embedding and
subinterpreter friendly.

In particular, PEPs proposing the following changes, and any further
infrastructure changes needed to enable them, would likely still be
worth exploring:

-   shipping an alternate Python executable that ignores all user level
    settings and runs in isolated mode by default, and would hence be
    more suitable for execution of system level Python applications than
    the default interpreter
-   enhancing the zipapp module to support the creation of single-file
    executables from pure Python scripts (and potentially even Python
    extension modules, given the introduction of multi-phase extension
    module initialisation)
-   migrating the complex sys.path initialisation logic from C to Python
    in order to improve test suite coverage and the general
    maintainability of that code

Abstract

This PEP proposes a mechanism for restructuring the startup sequence for
CPython, making it easier to modify the initialization behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.

When implementation of this proposal is completed, interpreter startup
will consist of three clearly distinct and independently configurable
phases:

-   Python core runtime preinitialization
    -   setting up memory management
    -   determining the encodings used for system interfaces (including
        settings passed in for later configuration phase)
-   Python core runtime initialization
    -   ensuring C API is ready for use
    -   ensuring builtin and frozen modules are accessible
-   Main interpreter configuration
    -   ensuring external modules are accessible
    -   (Note: the name of this phase is quite likely to change)

Changes are also proposed that impact main module execution and
subinterpreter initialization.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate
resolution for most of these should become clearer as the reference
implementation is developed.

Proposal

This PEP proposes that initialization of the CPython runtime be split
into three clearly distinct phases:

-   core runtime preinitialization
-   core runtime initialization
-   main interpreter configuration

(Earlier versions proposed only two phases, but experience with
attempting to implement the PEP as an internal CPython refactoring
showed that at least three phases are needed to get clear separation of
concerns.)

The proposed design also has significant implications for:

-   main module execution
-   subinterpreter initialization

In the new design, the interpreter will move through the following
well-defined phases during the initialization sequence:

-   Uninitialized - haven't even started the pre-initialization phase
    yet
-   Pre-Initialization - no interpreter available
-   Runtime Initialized - main interpreter partially available,
    subinterpreter creation not yet available
-   Initialized - main interpreter fully available, subinterpreter
    creation available

PEP 587 is a more detailed proposal that covers separating out the
Pre-Initialization phase from the last two phases, but doesn't allow
embedding applications to run arbitrary code while in the "Runtime
Initialized" state (instead, initializing the core runtime will also
always fully initialize the main interpreter, as that's the way the
native CPython CLI still works in Python 3.8).

As a concrete use case to help guide any design changes, and to solve a
known problem where the appropriate defaults for system utilities differ
from those for running user scripts, this PEP proposes the creation and
distribution of a separate system Python (system-python) executable
which, by default, operates in "isolated mode" (as selected by the
CPython -I switch), as well as the creation of an example stub binary
that just runs an appended zip archive (permitting single-file pure
Python executables) rather than going through the normal CPython startup
sequence.

To keep the implementation complexity under control, this PEP does not
propose wholesale changes to the way the interpreter state is accessed
at runtime. Changing the order in which the existing initialization
steps occur in order to make the startup sequence easier to maintain is
already a substantial change, and attempting to make those other changes
at the same time will make the change significantly more invasive and
much harder to review. However, such proposals may be suitable topics
for follow-on PEPs or patches - one key benefit of this PEP and its
related subproposals is decreasing the coupling between the internal
storage model and the configuration interface, so such changes should be
easier once this PEP has been implemented.

Background

Over time, CPython's initialization sequence has become progressively
more complicated, offering more options, as well as performing more
complex tasks (such as configuring the Unicode settings for OS
interfaces in Python 3[1], bootstrapping a pure Python implementation of
the import system, and implementing an isolated mode more suitable for
system applications that run with elevated privileges[2]).

Much of this complexity is formally accessible only through the Py_Main
and Py_Initialize APIs, offering embedding applications little
opportunity for customisation. This creeping complexity also makes life
difficult for maintainers, as much of the configuration needs to take
place prior to the Py_Initialize call, meaning much of the Python C API
cannot be used safely.

A number of proposals are on the table for even more sophisticated
startup behaviour, such as better control over sys.path initialization
(e.g. easily adding additional directories on the command line in a
cross-platform fashion[3], or controlling the configuration of
sys.path[0][4]) and easier configuration of utilities like coverage
tracing when launching Python subprocesses[5].

Rather than continuing to bolt such behaviour onto an already
complicated system indefinitely, this PEP proposes to start simplifying
the status quo by introducing a more structured startup sequence, with
the aim of making these further feature requests easier to implement.

Originally the entire proposal was maintained in this one PEP, but that
proved impractical, so as parts of the proposed design stabilised, they
were split out into their own PEPs, allowing progress to be made even
while the details of the overall design were still evolving.

Key Concerns

There are a few key concerns that any change to the startup sequence
needs to take into account.

Maintainability

The CPython startup sequence as of Python 3.6 was difficult to
understand, and even more difficult to modify. It was not clear what
state the interpreter was in while much of the initialization code
executed, leading to behaviour such as lists, dictionaries and Unicode
values being created prior to the call to Py_Initialize when the -X or
-W options are used[6].

By moving to an explicitly multi-phase startup sequence, developers
should only need to understand:

-   which APIs and features are available prior to pre-configuration
    (essentially none, except for the pre-configuration API itself)
-   which APIs and features are available prior to core runtime
    configuration (calling these will implicitly run the
    pre-configuration with default settings that match the behaviour of
    Python 3.6 if the pre-configuration hasn't been run explicitly)
-   which APIs and features are only available after the main
    interpreter has been fully configured (which will hopefully be a
    relatively small subset of the full C API)

The first two aspects of that are covered by PEP 587, while the details
of the latter distinction are still being considered.

By basing the new design on a combination of C structures and Python
data types, it should also be easier to modify the system in the future
to add new configuration options.

Testability

One of the problems with the complexity of the CPython startup sequence
is the combinatorial explosion of possible interactions between
different configuration settings.

This concern impacts both the design of the new initialisation system,
and the proposed approach for getting there.

Performance

CPython is used heavily to run short scripts where the runtime is
dominated by the interpreter initialization time. Any changes to the
startup sequence should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time
is dominated by IO operations. However, to monitor the impact of any
changes, a simple benchmark can be used to check how long it takes to
start and then tear down the interpreter:

    python3 -m timeit -s "from subprocess import call" "call(['./python', '-Sc', 'pass'])"

Current numbers on my system for Python 3.7 (as built by the Fedora
project):

    $ python3 -m timeit -s "from subprocess import call" "call(['python3', '-Sc', 'pass'])"
    50 loops, best of 5: 6.48 msec per loop

(TODO: run this microbenchmark with perf rather than the stdlib timeit)

This PEP is not expected to have any significant effect on the startup
time, as it is aimed primarily at reordering the existing initialization
sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the
initialization sequence may pose a performance problem, then a more
sophisticated microbenchmark will be developed to assist in
investigation.

Required Configuration Settings

See PEP 587 for a detailed listing of CPython interpreter configuration
settings and the various means available for setting them.

Implementation Strategy

An initial attempt was made at implementing an earlier version of this
PEP for Python 3.4[7], with one of the significant problems encountered
being merge conflicts after the initial structural changes were put in
place to start the refactoring process. Unlike some other previous major
changes, such as the switch to an AST-based compiler in Python 2.5, or
the switch to the importlib implementation of the import system in
Python 3.3, there is no clear way to structure a draft implementation
that won't be prone to the kinds of merge conflicts that afflicted the
original attempt.

Accordingly, the implementation strategy was revised to instead first
implement this refactoring as a private API for CPython 3.7, and then
review the viability of exposing the new functions and structures as
public API elements in CPython 3.8.

After the initial merge, Victor Stinner then proceeded to actually
migrate settings to the new structure in order to successfully implement
the PEP 540 UTF-8 mode changes (which required the ability to track all
settings that had previously been decoded with the locale encoding, and
decode them again using UTF-8 instead). Eric Snow also migrated a number
of internal subsystems over as part of making the subinterpreter feature
more robust.

That work showed that the detailed design originally proposed in this
PEP had a range of practical issues, so Victor designed and implemented
an improved private API (inspired by an earlier iteration of this PEP),
which PEP 587 proposes to promote to a public API in Python 3.8.

Design Details

Note

The API details here are still very much in flux. The header files that
show the current state of the private API are mainly:

-   https://github.com/python/cpython/blob/master/Include/cpython/coreconfig.h
-   https://github.com/python/cpython/blob/master/Include/cpython/pystate.h
-   https://github.com/python/cpython/blob/master/Include/cpython/pylifecycle.h

PEP 587 covers the aspects of the API that are considered potentially
stable enough to make public. Where a proposed API is covered by that
PEP, "(see PEP 587)" is added to the text below.

The main theme of this proposal is to initialize the core language
runtime and create a partially initialized interpreter state for the
main interpreter much earlier in the startup process. This will allow
most of the CPython API to be used during the remainder of the
initialization process, potentially simplifying a number of operations
that currently need to rely on basic C functionality rather than being
able to use the richer data structures provided by the CPython C API.

PEP 587 covers a subset of that task, which is splitting out the
components that even the existing "May be called before Py_Initialize"
interfaces need (like memory allocators and operating system interface
encoding details) into a separate pre-configuration step.

In the following, the term "embedding application" also covers the
standard CPython command line application.

Interpreter Initialization Phases

The following distinct interpreter initialisation phases are proposed:

-   Uninitialized:
    -   Not really a phase, but the absence of a phase
    -   Py_IsInitializing() returns 0
    -   Py_IsRuntimeInitialized() returns 0
    -   Py_IsInitialized() returns 0
    -   The embedding application determines which memory allocator to
        use, and which encoding to use to access operating system
        interfaces (or chooses to delegate those decisions to the Python
        runtime)
    -   Application starts the initialization process by calling one of
        the Py_PreInitialize APIs (see PEP 587)
-   Runtime Pre-Initialization:
    -   no interpreter is available
    -   Py_IsInitializing() returns 1
    -   Py_IsRuntimeInitialized() returns 0
    -   Py_IsInitialized() returns 0
    -   The embedding application determines the settings required to
        initialize the core CPython runtime and create the main
        interpreter and moves to the next phase by calling
        Py_InitializeRuntime
    -   Note: as of PEP 587, the embedding application instead calls
        Py_Main(), Py_UnixMain, or one of the Py_Initialize APIs, and
        hence jumps directly to the Initialized state.
-   Main Interpreter Initialization:
    -   the builtin data types and other core runtime services are
        available
    -   the main interpreter is available, but only partially configured
    -   Py_IsInitializing() returns 1
    -   Py_IsRuntimeInitialized() returns 1
    -   Py_IsInitialized() returns 0
    -   The embedding application determines and applies the settings
        required to complete the initialization process by calling
        Py_InitializeMainInterpreter
    -   Note: as of PEP 587, this state is not reachable via any public
        API, it only exists as an implicit internal state while one of
        the Py_Initialize functions is running
-   Initialized:
    -   the main interpreter is available and fully operational, but
        __main__ related metadata is incomplete
    -   Py_IsInitializing() returns 0
    -   Py_IsRuntimeInitialized() returns 1
    -   Py_IsInitialized() returns 1
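
The flag values listed for each phase can be read as a small truth
table. The following is a minimal sketch in plain C of that table. The
SketchPhase enum and the sketch_* helpers are hypothetical names invented
for illustration, not part of CPython or of this proposal; they only
model which value each proposed query would report in each phase.

```c
#include <stdbool.h>

/* Hypothetical model of the four proposed phases; not CPython code. */
typedef enum {
    PHASE_UNINITIALIZED,
    PHASE_RUNTIME_PREINIT,
    PHASE_MAIN_INTERP_INIT,
    PHASE_INITIALIZED
} SketchPhase;

/* Models the proposed Py_IsInitializing(): true only while startup is
 * in progress. */
bool sketch_is_initializing(SketchPhase phase)
{
    return phase == PHASE_RUNTIME_PREINIT
        || phase == PHASE_MAIN_INTERP_INIT;
}

/* Models the proposed Py_IsRuntimeInitialized(): true once the core
 * runtime exists, including after initialization has completed. */
bool sketch_is_runtime_initialized(SketchPhase phase)
{
    return phase == PHASE_MAIN_INTERP_INIT
        || phase == PHASE_INITIALIZED;
}

/* Models Py_IsInitialized(): true only in the final phase. */
bool sketch_is_initialized(SketchPhase phase)
{
    return phase == PHASE_INITIALIZED;
}

/* The "completely uninitialized" check: neither finished nor started. */
bool sketch_is_uninitialized(SketchPhase phase)
{
    return !(sketch_is_initialized(phase)
             || sketch_is_initializing(phase));
}
```

In this model each phase maps to a distinct combination of the three
flags, which is why the three queries together are enough to tell the
phases apart.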

Invocation of Phases

All listed phases will be used by the standard CPython interpreter and
the proposed System Python interpreter.

An embedding application may continue to leave initialization almost
entirely under CPython's control by using the existing Py_Initialize()
or Py_Main() APIs - backwards compatibility will be preserved.

Alternatively, if an embedding application wants greater control over
CPython's initial state, it will be able to use the new, finer grained
API.

PEP 587 covers an initial iteration of that API, separating out the
pre-initialization phase without attempting to separate core runtime
initialization from main interpreter initialization.

Uninitialized State

The uninitialized state is where an embedding application determines the
settings which are required in order to be able to correctly pass
configuration settings to the embedded Python runtime.

This covers telling Python which memory allocator to use, as well as
which text encoding to use when processing provided settings.

PEP 587 defines the settings needed to exit this state in its
PyPreConfig struct.

A new query API will allow code to determine if the interpreter hasn't
even started the initialization process:

    int Py_IsInitializing();

The query for a completely uninitialized environment would then be
!(Py_IsInitialized() || Py_IsInitializing()).

Runtime Pre-Initialization Phase

Note

In PEP 587, the settings for this phase are not yet separated out, and
are instead only available through the combined PyConfig struct.

The pre-initialization phase is where an embedding application
determines the settings which are absolutely required before the CPython
runtime can be initialized at all. Currently, the primary configuration
settings in this category are those related to the randomised hash
algorithm - the hash algorithms must be consistent for the lifetime of
the process, and so they must be in place before the core interpreter is
created.

The essential settings needed are a flag indicating whether or not to
use a specific seed value for the randomised hashes, and if so, the
specific value for the seed (a seed value of zero disables randomised
hashing). In addition, due to the possible use of PYTHONHASHSEED in
configuring the hash randomisation, the question of whether or not to
consider environment variables must also be addressed early. Finally, to
support the CPython build process, an option is offered to completely
disable the import system.

The proposed APIs for this step in the startup sequence are:

    PyInitError Py_InitializeRuntime(
        const PyRuntimeConfig *config
    );

    PyInitError Py_InitializeRuntimeFromArgs(
        const PyRuntimeConfig *config, int argc, char **argv
    );

    PyInitError Py_InitializeRuntimeFromWideArgs(
        const PyRuntimeConfig *config, int argc, wchar_t **argv
    );

If Py_IsInitializing() is false, the Py_InitializeRuntime functions will
implicitly call the corresponding Py_PreInitialize function. The
use_environment setting will be passed down, while other settings will
be processed according to their defaults, as described in PEP 587.

The PyInitError return type is defined in PEP 587, and allows an
embedding application to gracefully handle Python runtime initialization
failures, rather than having the entire process abruptly terminated by
Py_FatalError.

The new PyRuntimeConfig struct holds the settings required for
preliminary configuration of the core runtime and creation of the main
interpreter:

    /* Note: if changing anything in PyRuntimeConfig, also update
     * PyRuntimeConfig_INIT */
    typedef struct {
        bool use_environment;     /* as in PyPreConfig, PyConfig from PEP 587 */
        int use_hash_seed;        /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        unsigned long hash_seed;  /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        bool _install_importlib;  /* Needed by freeze_importlib */
    } PyRuntimeConfig;

    /* Rely on the "designated initializer" feature of C99 */
    #define PyRuntimeConfig_INIT {.use_hash_seed=-1}

The core configuration settings pointer may be NULL, in which case the
default values are as specified in PyRuntimeConfig_INIT.

The PyRuntimeConfig_INIT macro is designed to allow easy initialization
of a struct instance with sensible defaults:

    PyRuntimeConfig runtime_config = PyRuntimeConfig_INIT;
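
As a rough illustration of how the NULL-pointer default and the
initializer macro interact, the sketch below copies the proposed struct
under a different name (SketchRuntimeConfig, a hypothetical stand-in so
the example compiles without Python.h) and adds a hypothetical helper,
sketch_effective_config(), that is not part of the proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the proposed struct layout; illustrative only. */
typedef struct {
    bool use_environment;
    int use_hash_seed;
    unsigned long hash_seed;
    bool _install_importlib;
} SketchRuntimeConfig;

/* Same shape as the proposed PyRuntimeConfig_INIT macro. */
#define SketchRuntimeConfig_INIT {.use_hash_seed = -1}

/* Return the settings that would apply: the caller's struct if one was
 * supplied, otherwise the macro defaults. */
SketchRuntimeConfig
sketch_effective_config(const SketchRuntimeConfig *config)
{
    SketchRuntimeConfig defaults = SketchRuntimeConfig_INIT;
    return config != NULL ? *config : defaults;
}
```

Under standard C99 rules, members not named in a designated initializer
are zero-initialised, so in this sketch hash_seed starts at zero and both
boolean members start out false unless the caller sets them explicitly.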

use_environment controls the processing of all Python related
environment variables. If the flag is true, then PYTHONHASHSEED is
processed normally. Otherwise, all Python-specific environment variables
are considered undefined (exceptions may be made for some OS specific
environment variables, such as those used on Mac OS X to communicate
between the App bundle and the main Python binary).

use_hash_seed controls the configuration of the randomised hash
algorithm. If it is zero, then randomised hashes with a random seed will
be used. If it is positive, then the value in hash_seed will be used to
seed the random number generator. If the hash_seed is zero in this case,
then the randomised hashing is disabled completely.

If use_hash_seed is negative (and use_environment is true), then CPython
will inspect the PYTHONHASHSEED environment variable. If the environment
variable is not set, is set to the empty string, or to the value
"random", then randomised hashes with a random seed will be used. If the
environment variable is set to the string "0" the randomised hashing
will be disabled. Otherwise, the hash seed is expected to be a string
representation of an integer in the range [0; 4294967295].

To make it easier for embedding applications to use the PYTHONHASHSEED
processing with a different data source, the following helper function
will be added to the C API:

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);

This function accepts a seed string in seed_text and converts it to the
appropriate flag and seed values. If seed_text is NULL, the empty string
or the value "random", both use_hash_seed and hash_seed will be set to
zero. Otherwise, use_hash_seed will be set to 1 and the seed text will
be interpreted as an integer and reported as hash_seed. On success the
function will return zero. A non-zero return value indicates an error
(most likely in the conversion to an integer).
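
The seed handling described above can be sketched in plain C as follows.
This is an illustration under stated assumptions, not the CPython
implementation: sketch_read_hash_seed() and sketch_resolve_hash_mode()
are hypothetical names, the parser takes a const pointer and accepts
only plain decimal digits, and a negative use_hash_seed combined with a
false use_environment is assumed to behave as if PYTHONHASHSEED were
unset (the text above does not spell that case out).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    HASH_RANDOM_SEED,   /* randomised hashing, random seed */
    HASH_FIXED_SEED,    /* randomised hashing, caller-chosen seed */
    HASH_DISABLED       /* seed of zero: randomised hashing off */
} SketchHashMode;

/* Hypothetical stand-in for the proposed Py_ReadHashSeed().
 * Returns 0 on success, non-zero if the text is not a valid seed. */
int sketch_read_hash_seed(const char *seed_text,
                          int *use_hash_seed,
                          unsigned long *hash_seed)
{
    if (seed_text == NULL || seed_text[0] == '\0'
            || strcmp(seed_text, "random") == 0) {
        *use_hash_seed = 0;
        *hash_seed = 0;
        return 0;
    }
    /* Assumption: digits only, so signs and whitespace are rejected. */
    for (const char *p = seed_text; *p != '\0'; p++) {
        if (*p < '0' || *p > '9') {
            return -1;
        }
    }
    errno = 0;
    unsigned long value = strtoul(seed_text, NULL, 10);
    if (errno == ERANGE || value > 4294967295UL) {
        return -1;
    }
    *use_hash_seed = 1;
    *hash_seed = value;
    return 0;
}

/* Combine the struct fields and the environment text into one mode.
 * Returns 0 on success, non-zero if env_text could not be parsed. */
int sketch_resolve_hash_mode(int use_hash_seed, unsigned long hash_seed,
                             bool use_environment, const char *env_text,
                             SketchHashMode *mode, unsigned long *seed)
{
    if (use_hash_seed < 0) {
        /* Defer to the environment, if it is being consulted at all. */
        const char *text = use_environment ? env_text : NULL;
        if (sketch_read_hash_seed(text, &use_hash_seed, &hash_seed) != 0) {
            return -1;
        }
    }
    if (use_hash_seed == 0) {
        *mode = HASH_RANDOM_SEED;
        *seed = 0;
        return 0;
    }
    *mode = (hash_seed == 0) ? HASH_DISABLED : HASH_FIXED_SEED;
    *seed = hash_seed;
    return 0;
}
```

An embedding application could feed text from its own configuration
source into the first helper, which is the use case the proposed helper
function is meant to support.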

The _install_importlib setting is used as part of the CPython build
process to create an interpreter with no import capability at all. It is
considered private to the CPython development team (hence the leading
underscore), as the only currently supported use case is to permit
compiler changes that invalidate the previously frozen bytecode for
importlib._bootstrap without breaking the build process.

The aim is to keep this initial level of configuration as small as
possible in order to keep the bootstrapping environment consistent
across different embedding applications. If we can create a valid
interpreter state without the setting, then the setting should appear
solely in the comprehensive PyConfig struct rather than in the core
runtime configuration.

A new query API will allow code to determine if the interpreter is in
the bootstrapping state between core runtime initialization (including
creation of the main interpreter state) and the completion of the bulk
of the main interpreter initialization process:

    int Py_IsRuntimeInitialized();

Attempting to call Py_InitializeRuntime() again when
Py_IsRuntimeInitialized() is already true is reported as a user
configuration error. (TBC, as existing public initialisation APIs
support being called multiple times without error, and simply ignore
changes to any write-once settings. It may make sense to keep that
behaviour rather than trying to make the new API stricter than the old
one.)

As frozen bytecode may now be legitimately run in an interpreter which
is not yet fully initialized, sys.flags will gain a new initialized
flag.

With the core runtime initialised, the main interpreter and most of the
CPython C API should be fully functional except that:

-   compilation is not allowed (as the parser and compiler are not yet
    configured properly)
-   creation of subinterpreters is not allowed
-   creation of additional thread states is not allowed
-   The following attributes in the sys module are all either missing or
    None:
    -   sys.path
    -   sys.argv
    -   sys.executable
    -   sys.base_exec_prefix
    -   sys.base_prefix
    -   sys.exec_prefix
    -   sys.prefix
    -   sys.warnoptions
    -   sys.dont_write_bytecode
    -   sys.stdin
    -   sys.stdout
-   The filesystem encoding is not yet defined
-   The IO encoding is not yet defined
-   CPython signal handlers are not yet installed
-   Only builtin and frozen modules may be imported (due to above
    limitations)
-   sys.stderr is set to a temporary IO object using unbuffered binary
    mode
-   The sys.flags attribute exists, but the individual flags may not yet
    have their final values.
-   The sys.flags.initialized attribute is set to 0
-   The warnings module is not yet initialized
-   The __main__ module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python data
types, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike
the status quo).

In addition, the current thread will possess a valid Python thread
state, allowing any further configuration data to be stored on the main
interpreter object rather than in C process globals.

Any call to Py_InitializeRuntime() must have a matching call to
Py_Finalize(). It is acceptable to skip calling
Py_InitializeMainInterpreter() in between (e.g. if attempting to build
the main interpreter configuration settings fails).
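
The call-pairing rules for this phase can be modelled with a small state
tracker. The sketch below is a toy model only: SketchInitError,
SketchRuntime and the sketch_* functions are hypothetical names, the
error struct is a simplified stand-in rather than the PyInitError
definition from PEP 587, and rejecting a second runtime initialization
follows the stricter (TBC) behaviour described above. Treating a main
interpreter call made before runtime initialization as a user error is
an additional assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified error report returned by value instead of aborting. */
typedef struct {
    const char *msg;   /* NULL means success */
    bool user_err;     /* true when the caller misused the API */
} SketchInitError;

/* Toy runtime state; stands in for the real process-wide state. */
typedef struct {
    bool runtime_initialized;
    bool main_initialized;
} SketchRuntime;

static SketchInitError sketch_ok(void)
{
    return (SketchInitError){NULL, false};
}

SketchInitError sketch_initialize_runtime(SketchRuntime *rt)
{
    if (rt->runtime_initialized) {
        /* Reported to the caller rather than terminating the process. */
        return (SketchInitError){"runtime already initialized", true};
    }
    rt->runtime_initialized = true;
    return sketch_ok();
}

SketchInitError sketch_initialize_main_interpreter(SketchRuntime *rt)
{
    if (!rt->runtime_initialized) {
        return (SketchInitError){"runtime not initialized", true};
    }
    rt->main_initialized = true;
    return sketch_ok();
}

/* Valid whether or not the main interpreter step was ever run. */
void sketch_finalize(SketchRuntime *rt)
{
    rt->runtime_initialized = false;
    rt->main_initialized = false;
}
```

A caller that fails while building the main interpreter settings would,
in this model, skip sketch_initialize_main_interpreter() and go straight
to sketch_finalize().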

Determining the remaining configuration settings

The next step in the initialization sequence is to determine the
remaining settings needed to complete the process. No changes are made
to the interpreter state at this point. The core APIs for this step are:

    int Py_BuildPythonConfig(
        PyConfigAsObjects *py_config, const PyConfig *c_config
    );

    int Py_BuildPythonConfigFromArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, char **argv
    );

    int Py_BuildPythonConfigFromWideArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, wchar_t **argv
    );

The py_config argument should be a pointer to a PyConfigAsObjects struct
(which may be a temporary one stored on the C stack). For any already
configured value (i.e. any non-NULL pointer), CPython will sanity check
the supplied value, but otherwise accept it as correct.

A struct is used rather than a Python dictionary as the struct is easier
to work with from C, the list of supported fields is fixed for a given
CPython version and only a read-only view needs to be exposed to Python
code (which is relatively straightforward, thanks to the infrastructure
already put in place to expose sys.implementation).

Unlike Py_InitializeRuntime, this call will raise a Python exception and
report an error return rather than returning a Python initialization
specific C struct if a problem is found with the config data.

Any supported configuration setting which is not already set will be
populated appropriately in the supplied configuration struct. The
default configuration can be overridden entirely by setting the value
before calling Py_BuildPythonConfig. The provided value will then also
be used in calculating any other settings derived from that value.

Alternatively, settings may be overridden after the Py_BuildPythonConfig
call (this can be useful if an embedding application wants to adjust a
setting rather than replace it completely, such as removing
sys.path[0]).

The c_config argument is an optional pointer to a PyConfig structure, as
defined in PEP 587. If provided, it is used in preference to reading
settings directly from the environment or process global state.

Merely reading the configuration has no effect on the interpreter state:
it only modifies the passed in configuration struct. The settings are
not applied to the running interpreter until the
Py_InitializeMainInterpreter call (see below).
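
The "fill in only what is unset" behaviour can be sketched with a
reduced struct. In the example below, plain C strings stand in for the
Python object pointers, SketchConfig and sketch_build_config() are
hypothetical names, the default values are placeholders, and the rule
that exec_prefix falls back to prefix is only an illustrative derived
setting, not CPython's actual path calculation.

```c
#include <stddef.h>

/* Reduced stand-in for the proposed config struct: NULL means unset. */
typedef struct {
    const char *prefix;
    const char *exec_prefix;
    const char *stdout_encoding;
} SketchConfig;

/* Populate unset fields only; values supplied by the caller are kept
 * and feed into any setting derived from them. Only the struct is
 * modified, mirroring the "no interpreter state changes" rule.
 * Returns 0 on success, non-zero on error. */
int sketch_build_config(SketchConfig *cfg)
{
    if (cfg == NULL) {
        return -1;
    }
    if (cfg->prefix == NULL) {
        cfg->prefix = "/usr/local";         /* placeholder default */
    }
    if (cfg->exec_prefix == NULL) {
        cfg->exec_prefix = cfg->prefix;     /* derived from prefix */
    }
    if (cfg->stdout_encoding == NULL) {
        cfg->stdout_encoding = "utf-8";     /* placeholder default */
    }
    return 0;
}
```

Starting from SketchConfig cfg = {0}; leaves every field unset. Setting
cfg.prefix before the call changes the derived exec_prefix as well,
while assigning to a field after the call adjusts only that field.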

Supported configuration settings

The interpreter configuration is split into two parts: settings which
are either relevant only to the main interpreter or must be identical
across the main interpreter and all subinterpreters, and settings which
may vary across subinterpreters.

NOTE: For initial implementation purposes, only the flag indicating
whether or not the interpreter is the main interpreter will be
configured on a per interpreter basis. Other fields will be reviewed for
whether or not they can feasibly be made interpreter specific over the
course of the implementation.

Note

The list of config fields below is currently out of sync with PEP 587.
Where they differ, PEP 587 takes precedence.

The PyConfigAsObjects struct mirrors the PyConfig struct from PEP 587,
but uses full Python objects to store values, rather than C level data
types. It adds raw_argv and argv list fields, so later initialisation
steps don't need to accept those separately.

Fields are always pointers to Python data types, with unset values
indicated by NULL:

    typedef struct {
        /* Argument processing */
        PyListObject *raw_argv;
        PyListObject *argv;
        PyListObject *warnoptions; /* -W switch, PYTHONWARNINGS */
        PyDictObject *xoptions;    /* -X switch */

        /* Filesystem locations */
        PyUnicodeObject *program_name;
        PyUnicodeObject *executable;
        PyUnicodeObject *prefix;           /* PYTHONHOME */
        PyUnicodeObject *exec_prefix;      /* PYTHONHOME */
        PyUnicodeObject *base_prefix;      /* pyvenv.cfg */
        PyUnicodeObject *base_exec_prefix; /* pyvenv.cfg */

        /* Site module */
        PyBoolObject *enable_site_config;  /* -S switch (inverted) */
        PyBoolObject *no_user_site;        /* -s switch, PYTHONNOUSERSITE */

        /* Import configuration */
        PyBoolObject *dont_write_bytecode; /* -B switch, PYTHONDONTWRITEBYTECODE */
        PyBoolObject *ignore_module_case;  /* PYTHONCASEOK */
        PyListObject *import_path;        /* PYTHONPATH (etc) */

        /* Standard streams */
        PyBoolObject    *use_unbuffered_io; /* -u switch, PYTHONUNBUFFERED */
        PyUnicodeObject *stdin_encoding;    /* PYTHONIOENCODING */
        PyUnicodeObject *stdin_errors;      /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_errors;     /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_errors;     /* PYTHONIOENCODING */

        /* Filesystem access */
        PyUnicodeObject *fs_encoding;

        /* Debugging output */
        PyBoolObject *debug_parser;    /* -d switch, PYTHONDEBUG */
        PyLongObject *verbosity;       /* -v switch */

        /* Code generation */
        PyLongObject *bytes_warnings;  /* -b switch */
        PyLongObject *optimize;        /* -O switch */

        /* Signal handling */
        PyBoolObject *install_signal_handlers;

        /* Implicit execution */
        PyUnicodeObject *startup_file;  /* PYTHONSTARTUP */

        /* Main module
         *
         * If prepare_main is set, at most one of the main_* settings should
         * be set before calling PyRun_PrepareMain (Py_ReadMainInterpreterConfig
         * will set one of them based on the command line arguments if
         * prepare_main is non-zero when that API is called).
         */
        PyBoolObject    *prepare_main;
        PyUnicodeObject *main_source; /* -c switch */
        PyUnicodeObject *main_path;   /* filesystem path */
        PyUnicodeObject *main_module; /* -m switch */
        PyCodeObject    *main_code;   /* Run directly from a code object */
        PyObject        *main_stream; /* Run from stream */
        PyBoolObject    *run_implicit_code; /* Run implicit code during prep */

        /* Interactive main
         *
         * Note: Settings related to interactive mode are very much in flux.
         */
        PyObject *prompt_stream;      /* Output interactive prompt */
        PyBoolObject *show_banner;    /* -q switch (inverted) */
        PyBoolObject *inspect_main;   /* -i switch, PYTHONINSPECT */

    } PyConfigAsObjects;

The PyInterpreterConfig struct holds the settings that may vary between
the main interpreter and subinterpreters. For the main interpreter,
these settings are automatically populated by
Py_InitializeMainInterpreter().

    typedef struct {
        PyBoolObject *is_main_interpreter;    /* Easily check for subinterpreters */
    } PyInterpreterConfig;

As these structs consist solely of object pointers, no explicit
initializer definitions are needed - C99's default initialization of
struct memory to zero is sufficient.

Completing the main interpreter initialization

The final step in the initialization process is to actually put the
configuration settings into effect and finish bootstrapping the main
interpreter up to full operation:

    int Py_InitializeMainInterpreter(const PyConfigAsObjects *config);

Like Py_BuildPythonConfig, this call will raise an exception and report
an error return rather than exhibiting fatal errors if a problem is
found with the config data. (TBC, as existing public initialisation APIs
support being called multiple times without error, and simply ignore
changes to any write-once settings. It may make sense to keep that
behaviour rather than trying to make the new API stricter than the old
one.)

All configuration settings are required - the configuration struct
should always be passed through Py_BuildPythonConfig to ensure it is
fully populated.

After a successful call, Py_IsInitialized() will become true and
Py_IsInitializing() will become false. The caveats described above for
the interpreter during the phase where only the core runtime is
initialized will no longer hold.

Attempting to call Py_InitializeMainInterpreter() again when
Py_IsInitialized() is true is an error.

However, some metadata related to the __main__ module may still be
incomplete:

-   sys.argv[0] may not yet have its final value
    -   it will be -m when executing a module or package with CPython
    -   it will be the same as sys.path[0] rather than the location of
        the __main__ module when executing a valid sys.path entry
        (typically a zipfile or directory)
    -   otherwise, it will be accurate:
        -   the script name if running an ordinary script
        -   -c if executing a supplied string
        -   - or the empty string if running from stdin
-   the metadata in the __main__ module will still indicate it is a
    builtin module

This function will normally implicitly import site as its final
operation (after Py_IsInitialized() is already set). Setting the
"enable_site_config" flag to Py_False in the configuration settings will
disable this behaviour, as well as eliminating any side effects on
global state if import site is later explicitly executed in the process.

Preparing the main module

Note

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed
separately, and are instead accessed through a Py_RunMain API that both
prepares and executes main, and then finalizes the Python interpreter.

This subphase completes the population of the __main__ module related
metadata, without actually starting execution of the __main__ module
code.

It is handled by calling the following API:

    int PyRun_PrepareMain();

This operation is only permitted for the main interpreter, and will
raise RuntimeError when invoked from a thread where the current thread
state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.

If prepare_main is zero, this call does nothing.

If all of main_source, main_path, main_module, main_stream and main_code
are NULL, this call does nothing.

If more than one of main_source, main_path, main_module, main_stream or
main_code is set, RuntimeError will be reported.

If main_code is already set, then this call does nothing.

If main_stream is set, and run_implicit_code is also set, then the file
identified in startup_file will be read, compiled and executed in the
__main__ namespace.

If main_source, main_path or main_module are set, then this call will
take whatever steps are needed to populate main_code:

-   For main_source, the supplied string will be compiled and saved to
    main_code.
-   For main_path:
    -   if the supplied path is recognised as a valid sys.path entry, it
        is inserted as sys.path[0], main_module is set to __main__ and
        processing continues as for main_module below.
    -   otherwise, path is read as a CPython bytecode file
    -   if that fails, it is read as a Python source file and compiled
    -   in the latter two cases, the code object is saved to main_code
        and __main__.__file__ is set appropriately
-   For main_module:
    -   any parent package is imported
    -   the loader for the module is determined
    -   if the loader indicates the module is a package, add .__main__
        to the end of main_module and try again (if the final name
        segment is already .__main__ then fail immediately)
    -   once the module source code is located, save the compiled module
        code as main_code and populate the following attributes in
        __main__ appropriately: __name__, __loader__, __file__,
        __cached__, __package__.

(Note: the behaviour described in this section isn't new, it's a
write-up of the current behaviour of the CPython interpreter adjusted
for the new configuration system)
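The decision rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the proposed C implementation: the function name prepare_main_action and the returned labels are hypothetical, the configuration struct is stood in for by a SimpleNamespace, and treating main_stream without run_implicit_code as a no-op is an assumption, since the text does not describe that case.

```python
from types import SimpleNamespace

MAIN_FIELDS = ("main_source", "main_path", "main_module",
               "main_stream", "main_code")


def prepare_main_action(config):
    """Return a label for what the preparation step would do (toy model)."""
    if not getattr(config, "prepare_main", False):
        return "nothing"
    # Collect the main_* settings that are set (None models a NULL field).
    chosen = [name for name in MAIN_FIELDS
              if getattr(config, name, None) is not None]
    if not chosen:
        return "nothing"
    if len(chosen) > 1:
        raise RuntimeError("more than one main_* setting: " + ", ".join(chosen))
    field = chosen[0]
    if field == "main_code":
        return "nothing"  # the code object is already available
    if field == "main_stream":
        # Assumption: without run_implicit_code there is nothing to prepare.
        if getattr(config, "run_implicit_code", False):
            return "run startup_file"
        return "nothing"
    return "populate main_code from " + field


example = SimpleNamespace(prepare_main=True, main_module="json.tool")
print(prepare_main_action(example))
```

Modelling the outcome as a label keeps the sketch focused on which branch is taken, rather than on the compilation and import machinery behind each branch.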

Executing the main module

Note

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed
separately, and are instead accessed through a Py_RunMain API that both
prepares and executes main, and then finalizes the Python interpreter.

This subphase covers the execution of the actual __main__ module code.

It is handled by calling the following API:

    int PyRun_ExecMain();

This operation is only permitted for the main interpreter, and will
raise RuntimeError when invoked from a thread where the current thread
state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in
the interpreter state as part of the configuration struct.

If both main_stream and main_code are NULL, this call does nothing.

If both main_stream and main_code are set, RuntimeError will be
reported.

If main_stream and prompt_stream are both set, main execution will be
delegated to a new internal API:

    int _PyRun_InteractiveMain(PyObject *input, PyObject* output);

If main_stream is set and prompt_stream is NULL, main execution will be
delegated to a new internal API:

    int _PyRun_StreamInMain(PyObject *input);

If main_code is set, main execution will be delegated to a new internal
API:

    int _PyRun_CodeInMain(PyCodeObject *code);

After execution of main completes, if inspect_main is set, or the
PYTHONINSPECT environment variable has been set, then PyRun_ExecMain
will invoke _PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__).
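The dispatch rules for this subphase can be summarised with a small Python model. It is a sketch under stated assumptions: the function exec_main_target is hypothetical, None stands in for a NULL field, and the function only reports the name of the internal API that would be chosen, without running anything.

```python
def exec_main_target(main_stream=None, main_code=None, prompt_stream=None):
    """Return the name of the internal API that would run main (toy model)."""
    if main_stream is None and main_code is None:
        return None  # nothing to execute
    if main_stream is not None and main_code is not None:
        raise RuntimeError("main_stream and main_code are both set")
    if main_stream is not None:
        # A prompt stream selects the interactive variant.
        if prompt_stream is not None:
            return "_PyRun_InteractiveMain"
        return "_PyRun_StreamInMain"
    return "_PyRun_CodeInMain"


print(exec_main_target(main_code=compile("pass", "<demo>", "exec")))
```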

Internal Storage of Configuration Data

The interpreter state will be updated to include details of the
configuration settings supplied during initialization by extending the
interpreter state object with at least an embedded copy of the
PyConfigAsObjects and PyInterpreterConfig structs.

For debugging purposes, the configuration settings will be exposed as a
sys._configuration simple namespace (similar to sys.flags and
sys.implementation). The attributes will themselves be simple
namespaces corresponding to the two levels of configuration setting:

-   all_interpreters
-   active_interpreter

Field names will match those in the configuration structs, except for
hash_seed, which will be deliberately excluded.
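To make the intended shape concrete, the following sketch builds a stand-in for the proposed namespace using types.SimpleNamespace. The values are invented for illustration, only a handful of fields are shown, and placing the PyConfigAsObjects fields under all_interpreters and the PyInterpreterConfig fields under active_interpreter is an assumption about how the two levels would map.

```python
from types import SimpleNamespace

# Hypothetical stand-in for sys._configuration; only the field names are
# taken from the configuration structs, the values are made up.
configuration = SimpleNamespace(
    all_interpreters=SimpleNamespace(
        enable_site_config=True,
        dont_write_bytecode=False,
        optimize=0,
        verbosity=0,
    ),
    active_interpreter=SimpleNamespace(
        is_main_interpreter=True,
    ),
)

print(configuration.all_interpreters.optimize)
print(configuration.active_interpreter.is_main_interpreter)
```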

An underscored attribute is chosen deliberately, as these configuration
settings are part of the CPython implementation, rather than part of the
Python language definition. If new settings are needed to support
cross-implementation compatibility in the standard library, then those
should be agreed with the other implementations and exposed as new
required attributes on sys.implementation, as described in PEP 421.

These are snapshots of the initial configuration settings. They are not
modified by the interpreter during runtime (except as noted above).

Creating and Configuring Subinterpreters

As the new configuration settings are stored in the interpreter state,
they need to be initialised when a new subinterpreter is created. This
turns out to be trickier than one might expect, because after a call to
PyThreadState_Swap(NULL) there is no current thread state to take the
configuration from (fortunately, this case is exercised by CPython's own
embedding tests, allowing the problem to be detected during
development).

To provide a straightforward solution for this case, the PEP proposes to
add a new API:

    PyInterpreterState *PyInterpreterState_Main();

This will be a counterpart to PyInterpreterState_Head(), only reporting
the oldest currently existing interpreter rather than the newest. If
Py_NewInterpreter() is called from a thread with an existing thread
state, then the interpreter configuration for that thread will be used
when initialising the new subinterpreter. If there is no current thread
state, the configuration from PyInterpreterState_Main() will be used.

While the existing PyInterpreterState_Head() API could be used instead,
that reference changes as subinterpreters are created and destroyed,
while PyInterpreterState_Main() will always refer to the initial
interpreter state created in Py_InitializeRuntime().
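The difference between the "head" and "main" lookups, and the rule for choosing a configuration source, can be illustrated with a toy Python model. Everything here is hypothetical (the InterpreterList class, the config_source helper, and the use of plain strings for interpreters); it only mirrors the behaviour described above.

```python
class InterpreterList:
    """Toy stand-in for the runtime's list of live interpreters."""

    def __init__(self):
        self._live = []  # oldest first, newest last

    def create(self, name):
        self._live.append(name)
        return name

    def destroy(self, name):
        self._live.remove(name)

    def head(self):
        return self._live[-1]  # newest live interpreter

    def main(self):
        return self._live[0]  # oldest live interpreter


def config_source(interpreters, current=None):
    """Pick whose configuration a new subinterpreter would copy."""
    return current if current is not None else interpreters.main()


interpreters = InterpreterList()
interpreters.create("main")
interpreters.create("sub1")
interpreters.create("sub2")
head_before = interpreters.head()
interpreters.destroy("sub2")
head_after = interpreters.head()
print(head_before, head_after, interpreters.main())
```

The head moves as subinterpreters come and go, while the main lookup keeps returning the same entry, which is why it is the more reliable fallback when no thread state is current.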

A new constraint is also added to the embedding API: attempting to
delete the main interpreter while subinterpreters still exist will now
be a fatal error.

Stable ABI

Most of the APIs proposed in this PEP are excluded from the stable ABI,
as embedding a Python interpreter involves a much higher degree of
coupling than merely writing an extension module.

The only newly exposed APIs that will be part of the stable ABI are the
Py_IsInitializing() and Py_IsRuntimeInitialized() queries.

Build time configuration

This PEP makes no changes to the handling of build time configuration
settings, and thus has no effect on the contents of sys.implementation
or the result of sysconfig.get_config_vars().

Backwards Compatibility

Backwards compatibility will be preserved primarily by ensuring that
Py_BuildPythonConfig() interrogates all the previously defined
configuration settings stored in global variables and environment
variables, and that Py_InitializeMainInterpreter() writes affected
settings back to the relevant locations.

One acknowledged incompatibility is that some environment variables
which are currently read lazily may instead be read once during
interpreter initialization. As the reference implementation matures,
these will be discussed in more detail on a case-by-case basis. The
environment variables which are currently known to be looked up
dynamically are:

-   PYTHONCASEOK: writing to os.environ['PYTHONCASEOK'] will no longer
    dynamically alter the interpreter's handling of filename case
    differences on import (TBC)
-   PYTHONINSPECT: os.environ['PYTHONINSPECT'] will still be checked
    after execution of the __main__ module terminates

The Py_Initialize() style of initialization will continue to be
supported. It will use (at least some elements of) the new API
internally, but will continue to exhibit the same behaviour as it does
today, ensuring that sys.argv is not populated until a subsequent
PySys_SetArgv call (TBC). All APIs that currently support being called
prior to Py_Initialize() will continue to do so, and will also support
being called prior to Py_InitializeRuntime().

A System Python Executable

When executing system utilities with administrative access to a system,
many of the default behaviours of CPython are undesirable, as they may
allow untrusted code to execute with elevated privileges. The most
problematic aspects are that user site directories are enabled, that
environment variables are trusted, and that the directory containing the
executed file is placed at the beginning of the import path.

Issue 16499[8] added a -I option to change the behaviour of the normal
CPython executable, but this is a hard-to-discover solution (and adds
yet another option to an already complex CLI). This PEP proposes to
instead add a separate system-python executable.

Currently, providing a separate executable with different default
behaviour would be prohibitively hard to maintain. One of the goals of
this PEP is to make it possible to replace much of the hard to maintain
bootstrapping code with more normal CPython code, as well as making it
easier for a separate application to make use of key components of
Py_Main. Including this change in the PEP is designed to help avoid
acceptance of a design that sounds good in theory but proves to be
problematic in practice.

Cleanly supporting this kind of "alternate CLI" is the main reason for
the proposed changes to better expose the core logic for deciding
between the different execution modes supported by CPython:

-   script execution
-   directory/zipfile execution
-   command execution ("-c" switch)
-   module or package execution ("-m" switch)
-   execution from stdin (non-interactive)
-   interactive stdin

Actually implementing this may also reveal the need for some better
argument parsing infrastructure for use during the initializing phase.

Open Questions

-   Error details for Py_BuildPythonConfig and
    Py_InitializeMainInterpreter (these should become clearer as the
    implementation progresses)

Implementation

The reference implementation is being developed as a private API
refactoring within the CPython reference interpreter (as attempting to
maintain it as an independent project proved impractical).

PEP 587 extracts a subset of the proposal that is considered
sufficiently stable to be worth proposing as a public API for Python
3.8.

The Status Quo (as of Python 3.6)

The current mechanisms for configuring the interpreter have accumulated
in a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.

Also see PEP 587 for further discussion of the existing settings and
their handling.

(Note: some of the info below could probably be cleaned up and added to
the C API documentation for 3.x - it's all CPython specific, so it
doesn't belong in the language reference)

Ignoring Environment Variables

The -E command line option allows all environment variables to be
ignored when initializing the Python interpreter. An embedding
application can enable this behaviour by setting
Py_IgnoreEnvironmentFlag before calling Py_Initialize().

In the CPython source code, the Py_GETENV macro implicitly checks this
flag, and always produces NULL if it is set.

<TBD: I believe PYTHONCASEOK is checked regardless of this setting >
<TBD: Does -E also ignore Windows registry keys? >
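The effect of -E can be observed from Python by launching child interpreters with and without the option. The sketch below uses PYTHONDONTWRITEBYTECODE as the probe variable; the helper name dont_write_bytecode_in_child is hypothetical, and the example assumes the child interpreter does not depend on other PYTHON* variables in order to start.

```python
import os
import subprocess
import sys


def dont_write_bytecode_in_child(*options):
    """Report sys.dont_write_bytecode in a child with the variable set."""
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    cmd = [sys.executable, *options,
           "-c", "import sys; print(sys.dont_write_bytecode)"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip().splitlines()[-1]


honoured = dont_write_bytecode_in_child()      # variable is read
ignored = dont_write_bytecode_in_child("-E")   # variable is ignored
print(honoured, ignored)
```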

Randomised Hashing

Randomised hashing is controlled via the -R command line option (in
releases prior to 3.3), as well as the PYTHONHASHSEED environment
variable.

In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific seed value (e.g. for repeatability of testing, or to
share hash values between processes).

However, embedding applications must use the Py_HashRandomizationFlag to
explicitly request hash randomisation (CPython sets it in Py_Main()
rather than in Py_Initialize()).

The new configuration API should make it straightforward for an
embedding application to reuse the PYTHONHASHSEED processing with a text
based configuration setting provided by other means (e.g. a config file
or separate environment variable).
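The repeatability provided by a fixed seed can be checked with a short script that hashes the same string in two separate child processes. The helper name hash_in_child and the seed value 123 are arbitrary choices made for this illustration.

```python
import os
import subprocess
import sys


def hash_in_child(seed, text="startup"):
    """Return hash(text) as computed by a child run with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    cmd = [sys.executable, "-c", f"print(hash({text!r}))"]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return int(result.stdout.split()[-1])


first = hash_in_child(123)
second = hash_in_child(123)
print(first == second)
```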

Locating Python and the standard library

The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code[9],[10]. Even that
description is incomplete, as it failed to be updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made
prior to calling Py_Initialize()) and environment variables:

-   Py_SetProgramName()
-   Py_SetPythonHome()
-   PYTHONHOME

The filesystem is also inspected for pyvenv.cfg files (see PEP 405) or,
failing that, a lib/os.py (Windows) or lib/python$VERSION/os.py file.

The build time settings for PREFIX and EXEC_PREFIX are also relevant, as
are some registry settings on Windows. The hardcoded fallbacks are based
on the layout of the CPython source tree and build output when working
in a source checkout.
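The results of these calculations are visible from Python code through attributes of the sys module. The snippet below simply collects them; the dictionary name locations and the virtual environment check are conveniences for this illustration (comparing sys.prefix with sys.base_prefix is the commonly used test for a PEP 405 environment).

```python
import sys

# Collect the location-related values calculated during startup.
LOCATION_ATTRIBUTES = ("executable", "prefix", "exec_prefix",
                       "base_prefix", "base_exec_prefix")
locations = {name: getattr(sys, name) for name in LOCATION_ATTRIBUTES}

# In a virtual environment, prefix differs from base_prefix.
in_virtual_environment = sys.prefix != sys.base_prefix

for name, value in locations.items():
    print(f"{name:17} {value}")
print("virtual environment:", in_virtual_environment)
```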

Configuring sys.path

An embedding application may call Py_SetPath() prior to Py_Initialize()
to completely override the calculation of sys.path. It is not
straightforward to only allow some of the calculations, as modifying
sys.path after initialization is already complete means those
modifications will not be in effect when standard library modules are
imported during the startup sequence.

If Py_SetPath() is not used prior to the first call to Py_GetPath()
(implicit in Py_Initialize()), then it builds on the location data
calculations above to calculate suitable path entries, along with the
PYTHONPATH environment variable.

<TBD: On Windows, there's also a bunch of stuff to do with the registry>

The site module, which is implicitly imported at startup (unless
disabled via the -S option) adds additional paths to this initial set of
paths, as described in its documentation[11].

The -s command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications can
control this by setting the Py_NoUserSiteDirectory global variable.

The following commands can be used to check the default path
configurations for a given Python executable on a given system:

-   ./python -c "import sys, pprint; pprint.pprint(sys.path)"
    -   standard configuration
-   ./python -s -c "import sys, pprint; pprint.pprint(sys.path)"
    -   user site directory disabled
-   ./python -S -c "import sys, pprint; pprint.pprint(sys.path)"
    -   all site path modifications disabled

(Note: you can see similar information using -m site instead of -c, but
this is slightly misleading as it calls os.path.abspath on all of the path
entries, making relative path entries look absolute. Using the site
module also causes problems in the last case, as on Python versions
prior to 3.3, explicitly importing site will carry out the path
modifications -S avoids, while on 3.3+ combining -m site with -S
currently fails)
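The same three checks can be scripted, which makes it easier to compare the results side by side. This is a minimal sketch: the helper name child_sys_path is hypothetical, and it assumes the child interpreter prints nothing else after the JSON line on stdout.

```python
import json
import subprocess
import sys

REPORT = "import sys, json; print(json.dumps(sys.path))"


def child_sys_path(*options):
    """Return sys.path as seen by a child interpreter run with the options."""
    cmd = [sys.executable, *options, "-c", REPORT]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout.strip().splitlines()[-1])


paths = {
    "default": child_sys_path(),   # standard configuration
    "-s": child_sys_path("-s"),    # user site directory disabled
    "-S": child_sys_path("-S"),    # all site path modifications disabled
}
for label, entries in paths.items():
    print(label, len(entries), "entries")
```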

The calculation of sys.path[0] is comparatively straightforward:

-   For an ordinary script (Python source or compiled bytecode),
    sys.path[0] will be the directory containing the script.
-   For a valid sys.path entry (typically a zipfile or directory),
    sys.path[0] will be that path
-   For an interactive session, running from stdin or when using the -c
    or -m switches, sys.path[0] will be the empty string, which the
    import system interprets as allowing imports from the current
    directory
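These rules can be written down as a small Python function. It is a simplified model rather than CPython's actual logic: the function name initial_sys_path0 and the mode labels are hypothetical, and details such as symlink resolution are ignored.

```python
import os


def initial_sys_path0(mode, target=None):
    """Model the value of sys.path[0] for each execution mode."""
    if mode == "script":
        # Directory containing the script (source or bytecode).
        return os.path.dirname(target)
    if mode == "path_entry":
        # A directory or zipfile that is itself a valid sys.path entry.
        return target
    if mode in {"interactive", "stdin", "-c", "-m"}:
        # The empty string means "import from the current directory".
        return ""
    raise ValueError(f"unknown execution mode: {mode!r}")


print(repr(initial_sys_path0("script", os.path.join("apps", "tool.py"))))
print(repr(initial_sys_path0("path_entry", "bundle.zip")))
print(repr(initial_sys_path0("-m")))
```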

Configuring sys.argv

Unlike most other settings discussed in this PEP, sys.argv is not set
implicitly by Py_Initialize(). Instead, it must be set via an explicit
call to PySys_SetArgv().

CPython calls this in Py_Main() after calling Py_Initialize(). The
calculation of sys.argv[1:] is straightforward: they're the command line
arguments passed after the script name or the argument to the -c or -m
options.

The calculation of sys.argv[0] is a little more complicated:

-   For an ordinary script (source or bytecode), it will be the script
    name
-   For a sys.path entry (typically a zipfile or directory) it will
    initially be the zipfile or directory name, but will later be
    changed by the runpy module to the full path to the imported
    __main__ module.
-   For a module specified with the -m switch, it will initially be the
    string "-m", but will later be changed by the runpy module to the
    full path to the executed module.
-   For a package specified with the -m switch, it will initially be the
    string "-m", but will later be changed by the runpy module to the
    full path to the executed __main__ submodule of the package.
-   For a command executed with -c, it will be the string "-c"
-   For explicitly requested input from stdin, it will be the string "-"
-   Otherwise, it will be the empty string
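Three of the simpler cases can be observed directly by starting child interpreters and printing sys.argv[0]. The helper name argv0_in_child is hypothetical; stdin is supplied as a pipe, so the no-argument case exercises non-interactive execution from stdin.

```python
import subprocess
import sys

SHOW_ARGV0 = "import sys; print(repr(sys.argv[0]))"


def argv0_in_child(args, stdin_text=""):
    """Run a child interpreter and return what it prints for sys.argv[0]."""
    result = subprocess.run([sys.executable, *args], input=stdin_text,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


observed = {
    "-c": argv0_in_child(["-c", SHOW_ARGV0]),
    "-": argv0_in_child(["-"], SHOW_ARGV0 + "\n"),
    "stdin": argv0_in_child([], SHOW_ARGV0 + "\n"),
}
for mode, value in observed.items():
    print(mode, value)
```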

Embedding applications must call PySys_SetArgv themselves. The CPython
logic for doing so is part of Py_Main() and is not exposed separately.
However, the runpy module does provide roughly equivalent logic in
runpy.run_module and runpy.run_path.
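As an illustration of the runpy route, the following sketch writes a throwaway script to a temporary directory and executes it with runpy.run_path, then inspects what the script saw. The file name demo_script.py and the variable names are arbitrary.

```python
import os
import runpy
import tempfile

# The script records the values it observes while running.
SCRIPT = "import sys\nargv0 = sys.argv[0]\nname = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "demo_script.py")
    with open(script_path, "w", encoding="utf-8") as script_file:
        script_file.write(SCRIPT)
    # run_path returns the resulting module globals as a dictionary.
    namespace = runpy.run_path(script_path, run_name="__main__")

print(namespace["name"])
print(namespace["argv0"] == script_path)
```

While the script runs, runpy temporarily replaces sys.argv[0] with the script path and restores the previous value afterwards.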

Other configuration settings

TBD: Cover the initialization of the following in more detail:

-   Completely disabling the import system
-   The initial warning system state:
    -   sys.warnoptions
    -   (-W option, PYTHONWARNINGS)
-   Arbitrary extended options (e.g. to automatically enable
    faulthandler):
    -   sys._xoptions
    -   (-X option)
-   The filesystem encoding used by:
    -   sys.getfilesystemencoding
    -   os.fsencode
    -   os.fsdecode
-   The IO encoding and buffering used by:
    -   sys.stdin
    -   sys.stdout
    -   sys.stderr
    -   (-u option, PYTHONIOENCODING, PYTHONUNBUFFERED)
-   Whether or not to implicitly cache bytecode files:
    -   sys.dont_write_bytecode
    -   (-B option, PYTHONDONTWRITEBYTECODE)
-   Whether or not to enforce correct case in filenames on
    case-insensitive platforms
    -   os.environ["PYTHONCASEOK"]
-   The other settings exposed to Python code in sys.flags:
    -   debug (Enable debugging output in the pgen parser)
    -   inspect (Enter interactive interpreter after __main__
        terminates)
    -   interactive (Treat stdin as a tty)
    -   optimize (__debug__ status, write .pyc or .pyo, strip doc
        strings)
    -   no_user_site (don't add the user site directory to sys.path)
    -   no_site (don't implicitly import site during startup)
    -   ignore_environment (whether environment vars are used during
        config)
    -   verbose (enable all sorts of random output)
    -   bytes_warning (warnings/errors for implicit str/bytes
        interaction)
    -   quiet (disable banner output even if verbose is also enabled or
        stdin is a tty and the interpreter is launched in interactive
        mode)
-   Whether or not CPython's signal handlers should be installed

Much of the configuration of CPython is currently handled through C
level global variables:

    Py_BytesWarningFlag (-b)
    Py_DebugFlag (-d option)
    Py_InspectFlag (-i option, PYTHONINSPECT)
    Py_InteractiveFlag (property of stdin, cannot be overridden)
    Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
    Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
    Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
    Py_NoSiteFlag (-S option)
    Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFERED)
    Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and
environment variables to C global variables is handled by Py_Main, so
each embedding application must set those appropriately in order to
change them from their defaults.
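From Python code, the outcome of that conversion is visible in sys.flags. The sketch below starts a child interpreter with several options and reads back the corresponding flags; the helper name flags_in_child and the selection of flags are choices made for this illustration.

```python
import json
import subprocess
import sys

REPORT = (
    "import sys, json; "
    "names = ['optimize', 'dont_write_bytecode', 'no_site', 'no_user_site']; "
    "print(json.dumps({n: getattr(sys.flags, n) for n in names}))"
)


def flags_in_child(*options):
    """Return selected sys.flags values from a child run with the options."""
    cmd = [sys.executable, *options, "-c", REPORT]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout.strip().splitlines()[-1])


flags = flags_in_child("-O", "-B", "-S", "-s")
print(flags)
```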

Some configuration can only be provided as OS level environment
variables:

    PYTHONSTARTUP
    PYTHONCASEOK
    PYTHONIOENCODING

The Py_InitializeEx() API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.

TBD: Document how the "-x" option is handled (skips processing of the
first comment line in the main script)

Also see the detailed sequence of operations notes[12].

References

Copyright

This document has been placed in the public domain.

[1] Problems with PYTHONIOENCODING in Blender
(http://bugs.python.org/issue16129)

[2] Proposed CLI option for isolated mode
(http://bugs.python.org/issue16499)

[3] Adding to sys.path on the command line
(https://mail.python.org/pipermail/python-ideas/2010-October/008299.html)
(https://mail.python.org/pipermail/python-ideas/2012-September/016128.html)

[4] Control sys.path[0] initialisation
(http://bugs.python.org/issue13475)

[5] Enabling code coverage in subprocesses when testing
(http://bugs.python.org/issue14803)

[6] CPython interpreter initialization notes
(http://wiki.python.org/moin/CPythonInterpreterInitialization)

[7] BitBucket Sandbox
(https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits)

[8] Proposed CLI option for isolated mode
(http://bugs.python.org/issue16499)

[9] *nix getpath implementation
(http://hg.python.org/cpython/file/default/Modules/getpath.c)

[10] Windows getpath implementation
(http://hg.python.org/cpython/file/default/PC/getpathp.c)

[11] Site module documentation
(http://docs.python.org/3/library/site.html)

[12] CPython interpreter initialization notes
(http://wiki.python.org/moin/CPythonInterpreterInitialization)