PEP: 648 Title: Extensible customizations of the interpreter at startup
Author: Mario Corchero <mariocj89@gmail.com> Sponsor: Pablo Galindo
Discussions-To:
https://discuss.python.org/t/pep-648-extensible-customizations-of-the-interpreter-at-startup/6403
Status: Rejected Type: Standards Track Content-Type: text/x-rst Created:
30-Dec-2020 Python-Version: 3.11 Post-History: 16-Dec-2020, 18-Dec-2020

Abstract

This PEP proposes supporting extensible customization of the interpreter
by allowing users to install files that will be executed at startup.

PEP Rejection

PEP 648 was rejected by the steering council as it has a limited number
of use cases and further complicates the startup sequence.

Motivation

System administrators, tools that repackage the interpreter and some
libraries need to customize aspects of the interpreter at startup time.

This is usually achieved via sitecustomize.py for system administrators
whilst libraries rely on exploiting pth files. This PEP proposes a way
of achieving the same functionality in a more user-friendly and
structured way.

Limitations of pth files

If a library needs to perform any customization before an import or that
relates to the general working of the interpreter, they often rely on
the fact that pth files, which are loaded at startup and implemented via
the site module[1], can include Python code that will be executed when
the pth file is evaluated.

Note that pth files were originally developed to just add additional
directories to sys.path, but they may also contain lines which start
with "import", which will be passed to exec(). Users have exploited this
feature to allow the customizations that they needed. See setuptools [2]
or betterexceptions[3] as examples.

Using pth files for this purpose is far from ideal for library
developers, as they need to inject code into a single line preceded by
an import, making it rather unreadable. Library developers following
that practice will usually create a module that performs all actions on
import, as done by betterexceptions[4], but the approach is still not
really user friendly.

Additionally, it is also non-ideal for users of the interpreter if they
want to inspect what is being executed at Python startup as they need to
review all the pth files for potential code execution which can be
spread across all site paths. Most of those pth files will be
"legitimate" pth files that just modify the path, answering the question
of "what is changing my interpreter at startup" a rather complex one.

Lastly, there have been multiple suggestions for removing code execution
from pth files, see[5] and[6].

Limitations of sitecustomize.py

Whilst sitecustomize is an acceptable solution, it assumes a single
person is in charge of the system and the interpreter. If both the
system administrator and the responsibility of provisioning the
interpreter want to add customizations at the interpreter startup they
need to agree on the contents of the file and combine all the changes.
This is not a major limitation though, and it is not the main driver of
this change. Should the change happen, it will also improve the
situation for these users, as rather than having a sitecustomize.py
which performs all those actions, they can have custom isolated files
named after the features they want to enhance. As an example, Ubuntu
could change their current sitecustomize.py to just be
ubuntu_apport_python_hook. This not only better represents its intent
but also gives users of the interpreter a better understanding of the
modifications happening on their interpreter.

Rationale

This PEP proposes supporting extensible customization of the interpreter
at startup by executing all files discovered in directories named
__sitecustomize__ in sitepackages[7] or usersitepackages[8] at startup
time.

Why __sitecustomize__

The name aims to follow the already existing concept of
sitecustomize.py. As the directory will be within sys.path, given that
it is located in site paths, we choose to use double underscore around
its name, to prevent colliding with the already existing
sitecustomize.py.

Discovering the new __sitecustomize__ directories

The Python interpreter will look at startup for directory named
__sitecustomize__ within any of the standard site-packages path.

These are commonly the Python system location and the user location, but
are ultimately defined by the site module logic.

Users can use site.sitepackages[9] and site.usersitepackages[10] to know
the paths where the interpreter can discover __sitecustomize__
directories.

Time of __sitecustomize__ discovery

The __sitecustomize__ directories will be discovered exactly after pth
files are discovered in a site-packages path as part of site.addsitedir
[11].

These is repeated for each of the site-packages path in the exact same
order that is being followed today for pth files.

Order of execution within __sitecustomize__

The implementation will execute the files within __sitecustomize__ by
sorting them by name when discovering each of the __sitecustomize__
directories. We discourage users to rely on the order of execution
though.

We considered executing them in random order, but that could result in
different results depending on how the interpreter chooses to pick up
those files. So even if it won't be a good practice to rely on other
files being executed, we think that is better than having randomly
different results on interpreter startup. We chose to run the files
after the pth files in case a user needs to add items to the path before
running a files.

Interaction with pth files

pth files can be used to add paths into sys.path, but this should not
affect the __sitecustomize__ discovery process, as those directories are
looked up exclusively in site-packages paths.

Execution of files within __sitecustomize__

When a __sitecustomize__ directory is discovered, all of the files that
have a .py extension within it will be read with io.open_code and
executed by using exec[12].

An empty dictionary will be passed as globals to the exec function to
prevent unexpected interactions between different files.

Failure handling

Any error on the execution of any of the files will not be logged unless
the interpreter is run in verbose mode and it should not stop the
evaluation of other files. The user will receive a message in stderr
saying that the file failed to be executed and that verbose mode can be
used to get more information. This behaviour mimics the one existing for
sitecustomize.py.

Interaction with virtual environments

The customizations applied to an interpreter via the new
__sitecustomize__ solutions will continue to work when a user creates a
virtual environment the same way that sitecustomize.py interact with
virtual environments.

This is a difference when compared to pth files, which are not
propagated into virtual environments unless include-system-site-packages
is enabled.

If library maintainers have features installed via __sitecustomize__
that they do not want to propagate into virtual environments, they
should detect if they are running within a virtual environment by
checking sys.prefix == sys.base_prefix. This behavior is similar to
packages that modify the global sitecustomize.py.

Interaction with sitecustomize.py and usercustomize.py

Until removed, sitecustomize and usercustomize will be executed after
__sitecustomize__ similar to pth files. See the Backward compatibility
section for information on removal plans for sitecustomize and
usercustomize.

Identifying all installed files

To facilitate debugging of the Python startup, if the site module is
invoked it will print the __sitecustomize__ directories that will be
discovered on startup.

Files naming convention

Packages will be encouraged to include the name of the package within
the name of the file to avoid collisions between packages. But the only
requirement on the filename is that it ends in .py for the interpreter
to execute them.

Disabling start files

In some scenarios, like when the startup time is key, it might be
desired to disable this option altogether. The already existing flag
-S[13] will disable all site-related manipulation, including this new
feature. If the flag is passed in, __sitecustomize__ directories will
not be discovered.

Additionally, to allow for starting the interpreter disabling only this
new feature a new option will be added under -X: disablesitecustomize,
which will disable the discovery of __sitecustomize__ exclusively.

Lastly, the user can disable the discovery of __sitecustomize__
directories only in the user site by disabling the user site via any of
the multiple options in the site.py module.

Support in build backends

Whilst build backends can choose to provide an option to facilitate the
installation of these files into a __sitecustomize__ directory, this PEP
does not address that directly. Similar to pth files, build backends can
choose to not provide an easy-to-configure mechanism for
__sitecustomize__ files and let users hook into the installation process
to include such files. We do not think build backends enhanced support
as a requirement for this PEP.

Impact on startup time

A concern in this implementation is how Python interpreter startup time
can be affected by this addition. We expect the performance impact to be
highly coupled to the logic in the files that a user or sysadmin
installs in the Python environment being tested.

If the interpreter has any files in their __sitecustomize__ directory,
the file execution time plus a call reading the code will be added to
the startup time. This is similar to how code execution is impacting
startup time through sitecustomize.py, usercustomize.py and code in pth
files. We will therefore focus here on comparing this solution against
those three, as otherwise the actual time added to startup is highly
dependent on the code that is being executed in those files.

Results were gathered by running "./python.exe -c pass" with perf on 50
iterations, repeating 50 times the command on each iteration and getting
the geometric mean of all the results. The file used to run those
benchmarks is checked in in the reference implementation[14].

The benchmark was run with 3.10 alpha 7 compiled with PGO and LTO with
the following parameters and system state:

-   Perf event: Max sample rate set to 1 per second
-   CPU Frequency: Minimum frequency of CPU 17,35 set to the maximum
    frequency
-   Turbo Boost (MSR): Turbo Boost disabled on CPU 17: MSR 0x1a0 set to
    0x4000850089
-   IRQ affinity: Set default affinity to CPU 0-16,18-34
-   IRQ affinity: Set affinity of IRQ
    1,3-16,21,25-31,56-59,68-85,87,89-90,92-93,95-104 to CPU 0-16,18-34
-   CPU: use 2 logical CPUs: 17,35
-   Perf event: Maximum sample rate: 1 per second
-   ASLR: Full randomization
-   Linux scheduler: Isolated CPUs (2/36): 17,35
-   Linux scheduler: RCU disabled on CPUs (2/36): 17,35
-   CPU Frequency: 0-16,18-34=min=1200 MHz, max=3600 MHz;
    17,35=min=max=3600 MHz
-   Turbo Boost (MSR): CPU 17,35: disabled

The code placed to be executed in pth files, sitecustomize.py,
usercustomize.py and files within __sitecustomize__ is the following:

  import time; x = time.time() ** 5

The file is aimed at execution a simple operation but still expected to
be negligible. This is to put the experiment in a situation where we
make visible any hit on performance due to the mechanism whilst still
making it relatively realistic. Additionally, it starts with an import
and is a single line to be able to be used in pth files.

+--------------------------+--------------------------+--------------+
| Test                     | # of files               | Time (us)    |
+==========================+==========================+==============+
|   #                      | sitecustomize.py         | Run 1 Run 2  |
|                          | usercustomize.py pth     |              |
|                          | __sitecustomize__        |              |
+--------------------------+--------------------------+--------------+
| ----------------------   | ====================     | ====== ===== |
|                          | ====================     |              |
|                          | =======                  |              |
|                          | =====================    |              |
+--------------------------+--------------------------+--------------+
|   1                      | 0 0 0 Dir not created    | 13884 13897  |
+--------------------------+--------------------------+--------------+
|   2                      | 0 0 0 0                  | 13871 13818  |
+--------------------------+--------------------------+--------------+
|   3                      | 0 0 1 0                  | 13964 13924  |
+--------------------------+--------------------------+--------------+
|   4                      | 0 0 0 1                  | 13940 13939  |
+--------------------------+--------------------------+--------------+
|   5                      | 1 1 0 0                  | 13990 13993  |
+--------------------------+--------------------------+--------------+
|   6                      | 0 0 0 2 (system + user)  | 14063 14040  |
+--------------------------+--------------------------+--------------+
|   7                      | 0 0 50 0                 | 16011 16014  |
+--------------------------+--------------------------+--------------+
|   8                      | 0 0 0 50                 | 15456 15448  |
+--------------------------+--------------------------+--------------+

Results can be reproduced with run-benchmark.py script provided in the
reference implementation[15].

We interpret the following from these results:

-   Using two __sitecustomize__ scripts compared to sitecustomize.py and
    usercustomize.py slows down the interpreter by 0.3%. We expect this
    slowdown until sitecustomize.py and usercustomize.py are removed in
    a future release as even if the user does not create the files, the
    interpreter will still attempt to import them.
-   With the arbitrary 50 pth files with code tested, moving those to
    __sitecustomize__ produces a speedup of ~3.5% in startup. Which is
    likely related to the simpler logic to evaluate __sitecustomize__
    files compared to pth file execution.
-   In general all measurements show that there is a low impact on
    startup time with this addition.

Audit Event

A new audit event will be added and triggered on __sitecustomize__
execution to facilitate security inspection by calling sys.audit [16]
with "sitecustimze.exec_file" as name and the filename as argument.

Security implications

This PEP aims to move all code execution from pth files to files within
a __sitecustomize__ directory. We think this is an improvement to system
admins for the following reasons:

-   Allows to quickly identify the code being executed at startup time
    by the interpreter by looking into a single directory rather than
    having to scan all pth files.
-   Allows to track usage of this feature through the new proposed audit
    event.
-   Gives finer grain control by allowing to tune permissions on the
    __sitecustomize__ directory, potentially allowing users to install
    only packages that does not change the interpreter startup.

In short, whilst this allows for a malicious users to drop a file that
will be executed at startup, it's an improvement compared to the
existing pth files.

How to teach this

This can be documented and taught as simple as saying that the
interpreter will try to look for the __sitecustomize__ directory at
startup in its site paths and if it finds any files with .py extension,
it will then execute it one by one.

For system administrators and tools that package the interpreter, we can
now recommend placing files in __sitecustomize__ as they used to place
sitecustomize.py. Being more comfortable on that their content won't be
overridden by the next person, as they can provide with specific files
to handle the logic they want to customize.

Library developers should be able to specify a new argument on tools
like setuptools that will inject those new files. Something like
sitecustomize_files=["scripts/betterexceptions.py"], which allows them
to add those. Should the build backend not support that, they can
manually install them as they used to do with pth files. We will
recommend them to include the name of the package as part of the file's
name.

Backward compatibility

This PEP adds a deprecation warning on sitecustomize.py,
usercustomize.py and pth code execution in 3.11, 3.12 and 3.13. With
plans on removing those features by 3.14. The migration from those
solutions to __sitecustomize__ should ideally be just moving the logic
into a different file.

Whilst the existing sitecustomize.py mechanism was created targeting
System Administrators that placed it in a site path, the file could be
actually placed anywhere in the path at the time that the interpreter
was starting up. The new mechanism does not allow for users to place
__sitecustomize__ directories anywhere in the path, but only in site
paths. System administrators can recover a similar behavior to
sitecustomize.py by adding a custom file in __sitecustomize__ which just
imports sitecustomize as a migration path.

Reference Implementation

An initial implementation that passes the CPython test suite is
available for evaluation[17].

This implementation is just for the reviewer to play with and check
potential issues that this PEP could generate.

Rejected Ideas

Do nothing

Whilst the current status "works" it presents the issues listed in the
motivation. After analyzing the impact of this change, we believe it is
worth it, given the enhanced experience it brings.

Formalize using pth files

Another option would be to just glorify and document the usage of pth
files to inject code at startup code, but that is a suboptimal
experience for users as listed in the motivation.

Making __sitecustomize__ a namespace package

We considered making the directory a namespace package and just import
all the modules within it, which allowed searching across all paths in
sys.path at initialization time and provided a way to declare
dependencies between files by importing each other. This was rejected
for multiple reasons:

1.  This was unnecessarily broadening the list of paths where arbitrary
    files are executed.
2.  The logic brought additional complexity, like what to do if a
    package were to install an __init__.py file in one of the locations.
3.  It's cheaper to search for __sitecustomize__ as we are looking for
    pth files already in the site paths compared to performing an actual
    import of a namespace package.

Support for shutdown customization

init.d users might be tempted to implement this feature in a way that
users could also add code at shutdown, but extra support for that is not
needed, as Python users can already do that via atexit.

Using entry_points

We considered extending the use of entry points to allow specifying
files that should be executed at startup but we discarded that solution
due to two main reasons. The first one being impact on startup time.
This approach will require scanning all packages distribution
information to just execute a handful of files. This has an impact on
performance even if the user is not using the feature and such impact
growths linearly with the number of packages installed in the
environment. The second reason was that the proposed implementation in
this PEP offers a single solution for startup customization for packages
and system administrators. Additionally, if the main objective of entry
points is to make it easy for libraries to install files at startup,
that can still be added and make the build backends just install the
files within the __sitecustomize__ directory.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

Acknowledgements

Thanks Pablo Galindo for contributing to this PEP and offering his PC to
run the benchmark.

References

[1] https://docs.python.org/3/library/site.html

[2] https://github.com/pypa/setuptools/blob/b6bbe236ed0689f50b5148f1172510b975687e62/setup.py#L100

[3] https://github.com/Qix-/better-exceptions/blob/7b417527757d555faedc354c86d3b6fe449200c2/better_exceptions_hook.pth#L1

[4] https://github.com/Qix-/better-exceptions/blob/7b417527757d555faedc354c86d3b6fe449200c2/better_exceptions_hook.pth#L1

[5] https://bugs.python.org/issue24534

[6] https://bugs.python.org/issue33944

[7] https://docs.python.org/3/library/site.html?highlight=site#site.getsitepackages

[8] https://docs.python.org/3/library/site.html?highlight=site#site.getusersitepackages

[9] https://docs.python.org/3/library/site.html?highlight=site#site.getsitepackages

[10] https://docs.python.org/3/library/site.html?highlight=site#site.getusersitepackages

[11] https://github.com/python/cpython/blob/5787ba4a45492e232f5470c7d2e93763198e4b22/Lib/site.py#L207

[12] https://docs.python.org/3/library/functions.html#exec

[13] https://docs.python.org/3/using/cmdline.html#id3

[14] https://github.com/mariocj89/cpython/tree/pu/__sitecustomize__

[15] https://github.com/mariocj89/cpython/tree/pu/__sitecustomize__

[16] https://docs.python.org/3/library/sys.html#sys.audit

[17] https://github.com/mariocj89/cpython/tree/pu/__sitecustomize__