PEP 776 – Emscripten Support
- Author:
- Hood Chatham <roberthoodchatham at gmail.com>
- Sponsor:
- Łukasz Langa <lukasz at python.org>
- Discussions-To:
- Discourse thread
- Status:
- Draft
- Type:
- Informational
- Created:
- 18-Mar-2025
- Python-Version:
- 3.14
- Post-History:
- 18-Mar-2025, 28-Mar-2025
Abstract
Emscripten is a complete open source compiler toolchain. It compiles C/C++ code into WebAssembly/JavaScript executables, for use in JavaScript runtimes, including browsers and Node.js. The Rust language also maintains an Emscripten target.
This PEP formalizes the addition of Tier 3 for Emscripten support in Python 3.14 which was approved by the Steering Council on October 25, 2024. The goals are:
- To describe the current state of the CPython Emscripten runtime
- To describe the current state of the Pyodide runtime
- To identify minor features to be upstreamed from the Pyodide runtime into the CPython Emscripten runtime
The minor features identified here are all features that could be implemented without a PEP. We discuss more significant runtime features that we would like to implement but we defer decisions on those features to subsequent PEPs.
Motivation
A web browser is a universal computing platform, available on Windows, macOS, Linux, and every smartphone.
The Pyodide project has supported Emscripten Python since 2018. Hundreds of thousands of students have learned Python through Pyodide via projects like Capytale and PyodideU. Pyodide is also increasingly being used by Python packages to provide interactive documentation. This demonstrates both the importance and the maturity of the Emscripten platform.
Emscripten and WASI are also the only supported platforms that offer any meaningful sandboxing.
Emscripten Platform Information
“Pyodide” vs “Emscripten Python”
For the sake of this document, we use the term “Emscripten Python” to refer to
the Emscripten Python maintained in the python/cpython
repository, without
any downstream additions. We contrast the features present in Emscripten Python
to the features present in Pyodide.
Pyodide is maintained on GitHub and distributed via jsDelivr, npm, and GitHub releases.
Emscripten Python is not distributed, but it is possible to build by following the instructions in the devguide
Background on Emscripten
Emscripten consists of a C and C++ compiler and linker based on LLVM, together with a runtime based on a mildly patched musl libc.
Emscripten is a POSIX-based platform. It uses the WebAssembly binary format, and the WebAssembly dynamic linking section.
The emcc
compiler is a wrapper around clang
. The emcc
linker is a
wrapper around wasm-ld
(also part of the LLVM toolchain).
Emscripten support for portable C/C++ code source compatibility with Linux is fairly comprehensive, with certain expected exceptions to be spelled out. CPython already supports compilation to Emscripten, and it only requires a very modest number of modifications to the normal Linux target.
POSIX Compliance
Emscripten is a POSIX platform. However, there are POSIX APIs that exist but
always fail when called and POSIX APIs that don’t exist at all. In particular,
there are problems with networking APIs and blocking I/O, and there is no
support for fork()
. See Emscripten Portability Guidelines.
Emscripten executables can be linked with threading support, but it comes with several limitations:
- Enabling threading requires websites to be served with special security headers that indicate acceptance of the possibility of Spectre-style information leakage. These headers are a usability hazard for users who are not intimately familiar with the web platform.
- If an executable is linked with both threading and a dynamic loader, Emscripten prints a warning that using dynamic loading and pthreads together is experimental. It may cause performance problems or crashes. These problems may require WebAssembly standards work to resolve.
Because of these limitations, Pyodide standardizes a no-pthreads build of Python. If there is sufficient demand, a pthreads build with no dynamic loader could be added later.
Development Tools
Emscripten development tools are equally well supported on Linux, Windows, and macOS. The upstream tools include:
- The Emscripten Software Developer Kit (emsdk) which can be used to install the Emscripten compiler toolchain (emcc).
- emcc is a C and C++ compiler, linker, and a sysroot with headers for the system libraries. The system libraries themselves are generated on the fly based on the ABI requested.
- Node.js can be used as an “emulator” to run Emscripten programs from the command line. This emulation behaves best on Linux with macOS as a runner up. Node.js is the most convenient way to test Emscripten programs.
- It is possible to run Emscripten programs inside of any web browser. Browser automation tools like Selenium, Playwright, or Puppeteer can be used to test features that are browser-only.
Pyodide’s tools:
pyodide build
can be used to cross compile Python packages to run on Emscripten. Cross compilation works best on Linux, there is experimental support on macOS, and it is entirely unsupported on Windows.pyodide venv
can make a virtual environment that runs in Pyodide.pytest-pyodide
can test Python code against various JavaScript runtimes.
cibuildwheel supports building wheels to target Emscripten using pyodide build
.
In the short term, Pyodide’s packaging tooling will stay in the Pyodide
repository. It is an open question where Pyodide’s packaging tooling should live
in the long term. Two sensible options would be for it to remain under the
pyodide
organization or be moved into the pypa
organization on GitHub.
Emscripten Application Lifecycle
An Emscripten “binary” consists of a pair of files, an .mjs
file and a
.wasm
file. The .wasm
file contains all of the compiled C/C++/Rust code.
The .mjs
file contains the lifecycle code to set up the runtime, locate the
.wasm
file, compile it, instantiate it, call the main()
function, and to
shut down the runtime on exit. It also includes an implementation for all of the
system calls, including the file system, the dynamic loader, and any logic to
expose additional functionality from the JavaScript runtime to C code.
The .mjs
file exports a single bootstrapEmscriptenExecutable()
JavaScript function that bootstraps the runtime, calls the main()
function,
and returns an API object that can be used to call C functions. Each time it is
called produces a complete and independent copy of the runtime with its own
separate address space.
The bootstrapEmscriptenExecutable()
takes a large number of runtime
settings. The full list is described in the Emscripten documentation here. The most
important of these are as follows:
thisProgram
: The value ofargv[0]
. In Python, this makes its way intosys.executable
.arguments
: The list of string arguments to be passed tomain()
.preRun
: A list of callbacks which are invoked after the JavaScript runtime and file system have been bootstrapped but before callingmain()
. Useful to set up the file system, environment variables, and standard streams.print
/printErr
: Initial handlers for stdout and stderr. They are line buffered and performing aflush()
of a partial line forces an extra new line. If tty-like behavior is desired, the standard stream devices should be replaced in apreRun()
hook.onExit
: A handler that is called when the runtime exits.instantiateWasm
: A callback that is called to instantiate the WebAssembly module. Overriding the WebAssembly instantiation procedure via this function is useful when you have other custom asynchronous startup actions or downloads that can be performed in parallel to WebAssembly compilation. Implementing this callback allows performing all of these in parallel.
File System Setup
The Standard Library
In order for Python to run, it needs access to the standard library in the Emscripten file system. There are several possible approaches to this:
- The Emscripten linker has a
--preload-file
flag that will automatically handle loading files. Information about how it works is available here. This is the simplest approach, but Pyodide has moved away from it because it embeds the files into a custom archive format that cannot be processed with standard tooling. - For Node.js, use the NODEFS to mount a native directory with the files into the Emscripten file system. This is the most efficient option but is Node only. It is closely analogous to what WASI does.
- Put the standard library into a zip archive and use
ZipImporter
. Using an uncompressed zip file allows the web server and client to apply better compression to the standard library itself. It also uses the more efficient native decompression algorithms of the browser rather than less efficient WebAssembly decompression. The disadvantages of this are a higher memory footprint and breakinginspect
& various tests that do not expect the standard library to be packaged in this way. - Put the standard library into an uncompressed tar archive and mount it into a TARFS read only file system backed by the tar file. This has the best memory usage, runtime performance, and transfer size of the options that can be used in the browser. The disadvantage is that Emscripten does not itself include a TARFS so it requires a downstream implementation.
Pyodide uses the ZipImporter
approach in every runtime. Python uses the
NODEFS approach when run with node and the ZipImporter
approach for the web
example. We will continue with this approach.
The ZipImporter
provides a clean resolution for a bootstrapping problem: the
Python runtime is capable of unpacking a wide variety of archive formats, but
the Python runtime is not ready to use until the standard library is already
available. Since zipimport.py
is a frozen module, it avoids these problems.
All of the other approaches solve the bootstrapping problem by setting up the
standard library using JavaScript.
Third-party packages
It is also necessary to make any needed packages available in the Emscripten file system. Currently Emscripten CPython has no support for packages. Pyodide uses two different approaches for packages:
- In the browser, Pyodide downloads and unpacks wheels into the MEMFS site-packages directory. It then preloads all dynamic libraries in the wheel. The work of downloading and installing all the packages is redone every time the runtime starts.
- The Pyodide
python
CLI entrypoint mounts all of the host file system as NODEFS directories before it bootstraps Python. This allows the normal virtual environment mechanism to work. Pyodide virtual environments contain a patched copy of pip and a custompip.conf
so that pip will install Pyodide wheels. On startup the Pyodidepython
CLI will preload all Emscripten dynamic libraries that are in the site-packages directory.
Console and Interactive Usage
stdin
defaults to always returning EOF
, while stdout
and stderr
default to calling console.log
and console.error
respectively. It is
possible to pass handlers to bootstrapEmscriptenExecutable()
to configure
the standard streams, but no matter what the I/O devices have undesirable line
buffering behavior that forces a new line when flushed. To implement a well
behaved TTY in-browser, it is necessary to remove the default I/O devices and
replace them in a preRun
hook.
Making stdin
work correctly in the browser poses an additional challenge
because it is not allowed to block for user input in the main thread of the
browser. If Emscripten is run in a web worker and served with the shared memory
headers, it is possible to receive input using shared memory and atomics. It is
also possible for a stdin
device to block in a simpler and more efficient
manner using stack switching using the experimental JavaScript Promise
Integration API.
Pyodide replaces the standard I/O devices in order to fix the line buffering
behavior. When Pyodide is run in Node.js, stdin
, stdout
, and stderr
are
by default connected to process.stdin
, process.stdout
, and
process.stderr
and so the standard streams work as a tty out of the box.
Pyodide also ensures that shutil.get_terminal_size
returns results
consistent with process.stdout.rows
and process.stdout.columns
. Pyodide
currently has no support for stack switching stdin
.
Currently, the Emscripten Python Node.js runner uses the default I/O that
Emscripten provides. The web example uses Atomics
for stdin
and has
custom stdout
and stderr
handlers, but they exhibit the undesirable line
buffering behavior. We will upstream the standard streams behaviors from
Pyodide.
In the long term, we hope to implement stack switching stdin
devices, but
that is out of scope for this PEP.
Missing RPATH Support
An important limitation of the Emscripten dynamic loader is that it does not
currently have RPATH support. Pyodide’s present workaround is as follows:
auditwheel-emscripten
places shared library dependencies that are vendored
into a package in a ${package}.libs
folder, following auditwheel’s
convention. Pyodide patches the dynamic loader to treat this ${package}.libs
folder as if it were on the RPATH of all of the dynamic libraries in the wheel.
In Emscripten 4.0.5, we have updated the shared object file format, wasm-ld
and emcc
to accept an -rpath
flag. We are still working on updating the
dynamic loader to respect the rpath, but we expect this will be finished soon.
Pyodide will then switch to using the RPATH and drop the patch on the dynamic
loader.
Emscripten Python currently uses the unpatched dynamic loader and so cannot load
extension modules that depend on vendored dynamic libraries via DT_NEEDED.
Extension modules can load dynamic libraries via DT_NEEDED if they are in the
system lib
directory. We will wait to resolve this until we have fixed the
Emscripten dynamic loader upstream. When Emscripten Python is built with a
compatible version of Emscripten, it will automatically pick up support for
wheels with vendored dynamic libraries.
Traps and Uncaught Exceptions
We consider the C runtime state to be corrupted if there is a WebAssembly trap, an unhandled JavaScript exception, or an uncaught WebAssembly throw instruction.
Unlike in other platforms, there is no operating system to shut down the executable when there is a trap or other unrecoverable corruption of the libc runtime. We need to provide our own code to print tracebacks, dump the memory, or do whatever else is helpful for debugging a crash. If we expose a JavaScript API, we also must ensure that it is disabled after an unrecoverable crash to prevent downstream users from observing the Python runtime in an inconsistent state.
In order to detect fatal errors, Pyodide uses the following approach: all
fallable calls from WebAssembly into JavaScript are wrapped with a JavaScript
try/catch block. Any caught JavaScript exceptions are translated into Python
exceptions. This ensures that any recoverable JavaScript error is caught before
it unwinds through any WebAssembly frames. All entrypoints to WebAssembly are
also wrapped with JavaScript try/catch blocks. Any exceptions caught there have
unwound WebAssembly frames and are thus considered to be fatal errors (though
there is a special case to handle exit()
). This requires foundational
integration with the Python/JavaScript foreign function interface.
When the Pyodide runtime catches a fatal exception, it introspects the error to
determine whether it came from a trap, a logic error in a system call, a
setjmp()
without a longjmp()
, or a libcxxabi call to __cxa_throw()
(an uncaught C++ exception or Rust panic). We render as informative an error
message as we can. We also call _Py_DumpTraceback()
so we can display a
Python traceback in addition to the JS/WebAssembly traceback. It also disables
the JavaScript API so that further attempts to call into Python result in an
error saying that the runtime has fatally failed.
Normally, WebAssembly symbols are stripped so the WebAssembly frames are not
very useful. Compiling and linking with -g2
(or a higher debug setting)
ensures that WebAssembly symbols are included and they will appear in the
traceback.
Because Emscripten Python currently has no JavaScript API and no foreign function
interface, the situation is much simpler. The Python Node.js runner wraps the call
to bootstrapEmscriptenExecutable()
in a try/catch block. If an exception is
caught, it displays the JavaScript exception and calls _Py_DumpTraceback()
.
It then exits with code 1. We will stick with this approach until we add either
a JavaScript API or foreign function interface, which is out of scope for this PEP.
Specification
Scope of Work
Adding Emscripten as a Tier 3 platform only requires adding support for compiling an Emscripten-compatible build from the unpatched CPython source code. It does not necessarily require there to be any officially distributed Emscripten artifacts on python.org, although these could be added in the future. In the short term, they will continue to be distributed downstream with Pyodide.
Emscripten will be built using the same configure and Makefile system as other POSIX platforms, and must therefore be built on a POSIX platform. Both Linux and macOS will be supported.
A Python CLI entrypoint will be provided, which among other things can be used to execute the test suite.
Linkage
It is only supported to statically link the Python interpreter. We use EM_JS
functions in the interpreter for various purposes. It is possible to dynamically
link object files that include EM_JS
functions, but their behavior deviates
significantly from their behavior in static builds. For this reason, it would
require special work to support. If a use case for dynamically linking the
interpreter in Emscripten emerges, we can evaluate how much effort would be
required to support it.
Standard Library
Unsupported Modules
See https://pyodide.org/en/stable/usage/wasm-constraints.html#removed-modules.
Removed Modules
The following modules are removed from the standard library to reduce download size and since they currently wouldn’t work in the WebAssembly VM.
- curses
- dbm
- ensurepip
- fcntl
- grp
- idlelib
- msvcrt
- pwd
- resource
- syslog
- termios
- tkinter
- turtle
- turtledemo
- venv
- winreg
- winsound
Included but not Working Modules
The following modules can be imported, but are not functional:
- multiprocessing
- threading
- sockets
as well as any functionality that requires these.
The following are present but cannot be imported due to a dependency on the termios module which has been removed:
- pty
- tty
Platform Identification
sys.platform
will return "emscripten"
. Although Emscripten attempts to
be compatible with Linux, the differences are significant enough that a distinct
name is justified. This is consistent with the return value from os.uname()
.
There is also sys._emscripten_info
which includes the Emscripten version and
the runtime (either navigator.userAgent
in a browser or "Node js" +
process.version
in Node.js).
Signals Support
WebAssembly does not have native support for signals. Furthermore, on a non-pthreads build, the address space of the WebAssembly module is not shared, so it is impossible for any thread capable of seeing an interrupt to write to the eval breaker while the Python interpreter is running code. To work around this, there are two possible solutions:
- If Emscripten is run in a web worker and served with the shared memory headers, it is possible to use shared memory outside of the WebAssembly address space as a signal buffer. A signal handling UI thread can write the desired signal into the signal buffer. The interpreter can periodically check the state of this signal buffer in the eval breaker code. Checking the signal buffer is slow compared to checking the eval breaker in native platforms, so we do only do it once every 50 times through the eval breaker. See Python/emscripten_signal.c
- Using stack switching, we can occasionally switch the stack and allow the JavaScript event loop to go around, then check the state of a signal buffer. This requires the experimental JavaScript Promise Integration API, and would be best used with the techniques for optimizing long tasks described in this article
Emscripten Python has already implemented the solution based on shared memory, and it is in use in Pyodide.
Eventually, we hope to implement stack-switching-based signals so that it is possible to use signals in the main thread of node and the browser, as well as in in web pages that are not served with the shared memory headers. We will need to keep the shared memory based approach as well, both for backwards compatibility and because it is more efficient when it is possible. However, this is out of scope for this PEP.
Function Pointer Casts
Section 6.3.2.3, paragraph 8 of the C standard reads:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
However, most platforms have the same behavior: if a function is called with too many arguments, the extra arguments are ignored; if a function is called with too few arguments, the extra arguments are filled in with garbage.
On the other hand, the WebAssembly spec defines calling a function with the wrong signature to trap (see step 18 in the execution of call_indirect.
It is common for Python extension modules to cast a function to a different
signature and call it with the different signature. For instance, many C
extensions define a METH_NOARGS
function to take 0 or 1 argument. The
interpreter calls it with two arguments, the first of which is the Python module
object and the second of which is always NULL
. In order to make these
extension modules work without changing their source code, we need special
handling.
Initially, we resolved this problem by calling out to JavaScript and having JavaScript call the function pointer. When calling a WebAssembly function from JavaScript, missing arguments are treated as zero and extra arguments are ignored (see step 7 here. This works, but has the disadvantage of being slow and breaking stack switching – it is not possible to stack switch through JavaScript frames.
Using the wasm-gc ref.test instruction, we can query the type of the function pointer and manually fix up the argument list.
wasm-gc is a relatively new feature for WebAssembly runtimes, so we attempt to use a wasm-gc based function pointer cast trampoline if possible and fall back to a JS trampoline if not. Every JavaScript runtime that supports stack switching also supports wasm-gc, so this ensures that stack switching works on every platform runtime that supports it. The one wrinkle is that iOS 18 ships a broken implementation of wasm-gc so we have to special case it.
See here for the full implementation details.
The function pointer cast handling is fully implemented in cpython. Pyodide uses exactly the same code as upstream.
CI Resources
Pyodide can be built and tested on any Linux with a reasonably recent version of Node.js. Anaconda has offered to provide physical hardware to run Emscripten buildbots, maintained by Russell Keith-Magee.
CPython does not currently test Tier 3 platforms on GitHub Actions, but if this ever changes, their Linux runners are able to build and test Emscripten Python.
PEP 11
PEP 11 will be updated to indicate that Emscripten is supported, specifically
the triples wasm32-unknown-emscripten_xx_xx_xx
.
Russell Keith-Magee will serve as the initial core team contact for these ABIs.
Future Work
Improving Cross Builds in the Packaging Ecosystem
Python now supports four non-self-hosting platforms: iOS, Android, WASI, and
Emscripten. All of them will need to build packages via cross builds. Currently,
pyodide-build
allows building a very large number of Python packages for
Emscripten, but it is very complicated. Ideally, the Python packaging ecosystem
would have standards for cross builds. This is a difficult long term project,
particularly because the packaging system is complex and was designed from the
ground up with the assumption that cross compilation would not happen.
Pyodide Runtime Features to be Upstreamed
This is a collection of Pyodide runtime features that are out of scope for this PEP and for the Python 3.14 development cycle but we would like to upstream in the future.
JavaScript API for Bootstrapping
Currently we offer no stable API for bootstrapping Python. Instead, we use one collection of settings for the Node.js CLI entrypoint and a separate collection of settings for the browser demo.
The Emscripten executable startup API is complicated and there are many possible configurations that are broken. Pyodide offers a simpler set of options than Emscripten. This gives downstream users a lot of flexibility while allowing us to maintain a small number of tested configurations. It also reduces downstream code duplication.
Eventually, we would like to upstream Pyodide’s bootstrapping API. In the short term, to keep things simple we will support no JavaScript API.
JavaScript foreign function interface (FFI)
Because Emscripten supports POSIX, a significant number of tasks can be achieved
using the os
module. However, many fundamental operations in JavaScript
runtimes are not possible via POSIX APIs. Pyodide’s approach is to specify a
mapping between the JavaScript object model and the Python object model and a
calling convention that allows high level bidirectional integration. See the
Pyodide documentation.
Asyncio
Most JavaScript primitives are asynchronous. The JavaScript thread that Python runs in already has an event loop. It it not too difficult to implement a Python event loop that defers all actual work to the JavaScript event loop, implemented in Pyodide here.
This is logically dependent on having at least some limited JavaScript FFI because the only way to schedule tasks on the JavaScript event loop is via a call out to JavaScript.
One cause of incompatibility is that it is not possible to control the life
cycle of the event loop from within a JavaScript isolate. This makes
asyncio.run()
and similar things not work.
Using stack switching it is also possible to make a coroutine out of “synchronous” Python frames. These stack switching coroutines are scheduled on the same event loop as ordinary Python coroutines and are fully reentrant. This is fully implemented in Pyodide.
Backwards Compatibility
Adding a new platform does not introduce any backwards compatibility concerns to CPython itself. However, there may be some backwards compatibility implications on Pyodide users. There are a large number of existing users of Pyodide, so it is important when upstreaming features from Pyodide into Python that we take care to minimize backwards incompatibility. We will also need a way to disable partially-upstreamed features so that Pyodide can replace them with more complete versions downstream.
Security Implications
Adding a new platform does not add any new security implications.
Emscripten and WASI are also the only supported platforms that offer sandboxing. If users wish to execute untrusted Python code or untrusted Python extension modules, Emscripten provides a secure way for them to do that.
How to Teach This
The education needs related to this PEP relate to two groups of developers.
First, web developers will need to know how to build Python and use it in a website, along with their own Python code and any supporting packages, and how to use them all at runtime. The documentation will cover this in a similar form to the existing Windows embeddable package. In the short term, we will encourage developers to use Pyodide if at all possible.
Reference Implementation
Pyodide.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0776.rst
Last modified: 2025-03-31 13:18:59 GMT