PEP 836 – JIT Go Brrr: The Path to a Supported JIT Compiler for CPython
- Author:
- Savannah Ostrowski <savannah at python.org>, Ken Jin <kenjin at python.org>, Brandt Bucher <brandt at python.org>
- Discussions-To:
- Pending
- Status:
- Draft
- Type:
- Standards Track
- Created:
- 02-Jul-2026
- Python-Version:
- 3.16
- Post-History:
- Pending
Abstract
The experimental Just-in-Time (JIT) compiler has been part of CPython’s main branch since Python 3.13. PEP 744 described part of its initial design and explicitly deferred a number of questions about the JIT’s long-term status. Since then, the JIT has been re-architected and has matured considerably. In Python 3.15, it delivers a measurable, reproducible speedup over the interpreter (about 4-12% geometric mean performance improvement across measured Tier 1 platforms; see Appendix), emits frames that native debuggers can unwind through, and reduces the memory footprint of generated code relative to 3.14. Along the way, we have learned a good deal about what works for a JIT in CPython.
This PEP proposes a path for the JIT to become a supported, non-experimental part of CPython if it meets measurable performance, compatibility, tooling, platform, distribution, security, and maintenance goals. The initial performance target is at least 20% geometric mean improvement on pyperformance for the JIT + free-threaded build compared to the non-JIT free-threaded build’s interpreter, measured as the mean across the supported Tier 1 platforms, by the first beta release of Python 3.17. The target is set as a minimum bar for continued in-tree development of the JIT.
Proposal
This PEP does not propose declaring the JIT as supported immediately. Instead, it proposes a time-bounded path for keeping JIT development in CPython main while the project meets explicit performance, compatibility, tooling, distribution, security, and maintenance goals.
If these goals are met, the JIT can be promoted to a non-experimental feature of CPython. If they are not met, the Steering Council and core team should re-evaluate whether the JIT should remain in CPython main. After promotion, enabling the JIT by default on supported platforms would require a separate final approval from the Release Manager.
At a high level, these are our milestones and goals for the JIT over the next 2.5 years:
- Year 1 (ending with Python 3.16’s first beta) - Developer experience improvements
- Evolve the frontend from trace recording to method-based. We believe that a method frontend will put us on a path that allows for easier maintenance, teachability, debugging, etc. The first implementation should be minimal, may initially use more memory or perform slightly worse, and may be rolled back to the current tracing frontend if the approach does not meet the project’s goals in the first year.
- Make the JIT compatible with free-threading. We believe that this is important to prioritize early on in the next phase of the JIT as free-threading adoption is expanding rapidly.
- Add further testing (and address any discovered remaining gaps in coverage) for native and Python profilers and debuggers. At a minimum, this will include anything that uses frame pointers to unwind, but should also be expanded to support tools that symbolize Python frames. Third-party tooling must have documented remediation paths when existing behavior cannot be preserved exactly.
- A better JIT distribution story. Provide redistributors with a documented and reproducible way to build or verify JIT stencils, without requiring long-term dependence on one exact LLVM version [10].
- No lower than 5% uplift for JIT + GIL versus the GIL interpreter alone. In other words, we will not significantly regress existing performance improvements while in pursuit of longer-term goals. We will also not discourage other contributors from contributing performance improvements during this stage. However, our main focus will be the developer experience improvements.
- Year 2 (ending with Python 3.17’s first beta) - Improved performance
- Achieve at least 20% geometric mean performance improvement on pyperformance for JIT + free-threading compared to the free-threading interpreter alone. This is the minimum target for keeping the JIT in CPython main, with free-threading treated as the primary performance focus.
- Year 2.5 (ending with Python 3.17’s first release candidate) - Adoption and compatibility
- Compatibility review. Run test suites for selected popular PyPI packages and representative real-world workloads under the JIT. Regressions should be triaged case by case, with fixes or documented explanations for issues that reflect real compatibility breaks.
Motivation
Improving CPython’s performance is essential to Python’s future. A JIT compiler is one of the few performance strategies that can improve CPython while preserving the runtime that users, extension authors, embedders, distributors, debuggers and profilers already target. Other dynamic languages, such as Ruby, PHP and JavaScript, have successfully used JIT compilers to deliver substantial performance improvements while maintaining compatibility with large existing ecosystems. CPython has different constraints but the JIT is explicitly aimed at improving performance within those constraints.
Alternative Python implementations, such as PyPy and GraalPy, demonstrate that larger speedups are possible for some Python programs, and we deeply value those projects. However, many users cannot adopt an alternative runtime even when it may perform better on their code due to factors such as supported Python versions, extension compatibility, embedding requirements, deployment constraints, and tooling support.
PEP 744 did valuable work explaining the JIT’s copy-and-patch approach, made the case for keeping the implementation in CPython’s main branch so that it could be maintained by a broader group of volunteers, and sketched some criteria under which the JIT might eventually graduate from an experimental state. However, PEP 744 left many questions open about guarantees, maintenance commitments, success metrics, timelines, tooling compatibility, impact on redistributors, its relationship to other JITs, and likely architectural evolution.
The current CPython JIT has shown some promising results (see Appendix), especially in the last 9-12 months. However, as with any good experiment, it’s important to evaluate the current approach and evolve plans based on what we’ve learned.
Current State
As of CPython 3.15, the current JIT compiler is roughly 4-12% faster (geometric mean) on the pyperformance benchmark suite than the interpreter across measured Tier 1 platforms (see Appendix). To achieve this, the JIT and its supporting infrastructure have undergone a number of revisions across the last four major versions of Python:
- 3.12: Introduction of the CPython bytecode DSL, and refactoring of interpreter bytecodes to micro-operations (“uops”).
- 3.13: JIT trace projector, optimizer and Copy and Patch backend introduced, PEP 744 written.
- 3.14: More refactoring of interpreter bytecodes and optimizer work.
- 3.15: JIT tracer rewritten to use trace recording, JIT optimizer improvements, community engagement and involvement.
Today, the JIT is an experimental and opt-in part of CPython. The official Python binaries for Windows and macOS ship with the JIT built but disabled by default (end users can enable it using PYTHON_JIT=1). Other distributors and certain Linux distributions, such as Fedora and Gentoo, are also known to do the same. As it stands, the JIT has a build-time dependency on LLVM for stencil generation.
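As a minimal sketch of how a user might check what the PYTHON_JIT setting above did to a running process, the following assumes the `sys._jit` introspection namespace that CPython documents for 3.14 and later. The helper name `jit_status` is hypothetical, and the `getattr` guard keeps the snippet working on interpreters that lack that namespace.

```python
import os
import sys


def jit_status() -> dict:
    """Report whether this interpreter has a JIT and whether it is enabled."""
    # Assumption: sys._jit exists on CPython 3.14+; it is absent elsewhere.
    jit = getattr(sys, "_jit", None)
    return {
        "env_request": os.environ.get("PYTHON_JIT"),    # what the user asked for
        "available": bool(jit and jit.is_available()),  # built with JIT support
        "enabled": bool(jit and jit.is_enabled()),      # turned on for this process
    }
```

Calling `jit_status()` under `PYTHON_JIT=1` on a build that ships the JIT should report it as both available and enabled; on a build without the JIT, the environment variable has no effect and both flags stay false.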
The JIT has attracted many excellent community contributors, and this has picked up momentum in recent months. We are extremely grateful to these volunteers. A sizable and active community now exists, as evidenced by the contributor list in CPython 3.15’s What’s New entry for the JIT [1]. The JIT team has learnt important lessons about attracting new contributors, such as creating approachable units of work in the public issue tracker and offering mentorship.
Learnings
JIT projects evolve over their lifespan, as seen, for example, in CRuby, which has had multiple JIT compilers. CPython’s JIT is no different.
CPython’s JIT compiler has areas to improve. To be sustainable in the long-term, meet our performance goals, and continue fostering community engagement, certain tradeoffs are required. Tooling, compatibility with other JIT projects, and free-threading must be first-class citizens.
In our experience, there are several areas which we believe have been successful:
- The bytecode DSL and uops. This approach lowers maintenance burden even in the interpreter, as repeated units of code can be shared without interpretive overhead, and we can reduce error-proneness when modifying the interpreter through bytecode validation. These should remain in CPython even if the current JIT proves unsuccessful and is removed. The uops themselves also form the intermediate representation for the JIT automatically.
- Generating a JIT translator automatically using our own tooling. CPython’s JIT can automatically generate bytecode to JIT Intermediate Representation (IR) rules using the bytecodes DSL. Again, this means most JIT translations are correct-by-construction, reducing error-proneness and maintenance burden. This also means the complexity of the JIT is self-contained: new features added to CPython generally do not need JIT support unless implementers want the JIT to optimize the feature. For example, the initial lazy imports pull request [9] did not require touching any JIT files apart from adding new headers to include in C.
- A JIT optimizer that resembles the CPython interpreter. The current JIT optimizer (middle-end) analyzes type information over CPython uops. The key maintainability advantage here is that the middle-end is written in a similar fashion to the normal CPython interpreter: as a bytecode DSL over an interpreter. However, instead of interpreting objects, we interpret the types of the objects. This means knowledge of the CPython interpreter is transferable to the JIT optimizer, and if a contributor knows how to work on the interpreter’s bytecodes, they also know how to work on the JIT’s middle-end.
- Generating a JIT machine code backend automatically using our own tooling. CPython’s JIT does not require custom handwritten operations, as the JIT machine code is generated automatically from the interpreter. This further reduces the maintenance burden of a JIT and allows a small team to maintain it for a wide variety of platforms.
- Trace recording provides some benefits naturally. For example, polymorphism, speculation, dead code elimination, value recording are well handled in a trace recording JIT.
- Traces are an easy starting point. Traces don’t have control-flow within them, making analysis simpler.
- A community-maintained JIT. Despite partial funding from corporate sources (which we are grateful for), a sizable portion of JIT work comes from volunteers. Breaking the JIT into understandable chunks for contributors to work on is an effective way of compartmentalizing complexity and encouraging ownership.
Conversely, we have also learned quite a bit about what has not worked for CPython and what could be improved upon:
- We can continue to improve community outreach and engagement. We are taking active steps to onboard new members. However, continuous engagement with the wider community and their requests/needs is critical for the project. This includes talking to system distributors for example, or maintainers of third-party tooling, understanding their concerns, and accommodating them better.
- Reconsidering if our current JIT frontend is the right fit for CPython. The current tracing JIT has some great benefits (see Learnings). However, a successful JIT is much more than just good performance; we must consider other factors like maintainability, testability, and teachability too. Furthermore, as the JIT matures, the cost-benefit proposition of tracing in CPython shifts. To be clear, this is not a value judgement of tracing as an approach, but rather an assessment of its state within CPython. These observations are from the authors, some of whom implemented the current tracing frontend in CPython 3.15:
- Tracing’s initial ease does not seem to continue in the medium term in the case of CPython. As mentioned in Learnings, tracing is easy to start with. However, the simple implementation of tracing in CPython yielded no speedups initially in 3.13 and 3.14. Only when we shifted to a more complex tracing runtime and modified the interpreter did we experience performance gains. We believe the initial ease of tracing will be eroded as our tracing runtime matures.
- A mature tracing runtime’s complexity seems to require many non-conventional clever “tricks”, in our experience. For example, the current trace recording mechanism relies on such tricks [8] to make recording the interpreter execution efficient and effective. Additional complexities include managing the trace graph and its lifetime. We would like to reduce such tricks to make the JIT easier to maintain and teach. Other frontends, such as method-based ones, also implement tricks. However, they seem to be better studied in recent years and thus better documented, due to their prevalence in other dynamic language runtimes.
- Tracing’s interactions in CPython are nontraditional to teach and analyze. Tracing is commonly found in AI/ML compilers, but less frequently used and taught in traditional compiler literature. We believe this increases the barrier to entry for a new contributor who knows compilers but does not know about CPython. Furthermore, when a trace performs badly, its interactions with the CPython interpreter can be hard to analyze. There can be myriad reasons for less predictable performance, and analyzing them requires a deep understanding of the interpreter as well. For example, the current trace recording runtime has complicated tracing heuristics which decide whether to continue or terminate the trace. These heuristics took a contributor and one of this PEP’s authors many attempts to get right (through no fault of their own). We wish to make it easier to teach and onboard new contributors without requiring them to deeply analyze the interpreter.
- We do not have a strong pulse on whether the JIT currently benefits larger real-world workloads. At present, the JIT is primarily measured and evaluated via pyperformance benchmark suite runs. We would like to spend more time evaluating its impact on real end-user code (see Compatibility Review).
- The distribution story should be improved and codified. Distributor feedback so far suggests that LLVM itself is not always the main obstacle as most can provide a recent LLVM toolchain. The harder problem is depending on one exact LLVM version for the lifetime of a Python release, which can force redistributors to carry multiple LLVM versions, rely on unsupported toolchains, disable the JIT or maintain bespoke stencil generation workflows. We must codify a solution that is workable for distributors. PEP 774 is one such solution but more research needs to be done to prevent each Linux distribution rolling their own bespoke solution.
Rationale
As noted above, the JIT has achieved roughly 4-12% faster geometric mean on the pyperformance benchmark suite for measured Tier 1 platforms (see Appendix), with some limitations, challenges and areas of improvement. In this next phase of the JIT, we want to set an initial ambitious but attainable target of at least 20% performance improvement over the interpreter on the free-threaded build achieved within the next 2.5 years (in other words, by Python 3.17).
However, we know that performance for performance’s sake and at the cost of tooling incompatibility is not meaningful or attractive for the project. As such, we want to enter this next phase intentionally and with a clear plan, enumerated in detail in the specification section below.
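To make the performance target above concrete, here is a minimal sketch of a geometric mean speedup calculation. It assumes that "improvement" is read as the geometric mean of per-benchmark time ratios (interpreter time divided by JIT time); the benchmark names and timings are invented purely for illustration and are not measurements.

```python
import math


def geomean_speedup(baseline_s: dict[str, float], jit_s: dict[str, float]) -> float:
    """Geometric mean of per-benchmark speedups (baseline time / JIT time)."""
    ratios = [baseline_s[name] / jit_s[name] for name in baseline_s]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in seconds, invented for this example only.
interpreter = {"bench_a": 2.00, "bench_b": 1.00, "bench_c": 4.00}
jit = {"bench_a": 1.60, "bench_b": 1.00, "bench_c": 2.50}

speedup = geomean_speedup(interpreter, jit)   # ratios are 1.25, 1.0 and 1.6
meets_target = speedup >= 1.20                # the "at least 20%" bar
print(f"geomean speedup: {speedup:.3f}x, meets 20% target: {meets_target}")
# prints "geomean speedup: 1.260x, meets 20% target: True"
```

The geometric mean is used rather than the arithmetic mean so that one benchmark with a very large ratio cannot dominate the result. Under the proposal, a per-platform figure like this would then be averaged across the supported Tier 1 platforms.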
Specification
In order to achieve a sustainable and maintainable 20%+ performance gain with full tooling compatibility in the next 2.5 years, there are several areas worth discussing:
- Key JIT infrastructure, including an evolution of the JIT frontend
- Optimizations
- First-class support for free-threading
- A better distribution story
- Compatibility
- Tooling support
Key JIT Infrastructure
Traditionally, compilers are split into a frontend, middle-end, and backend. They have the following meaning in our context:
- Frontend: Selects what to compile. This can be methods or traces of CPython specialized bytecode.
- Middle-end: Optimizes instructions. Translates specialized bytecode to uops and optimizes them.
- Backend: Generates machine code.
At present, the frontend uses trace recording, which records the actual flow of execution through the program’s bytecodes, along with live values during execution. We instrumented the interpreter to achieve this. This is not the frontend originally introduced in 3.13, which seemed ineffective at the time for various reasons [2].
To ease the maintenance burden, disentangle the JIT and the interpreter, and unlock future optimizations in a sustainable fashion, we propose changing the frontend to a method-based one by 3.16. The method frontend can be rolled back to the trace recording one partway through if it does not meet our goals.
Changing the frontend is not free. Time spent on this work is time not spent directly adding optimizations to the current tracing frontend, and some trace-specific performance wins may need to be recovered after the transition. We believe this opportunity cost is justified only because the current frontend appears likely to impose increasing maintenance, teaching, debugging, and optimization costs as it matures that will outpace the initial implementation cost of the method frontend.
To elaborate on the difference: trace recording records straight-line sequences through the code, while a method frontend generally selects one or more whole Python functions to compile.
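The difference in compilation unit can be illustrated with a rough sketch. It uses line-level tracing via `sys.settrace` as a stand-in for bytecode-level trace recording, which is an assumption made for brevity and is not how the JIT records traces. The function `sign_label` and both helpers are hypothetical.

```python
import dis
import sys


def sign_label(n):
    if n < 0:
        label = "negative"
    else:
        label = "non-negative"
    return label


def record_path(func, *args):
    """Relative source lines executed by one call: a trace-like, single path."""
    code, executed = func.__code__, []

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)   # restore whatever tracer was installed before
    return executed


def whole_function(func):
    """Relative source lines of every instruction: a method-like, whole unit."""
    code = func.__code__
    return sorted({line - code.co_firstlineno
                   for _, line in dis.findlinestarts(code) if line is not None})


path_taken = record_path(sign_label, 5)   # only the branch this call took
method_unit = whole_function(sign_label)  # both branches, taken or not
```

The `label = "negative"` line (relative line 2) appears in `method_unit` but not in `path_taken`. A trace only ever sees the path that ran, whereas a method-style unit contains both arms and the point where they rejoin.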
The middle-end and backend will not require major changes. Nearly all of the current code can be reused for the method frontend. The current backend which uses Copy and Patch compilation already supports branches and jumps in the control-flow. The middle-end which analyzes types over uops just needs to support merging type information at control-flow merge points.
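Merging type information at a control-flow merge point could look roughly like the sketch below. It assumes a deliberately tiny lattice in which a stack slot is either a single known type or unknown; `AbstractType`, `join` and `join_stacks` are hypothetical names, not the optimizer's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbstractType:
    """What the optimizer knows about one stack slot; None means unknown."""
    known_type: Optional[type] = None


UNKNOWN = AbstractType()


def join(a: AbstractType, b: AbstractType) -> AbstractType:
    """Merge facts from two predecessors: keep only what both agree on."""
    return a if a == b else UNKNOWN


def join_stacks(left: list[AbstractType], right: list[AbstractType]) -> list[AbstractType]:
    """Merge two abstract stacks slot by slot at a control-flow merge point."""
    if len(left) != len(right):
        raise ValueError("stack depth must match at a merge point")
    return [join(a, b) for a, b in zip(left, right)]


# Two arms of an `if`: both leave an int in slot 0, but slot 1 holds an int
# on one arm and a float on the other.
then_arm = [AbstractType(int), AbstractType(int)]
else_arm = [AbstractType(int), AbstractType(float)]
merged = join_stacks(then_arm, else_arm)
```

After the merge, slot 0 is still known to be an `int`, so guards on it can still be removed, while slot 1 falls back to unknown and keeps its guards.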
Motivated by our learnings over the past several years, our goals for the method frontend are as follows:
- To make optimization as simple and as traditional as possible so as to avoid unnecessary experimentation on CPython’s main branch.
- To make the JIT easier to maintain. We don’t mean this in lines of code, but rather in conceptual burden. A single maintainer should be able to “fit” the entire system in their head and accurately predict/understand its behavior, even in reasonably complex programs. The current tracing frontend can produce head-scratching results even for very simple benchmarking programs.
- To enable higher-level optimizations more easily, without requiring a higher JIT tier (which requires another JIT bolted on top) or inter-trace knowledge.
The following is the reference design. It is subject to change as the code evolves:
- Uop IR. The benefits for this are explained in previous sections.
- Some Static Single Assignment (SSA) form properties over the stack. This does not mean we need to rewrite our IR to SSA form, but rather, the optimizer should have some SSA properties. We believe this aligns more closely with other compilers (e.g. Cinder, PyPy, Chrome’s V8, CRuby’s YJIT/ZJIT), and makes understanding how to optimize in the JIT easier and more powerful. The current JIT optimizer already nearly supports this, and only requires minimal changes to have SSA properties. An IR with proper stack discipline already has many useful properties that are analogous to SSA form. SSA form will basically come for free for stack variables.
- A simple way to represent high-level constructs. We have an implementation that forms regions (groups of basic blocks), inspired by the similarly named concept in MLIR (an LLVM project). Rather than degenerating programs to single basic blocks pointing to each other, we opt to keep the high-level construct information around. In MLIR, this was motivated by better loop analysis and optimization. In CPython’s JIT, this is motivated by better generator/coroutine/loop/etc. (high-level construct) analysis and optimizations.
With all of the above, most optimizations in the JIT can be implemented as local rewrites. This is, again, inspired by certain properties of other runtimes’ intermediate representations. Our goal is to make the JIT more traditional and teachable, without sacrificing what we can optimize. We do acknowledge that a method JIT requires joining control-flow. However, we believe this is not a large conceptual overhead, as a tracing JIT already requires teaching the concept of joining control-flow once anything other than the most basic optimizations are implemented.
In terms of the code needed for this frontend, most of the required infrastructure is already present. The main modifications are data structures to represent a control-flow graph and worklist algorithms to drive the pre-existing optimizer/analysis pass. We can then remove most of the JIT’s current tracing frontend from the interpreter, which will simplify the interpreter’s core dispatch mechanism and main loop. We believe these are not foreign concepts to CPython: the current bytecode compiler in CPython already represents control-flow graphs and has worklist algorithms.
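As a rough sketch of the two pieces named above, the following represents a control-flow graph as a plain dictionary and drives a forward type analysis with a worklist until nothing changes. The block names, the variable-keyed state and the `analyze` helper are all hypothetical simplifications; a real pass would work over uops and stack slots.

```python
from collections import deque

# Hypothetical CFG: block name -> (types assigned in the block, successors).
CFG = {
    "entry": ({"i": int, "total": int}, ["loop"]),
    "loop": ({}, ["body", "exit"]),
    "body": ({"total": float}, ["loop"]),   # back edge to the loop header
    "exit": ({}, []),
}


def join(a: dict, b: dict) -> dict:
    """Keep only the variable types that both incoming states agree on."""
    return {var: t for var, t in a.items() if b.get(var) is t}


def analyze(cfg: dict, entry: str) -> dict:
    """Forward type analysis driven by a worklist until a fixed point."""
    in_state = {entry: {}}            # blocks missing from here are unreached
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        assigned, successors = cfg[block]
        out_state = {**in_state[block], **assigned}   # transfer function
        for succ in successors:
            old = in_state.get(succ)
            new = out_state if old is None else join(old, out_state)
            if new != old:            # facts changed, so revisit the successor
                in_state[succ] = new
                worklist.append(succ)
    return in_state


types_at_entry_of = analyze(CFG, "entry")
```

Because `total` becomes a `float` inside the loop body, the back edge forces the loop header to forget its type, while `i` stays known as `int` everywhere after `entry`. The analysis terminates because each revisit can only remove facts, never add them.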
Finally, both method and tracing JITs gain complexity and have various tradeoffs to achieve great performance. Where tracing has greater simplicity in value recording and profiling, methods need more advanced polymorphic inline caches. Where tracing needs inter-trace optimization to get higher-level optimizations, method JITs have it simpler by seeing more code. Both need tight coupling with the interpreter to achieve great performance. We understand that both technologies come with tradeoffs, and we are once again not making a judgement of which is ultimately better. Our claim is only that, for the optimizations that CPython requires, for ease of teaching, debugging and analysis, and for finding solutions to our problems in similar language runtimes, a method-based JIT is our choice. To be upfront about the potential additional complexity: a method JIT may also require certain additional features described in the literature, such as recording extra type profiling data in a side table. However, this complexity can be greatly mitigated by using the current bytecode DSL to generate the profiling operations automatically, similar to what the current tracing JIT already does. We also propose ways to mitigate it in Optimizations. We thus believe the conceptual and maintenance leap is not huge.
Optimizations
The method JIT builds on the optimizations already present in the current trace recording JIT. Namely, it will come with the following optimizations by virtue of the pre-existing JIT middle-end:
- Type speculation (via the specializing adaptive interpreter’s typed bytecode)
- Useless check/guard removal
- Redundant reference counting removal
- Constant folding
Knowledge of the current middle-end is transferable, and contributors who have worked on it need not relearn much, as the JIT middle-end can work with the method frontend with minimal changes.
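To give a flavour of one item in the list above, useless guard removal, here is a toy pass that interprets types instead of objects. The micro-op names are hypothetical and are not CPython's real uops; the sketch only shows the idea of carrying known types along an abstract stack and dropping guards that are already proven.

```python
# Hypothetical micro-ops, for illustration only:
#   ("LOAD_LOCAL", name)  push a local whose type is unknown
#   ("LOAD_CONST", value) push a constant
#   ("GUARD_INT",)        check that the top of the stack is an int
#   ("ADD_INT",)          pop two ints, push their sum

def remove_redundant_guards(uops: list[tuple]) -> list[tuple]:
    """Drop GUARD_INT ops whose operand is already known to be an int."""
    out: list[tuple] = []
    stack: list = []                    # abstract stack: a known type, or None
    for op in uops:
        name = op[0]
        if name == "LOAD_LOCAL":
            stack.append(None)          # nothing is known about a local yet
        elif name == "LOAD_CONST":
            stack.append(type(op[1]))   # a constant's type is known exactly
        elif name == "GUARD_INT":
            if stack[-1] is int:
                continue                # already proven, so drop the guard
            stack[-1] = int             # once the guard passes, it is an int
        elif name == "ADD_INT":
            stack.pop()
            stack.pop()
            stack.append(int)           # int + int is an int
        else:
            raise ValueError(f"unknown uop: {name}")
        out.append(op)
    return out


# x + 1 + 2, naively translated with a guard before every operand use.
naive = [
    ("LOAD_LOCAL", "x"), ("GUARD_INT",),
    ("LOAD_CONST", 1), ("GUARD_INT",),
    ("ADD_INT",), ("GUARD_INT",),
    ("LOAD_CONST", 2), ("GUARD_INT",),
    ("ADD_INT",),
]
optimized = remove_redundant_guards(naive)
```

Only the guard on `x` survives: the constants and the result of `ADD_INT` are already known to be integers, so three of the four guards disappear.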
Switching frontends has a short-term performance opportunity cost. The trace recording frontend has benefited from nearly a year of focused work, and some of its gains, especially those tied to trace-specific behavior or free-threading-unsafe optimizations, may need to be recovered after the transition. The reason to accept this cost is that a method frontend should make the next set of larger optimizations easier to implement, reason about, test, and maintain.
As part of our plans, we plan to optimize generators/coroutines better and improve the efficiency of calls. These high-level optimizations motivated us towards a method-based JIT. These optimizations require seeing more of the user’s code to be effective, and the current tracing JIT in CPython cannot achieve this without inter-trace optimization or trace stitching, which increases the complexity and coupling with the runtime.
Further optimizations are possible. However, they do not differ much if a trace recording or method frontend is used:
- Lock removal on free-threading
- Detecting deferred reclamation to reduce escaping sites in the JIT on free-threading
- Conservative unboxing of integers, floats, and small strings
To recover the optimizations tracing gives for free, we plan to explore:
- Recording extra type profiling information from the interpreter’s specializer
- Path splitting (duplicating the control-flow graph)
- Cold code elimination
- Respecializing instructions in the middle-end
One may argue that this introduces a lot of complexity in the method JIT. However, type profiling involves no changes to the interpreter and only minimal changes to the specializer. Path splitting can be found in standard compiler textbooks. Cold code elimination is trivial to implement in current CPython due to branch information tracking already in the interpreter (we can just choose not to compile branches/blocks that are never taken). Respecialization can be done by leveraging the existing specializer’s decisions. This is consistent with our view that, in the case of CPython, a method JIT is less entangled with the interpreter.
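Cold code elimination as described above can be sketched as a reachability walk that follows only edges that have actually executed. The graph, the edge counts and the helper name are hypothetical; the counts stand in for whatever branch information the interpreter already tracks, and the sketch assumes that cold blocks would simply be left to the interpreter.

```python
# Hypothetical CFG and per-edge execution counts, invented for illustration.
SUCCESSORS = {
    "entry": ["fast_path", "error_path"],
    "fast_path": ["exit"],
    "error_path": ["exit"],
    "exit": [],
}
EDGE_COUNTS = {
    ("entry", "fast_path"): 10_000,
    ("entry", "error_path"): 0,       # this branch has never been taken
    ("fast_path", "exit"): 10_000,
    ("error_path", "exit"): 0,
}


def blocks_to_compile(entry: str) -> set[str]:
    """Walk only edges that have executed; unreached blocks are left cold."""
    hot, pending = {entry}, [entry]
    while pending:
        block = pending.pop()
        for succ in SUCCESSORS[block]:
            if EDGE_COUNTS.get((block, succ), 0) > 0 and succ not in hot:
                hot.add(succ)
                pending.append(succ)
    return hot


hot_blocks = blocks_to_compile("entry")
cold_blocks = set(SUCCESSORS) - hot_blocks   # not compiled in this sketch
```

Here `error_path` is never compiled. If it were ever reached at run time, execution would need some way to fall back to the interpreter, which this sketch does not model.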
First-Class Support for Free-Threading
Free-threading is already a part of Python’s future, and the current JIT must be made free-threading safe as soon as possible to be a viable option for improved performance. This involves making the frontend and middle-end’s optimizations free-threading safe (the backend should already be safe). We do not anticipate that a method frontend will make free-threading support more difficult than a tracing one. Furthermore, all major optimizations for the pre-existing JIT implemented in the past year have already been designed with free-threading in mind. However, a slight performance penalty may initially be encountered as we remove free-threading-unsafe optimizations. For example, we anticipate that the major optimizations that need addressing will be globals/builtins dictionary and type watchers. Resolving attribute/global lookup at JIT compile time should still be feasible, but removing their guards altogether may be unsafe in free-threading. We expect a naive fix to produce a slight (1-2% geomean) performance hit initially.
Once JIT development resumes, all future optimizations will be reviewed for free-threading compatibility and performance impact before merge. Optimizations that work only on the GIL build and break on the free-threaded build will be rejected.
Additionally, the JIT may eventually deliver a larger speedup on the free-threaded build than it does on the current GIL build. Early experiments in the JIT suggest that free-threading-specific optimizations may gain a few more percentage points on pyperformance. For example:
- Reference counting on the free-threading build is more expensive than on the GIL build, and the JIT can eliminate much of reference counting.
- The JIT has more leeway with the lifetimes of certain objects, due to deferred reference counting (see PEP 703) and Quiescent State-Based Reclamation (QSBR). This unlocks even more optimization opportunities that are not possible with immediate reclamation.
- The JIT can remove locks and atomics in the specializing adaptive interpreter when it detects that objects are uniquely referenced. These locks and atomics are a source of slowdown on architectures where atomics are more expensive.
We believe that the right framing here is not the JIT *or* free-threading, but rather, the JIT *and* free-threading. We understand the JIT may initially lose some performance opportunities from free-threading’s semantics. However, both the JIT and free-threading have much to gain. The JIT can recover all of free-threading’s single-threaded performance losses and maybe even more.
A Better JIT Distribution Story
We can choose to adopt either PEP 774’s solution or allow a range of LLVM versions to build the JIT. This is up for more discussion and experimentation. At minimum, feedback from popular Linux distributors must be collected, deliberated on, and incorporated into a holistic solution. Our current understanding of the situation is that supporting multiple LLVM versions is required. This should not add much additional complexity. However, it may require more CI resources for testing. For the JIT to be successful, it must not unduly burden third-party distributors.
Compatibility Review
As part of our roadmap, we plan to run the test suites of the top PyPI packages and detect if the JIT breaks them, similar to the initial nogil repository’s approach where Sam Gross ran popular PyPI packages to detect bugs and incompatibilities (see the labeille project [5] for additional prior art).
Failing a package’s test suite does not mean the goal is automatically not met. Certain test suites may rely on CPython internal details that are not guaranteed. Therefore, this requires a case-by-case examination. The bottom line is that we must have at least made a concerted effort to assess JIT compatibility with existing Python code out there, and made a best-effort attempt at correcting any “real” bugs.
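A compatibility run of this kind could be driven by something like the sketch below, which runs the same command with the JIT off and on and flags only differences for triage. It assumes a build where the documented PYTHON_JIT environment variable toggles the JIT; on builds without the JIT the variable has no effect and both runs behave identically. The helper names and the outcome strings are hypothetical.

```python
import os
import subprocess
import sys


def run_with_jit(args: list[str], enabled: bool) -> int:
    """Run a command with PYTHON_JIT set to 1 or 0 and return its exit code."""
    env = dict(os.environ, PYTHON_JIT="1" if enabled else "0")
    return subprocess.run(args, env=env, check=False).returncode


def classify(args: list[str]) -> str:
    """Compare a test command's outcome with the JIT off and on."""
    off = run_with_jit(args, enabled=False)
    on = run_with_jit(args, enabled=True)
    if off != 0:
        return "fails without JIT (not a JIT regression)"
    return "ok" if on == 0 else "needs triage: passes only with JIT off"


# Stand-in for a package's real test command,
# for example [sys.executable, "-m", "pytest"].
result = classify([sys.executable, "-c", "assert sum(range(10)) == 45"])
```

Running the baseline first matters: a suite that already fails with the JIT off says nothing about JIT compatibility, which matches the case-by-case examination described above.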
Tooling Support
The JIT will not regress on the current native unwinding support. To recap, the JIT currently supports all frame pointer-based unwinders and eh_frame-based ones as well (such as GNU backtrace).
The JIT will continue supporting out-of-process profilers/debuggers that require Python frames. We understand that frame elision (inlining) is a promising optimization. However, completely eliding frames in the JIT would break third-party tools. We will take care to agree on and provide alternative methods that give Python frame unwinders the information required to recover an elided frame, such as storing metadata for the elided frame. Furthermore, tools that inspect the Python stack may need to symbolize the JIT C shim frame (i.e., relate it to a Python function call). In this case, all information necessary to support these tools will be provided in the CPython runtime, either through executor objects or elsewhere, and also in the debug offsets, so that these tools can make sense of a call stack with JIT frames. For this, we may consult with maintainers of popular Python frame unwinding applications. As a general rule: if something works with the JIT off, we should do everything we can to make sure it also works (or has usable alternatives) with the JIT on, and not break genuinely useful observability and debugging features in the name of raw performance.
The JIT will continue supporting in-process tools. This means it will not break sys._getframe, pdb or sys.monitoring.
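A minimal smoke check for the in-process guarantee might look like the following. It walks the Python stack with `sys._getframe` from inside a loop hot enough that a JIT, if one is enabled, has a chance to compile it. The function names and the iteration count are arbitrary choices for this sketch, not thresholds taken from CPython.

```python
import sys


def call_chain() -> list[str]:
    """Names of the Python functions on the call stack, innermost first."""
    names, frame = [], sys._getframe(1)   # skip call_chain's own frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def inner() -> list[str]:
    return call_chain()


def outer() -> list[str]:
    # Repeat the call many times so a JIT, if enabled, may compile this loop.
    chain: list[str] = []
    for _ in range(5_000):
        chain = inner()
    return chain


chain = outer()
```

The expectation is that `chain` starts with `inner` followed by `outer` regardless of whether the JIT is on, which is the property the paragraph above commits to preserving.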
Relationship to Other JITs and Compiler Tools
CPython’s JIT is not intended to replace third-party specialist JITs or compiler projects, such as CinderX, Numba, PyTorch Compile or other domain-specific compilers. Those projects often optimize different workloads, use different assumptions or operate at different layers of the stack. The JIT is intended to be a “backstop” for the execution of any code that ends up being the responsibility of the interpreter itself, just as it is today.
Platform Support
The JIT will support all Tier 1 platforms, as specified in PEP 11, at time of writing:
| Target Triple | Notes |
|---|---|
| aarch64-apple-darwin | clang |
| aarch64-unknown-linux-gnu | glibc, gcc |
| i686-pc-windows-msvc | |
| x86_64-pc-windows-msvc | |
| x86_64-unknown-linux-gnu | glibc, gcc |
However, we do not plan to dedicate effort to improving 32-bit Windows performance, and would like to exclude that platform from our goals, as PyPI statistics suggest that 32-bit Windows builds account for a vanishingly small share of downloads. Furthermore, if platforms that conventionally do not host a JIT (such as WASI) are eventually promoted to Tier 1, we do not expect to support those either.
Thanks to the Copy and Patch backend, the JIT supports the platforms of interest with minimal additional work required from us. The key idea is to not handwrite machine code equivalents of our IR, as that causes too much churn and is unsustainable with CPython’s rapid bytecode changes.
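The copy-and-patch idea can be sketched in a few lines. This is a toy model only: the stencil bytes below are placeholders rather than real machine code, and the names `STENCILS` and `copy_and_patch` are hypothetical. In the real backend the templates are produced by the compiler toolchain at build time, and the runtime only copies them and fills in their holes.

```python
import struct

# Each hypothetical stencil is (template bytes, offset of its 8-byte hole).
# The non-zero bytes stand in for pre-compiled instructions.
HOLE = b"\x00" * 8
STENCILS = {
    "LOAD_CONST": (b"\xAA\xAA" + HOLE + b"\xBB", 2),
    "ADD_INT": (b"\xCC" + HOLE + b"\xDD\xDD", 1),
}


def copy_and_patch(trace):
    """Build a code buffer by copying stencils and patching their holes.

    `trace` is a sequence of (operation name, 64-bit operand) pairs.
    """
    code = bytearray()
    for op, operand in trace:
        template, hole_offset = STENCILS[op]
        start = len(code)
        code += template  # copy the pre-built template
        # patch: write the operand into the hole as a little-endian u64
        struct.pack_into("<Q", code, start + hole_offset, operand)
    return bytes(code)


if __name__ == "__main__":
    print(copy_and_patch([("LOAD_CONST", 7), ("ADD_INT", 1)]).hex())
```

Because no instruction encoding is written by hand, supporting a new platform in this model amounts to generating a new set of templates for that target rather than writing a new code generator.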
Maintenance Model
The JIT is maintained by a group of CPython core developers and contributors working across its three stages: the frontend, the optimizer and the code-generation backend. A central goal of the JIT has been to keep more than one active maintainer familiar with each stage, so that no part of the JIT depends on a single person. The contributor base has grown deliberately rather than by chance. During the 3.15 cycle, optimization work was decomposed into small, individually actionable tasks, which lowered the barrier to contributing and drew roughly a dozen people into the trace-recording conversion effort while increasing the number of recurring optimizer contributors [2]. This task decomposition is an ongoing mechanism for bringing in and retaining contributors, and it is how the project intends to sustain and widen its maintainer pool over time.
At present, the project does not depend on any single sponsor. It has continued as a community-led effort after its initial principal corporate sponsor wound down its dedicated funding, and it currently combines volunteer work with some ongoing corporate contributions, primarily from Arm, FastAPI Labs, and OpenAI. Sustaining the JIT also depends on shared and key infrastructure: the continuous integration and build configurations that exercise JIT builds (currently part of regular CI on main [3]), and the self-hosted benchmarking machines and infrastructure that publish nightly results [4] (currently maintained by Savannah; machines contributed by Savannah and Arm).
Finally, and perhaps most importantly, the JIT must remain accessible for contributors who do not work on it. This means committing to keeping the interpreter approachable or decoupled from the JIT, to documenting the workflow for regenerating generated code and contributing changes, and to keeping the internals documentation current. The simplification of the optimizer’s operations shipped in 3.15 (see Learnings) is an example of this maintenance investment in practice. Obligations on redistributors who build and ship the JIT are described in the A Better JIT Distribution Story section.
Backwards Compatibility
Since the JIT is an optimization and not a change to the language, its central compatibility guarantee is that a JIT-enabled build must produce behavior indistinguishable from a non-JIT build, just faster: same results, same exceptions and tracebacks, and same supported introspectable state.
As covered above, we will conduct a compatibility analysis on the test suites of the top PyPI packages as a requirement for regarding the JIT as supported in CPython.
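A minimal sketch of what "indistinguishable" means in practice: run the same program with the JIT forced off and forced on, then compare exit status, standard output, and standard error. It assumes the `PYTHON_JIT` environment variable, which is honored only by builds that include the JIT; on other builds both runs simply use the interpreter. The `SNIPPET` program and the `run_with_jit` helper are hypothetical and are not the compatibility analysis this PEP commits to.

```python
import os
import subprocess
import sys

# A small program with a hot loop and a handled exception.
SNIPPET = """
def f(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

print(f(100_000))
try:
    [][1]
except IndexError as exc:
    print(type(exc).__name__, exc)
"""


def run_with_jit(flag: str):
    """Run SNIPPET in a child interpreter with PYTHON_JIT set to `flag`."""
    env = dict(os.environ, PYTHON_JIT=flag)
    proc = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    off, on = run_with_jit("0"), run_with_jit("1")
    print("identical" if off == on else "behavior differs")
```

Running a package's existing test suite under both settings follows the same pattern at a larger scale.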
Security Implications
As stated in PEP 744, CPython’s JIT, like all JITs, produces large amounts of executable data at runtime. This is an attack vector for all JIT compilers: a malicious actor capable of influencing the contents of this data is therefore capable of executing arbitrary code.
In order to mitigate this risk, the JIT has been written with best practices in mind. In particular, the data in question is not exposed by the JIT compiler to other parts of the program while it remains writable, and at no point is the data both writable and executable.
The nature of template-based JITs also seriously limits the kinds of code that can be generated, further reducing the likelihood of a successful exploit. As an additional precaution, the templates themselves are stored in static, read-only memory.
However, it would be naive to assume that no possible vulnerabilities exist in the JIT, especially at this early stage. The authors are not security experts, but will work closely with the Python Security Response Team to triage and fix security issues as they arise.
Supporting CET/BTI has also been requested by Fedora maintainers [7]. We believe supporting this option in the generated stencils is required for meeting our goals for security.
Finally, since PEP 744’s inception, multiple fuzzing projects have been initiated to fuzz the JIT. For example, Lafleur [6] has found numerous JIT bugs that lead to crashes or wrong optimizations (mostly in the JIT middle-end, not the backend). We will continue using these projects to fuzz the JIT.
How to Teach This
For the vast majority of Python users, the most important thing to teach about the JIT is that there is nothing they need to do and nothing they need to watch out for. No code should need to be rewritten to benefit from the JIT, and none should need to be changed to remain correct.
For users who want a mental model, a short and accurate one is enough: the JIT is an optimization layer that sits above the interpreter and compiles frequently-executed code to machine code on the fly. It changes how fast a program runs, not what it does. This framing is sufficient for most educational contexts and does not require teaching the internals (for example, uops, optimizer, code generation).
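For readers who are curious whether the JIT is running at all, the sketch below queries the private `sys._jit` module. It assumes a CPython version that ships this module (3.14 or later); the module is an implementation detail that may change, and the `jit_status` helper is hypothetical. Nothing in ordinary code should depend on the answer.

```python
import sys


def jit_status() -> str:
    """Describe the JIT state of the running interpreter."""
    jit = getattr(sys, "_jit", None)  # absent on older or non-CPython runtimes
    if jit is None:
        return "unknown"
    if not jit.is_available():
        return "unavailable"
    return "enabled" if jit.is_enabled() else "disabled"


if __name__ == "__main__":
    print(f"JIT: {jit_status()}")
```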
Two audiences need more specific guidance. Redistributors and packagers will need to understand the build-time requirements and the path toward distributable artifacts (see “A Better JIT Distribution Story”). Maintainers of debuggers, profilers, and other native tooling need to know that JIT frames are unwindable on supported platforms and what they can rely on when inspecting a running process. Python stack unwinders will need to understand the JIT frame layout and how to recover information during symbolization (see “Tooling Support”).
Finally, core developers only need to care about the JIT if they want their
feature to be optimized by it. Otherwise, the current JIT architecture means
that core developers working on the interpreter or other parts of the runtime
do not need to care that a JIT exists, apart from the occasional CI breakage.
Once the JIT is regarded as supported, it should not be broken catastrophically
by any new changes. However, we expect that in almost all cases, introducing a
new feature to Python will not be obstructed by a JIT, unless the contributor
explicitly wants the JIT to support their feature or optimize for it. Once
again, see for example the initial lazy imports implementation, which modified
bytecode but did not need to modify the JIT beyond adding an #include for the
newly introduced headers [9].
Reference Implementation
The current implementation for the JIT can be found in CPython’s main branch, largely in:
- Tools/jit/README.md: Instructions for how to build the JIT.
- Python/jit.c: The entire backend portion of the JIT compiler.
- Python/optimizer.c: Part of the frontend of the JIT compiler (partially shared from Python/ceval.c).
- Python/optimizer_analysis.c: The middle-end of the JIT compiler.
- Python/optimizer_bytecodes.c: The middle-end of the JIT compiler’s optimization rules.
- jit_stencils.h: An example of the JIT’s build-time generated templates (not currently checked into the CPython repository).
- Tools/jit/template.c: The code which is compiled to produce the JIT’s templates.
- Tools/jit/_targets.py: The code to compile and parse the templates at build time.
While this PEP does propose and outline an evolution for the JIT (transitioning from tracing to method-based, with heavy reuse of existing code), it does not prescribe a particular implementation of that design. With that said, a working proof-of-concept implementation against main exists, and will be shared soon.
Despite the fact that it is currently under development and incomplete (it does not yet handle generators and coroutines, for example, and has no support for polymorphism, both of which are supported partially by the existing tracing frontend), it is still 4-5% faster on the pyperformance and Pyston macrobenchmark suites vs. JIT off, on a GIL-enabled build. This demonstrates that the new design developed in just a couple of months can be competitive with the existing tracing design (which is 7-8% faster on the same x86-64 Linux configuration after 3 years of work evolving it).
Excluding tests, the size of the current method-JIT implementation vs. main is approximately as follows:
| File Type | Files Changed | Lines Added | Lines Removed |
|---|---|---|---|
| Generated | 13 | 5300 | 4200 |
| Non-Generated | 38 | 2700 | 5800 |
| Total | 51 | 8000 | 10000 |
Broken down by file extension:
| Extension | Files Changed | Lines Added | Lines Removed |
|---|---|---|---|
| .c | 20 | 2300 | 5400 |
| .c.h | 6 | 3300 | 2400 |
| .h | 20 | 2300 | 2100 |
| .py | 5 | 100 | 100 |
| Total | 51 | 8000 | 10000 |
Rejected Ideas
Maintain the JIT Outside of CPython main
It has been suggested, both during the JIT’s history and in recent discussion, that a compiler of this complexity might be better developed and maintained out of tree or as a separate project, rather than in CPython’s main branch. However, keeping the JIT in main is a deliberate and hugely beneficial choice, originally articulated in PEP 744: it allows the JIT to be co-developed with the interpreter and maintained by the broader group of core developers and contributors rather than a small set of specialists working on a fork. The uops the JIT consumes are also co-designed with and regenerated from the interpreter. An out-of-tree JIT would have to track those definitions across a branch boundary, which raises the maintenance cost and the risk of drift precisely in the area where correctness matters most. The growth of the contributor base during the 3.15 cycle (see Maintenance Model) is itself evidence that in-tree development lowers, rather than raises, the barrier to participation. Keeping the JIT in the main branch of CPython also allows us to have a better pulse on the needs of distributors, and means that it’s easier for end users to try out the JIT and let us know what behavior they observe and what issues they find.
Pluggable JIT Infrastructure
Another recurring idea is for CPython to expose a stable, general-purpose interface for plugging in arbitrary third-party JIT compilers, rather than maintaining one in tree. This PEP rejects this idea for the same reasons as maintaining the JIT outside of CPython main. Introducing a pluggable JIT risks diverting contributor effort and increases maintenance overhead. For example, an earlier version of the JIT in 3.13 had a semi-public experimental API. However, it leaked internal details to “users” (there were none) and made internal JIT development more difficult. Thus, we removed it. Furthermore, most language runtime JITs are deeply integrated with their respective runtimes to the extent that a pluggable JIT infrastructure may not be feasible.
We agree, however, that efforts to maintain a JIT outside of CPython using PEP 523, such as CinderX and TorchDynamo, are commendable. We believe that discussions about improving the pre-existing interfaces are best left to a separate PEP, and consider them out of scope for this one.
Dropping the Build-Time LLVM Requirement
This PEP does not propose changing the JIT’s reliance on the LLVM toolchain at build time. We treat reducing build-time friction as important, but not as a precondition for agreeing on the path outlined here. PEP 774 proposes a solution for removing the LLVM prerequisite, but at the time of its submission the sitting Steering Council decided to defer a decision on it until the JIT had achieved more substantial performance gains. We would like to keep exploring options in this space, and so leave this for a separate PEP.
A Higher-Tier JIT
We believe that multi-tiered JITs produce great performance and compelling warmup times. However, we also believe that for the time being, CPython’s complexity and maintenance budget may not support such an endeavour. We are not saying this should never happen. Rather, our goal is to produce the best JIT we can for the current state of CPython, given the constraints we can work with. For that, we reject building yet another JIT on top of the current one for peak performance.
Enable/Support the Current JIT As-Is
The current JIT is undoubtedly the product of much attention and care, and we thank everyone who contributed to it. However, we understand that the community as a whole has concerns that are still unaddressed and need remedying. We also acknowledge that Python, and indeed CPython, is so widely used that a change of this scale must be properly examined and considered before it can become part of the project proper. As such, the current JIT cannot be enabled without more scrutiny and evolution.
Open Issues
None at this time.
Appendix
Average JIT Speedup by Machine (calculated from 2026-06-16 to 2026-06-27)
| Machine | Config | Avg speedup | Result | Days | Range |
|---|---|---|---|---|---|
| jones (M3 Pro, macOS) | JIT+TAILCALL | 1.126x | 12.6% faster | 9 | 1.050-1.180 |
| sulaco (AmpereOne, Linux aarch64) | JIT | 1.073x | 7.3% faster | 7 | 1.060-1.080 |
| ripley (i5-8400, Linux x86_64) | JIT | 1.069x | 6.9% faster | 9 | 1.060-1.070 |
| prometheus (Ryzen 5 3600X, Windows) | JIT+TAILCALL | 1.047x | 4.7% faster | 9 | 1.040-1.050 |
Note
Note that JIT+TAILCALL is used on Windows and macOS runs, as regular CPython builds ship with tailcalling enabled. All data used for this calculation can be found on Does JIT Go Brrr? [4].
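The arithmetic behind the table can be reproduced as follows. The per-machine ratios are copied from the "Avg speedup" column above; combining them with a geometric mean across machines is an illustrative aggregation chosen for this sketch, not necessarily the exact method used to produce the published figures. The names `SPEEDUPS` and `percent_faster` are hypothetical.

```python
from statistics import geometric_mean

# Average speedup ratios per machine, taken from the table above.
SPEEDUPS = {
    "jones": 1.126,
    "sulaco": 1.073,
    "ripley": 1.069,
    "prometheus": 1.047,
}


def percent_faster(ratio: float) -> float:
    """Convert a speedup ratio (e.g. 1.126x) into a percentage improvement."""
    return (ratio - 1.0) * 100.0


# One possible cross-machine summary: the geometric mean of the ratios.
combined = geometric_mean(list(SPEEDUPS.values()))

if __name__ == "__main__":
    for machine, ratio in SPEEDUPS.items():
        print(f"{machine}: {percent_faster(ratio):.1f}% faster")
    print(f"combined: {combined:.3f}x")
```

A geometric mean suits ratios because it treats a 2x speedup and a 2x slowdown as cancelling out, which an arithmetic mean of the same ratios would not.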
Footnotes
Change History
None at this time.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.