PEP: 3146
Title: Merging Unladen Swallow into CPython
Version: $Revision$
Last-Modified: $Date$
Author: Collin Winter <collinwinter@google.com>,
        Jeffrey Yasskin <jyasskin@google.com>,
        Reid Kleckner <rnk@mit.edu>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Jan-2010
Python-Version: 3.3
Post-History:

PEP Withdrawal

With Unladen Swallow going the way of the Norwegian Blue[1][2], this
PEP has been withdrawn.

Abstract

This PEP proposes the merger of the Unladen Swallow project[3] into
CPython's source tree. Unladen Swallow is an open-source branch of
CPython focused on performance. Unladen Swallow is source-compatible
with valid Python 2.6.4 applications and C extension modules.

Unladen Swallow adds a just-in-time (JIT) compiler to CPython, allowing
for the compilation of selected Python code to optimized machine code.
Beyond classical static compiler optimizations, Unladen Swallow's JIT
compiler takes advantage of data collected at runtime to make checked
assumptions about code behaviour, allowing the production of faster
machine code.

This PEP proposes to integrate Unladen Swallow into CPython's
development tree in a separate py3k-jit branch, targeted for eventual
merger with the main py3k branch. While Unladen Swallow is by no means
finished or perfect, we feel that Unladen Swallow has reached sufficient
maturity to warrant incorporation into CPython's roadmap. We have sought
to create a stable platform that the wider CPython development team can
build upon, a platform that will yield increasing performance for years
to come.

This PEP will detail Unladen Swallow's implementation and how it differs
from CPython 2.6.4; the benchmarks used to measure performance; the
tools used to ensure correctness and compatibility; the impact on
CPython's current platform support; and the impact on the CPython core
development process. The PEP concludes with a proposed merger plan and
brief notes on possible directions for future work.

We seek the following from the BDFL:

-   Approval for the overall concept of adding a just-in-time compiler
    to CPython, following the design laid out below.
-   Permission to continue working on the just-in-time compiler in the
    CPython source tree.
-   Permission to eventually merge the just-in-time compiler into the
    py3k branch once all blocking issues[4] have been addressed.
-   A pony.

Rationale, Implementation

Many companies and individuals would like Python to be faster, to enable
its use in more projects. Google is one such company.

Unladen Swallow is a Google-sponsored branch of CPython, initiated to
improve the performance of Google's numerous Python libraries, tools and
applications. To make the adoption of Unladen Swallow as easy as
possible, the project initially aimed at four goals:

-   A performance improvement of 5x over the baseline of CPython 2.6.4
    for single-threaded code.
-   100% source compatibility with valid CPython 2.6 applications.
-   100% source compatibility with valid CPython 2.6 C extension
    modules.
-   Design for eventual merger back into CPython.

We chose 2.6.4 as our baseline because Google uses CPython 2.4
internally, and jumping directly from CPython 2.4 to CPython 3.x was
considered infeasible.

To achieve the desired performance, Unladen Swallow has implemented a
just-in-time (JIT) compiler[5] in the tradition of Urs Hoelzle's work on
Self[6], gathering feedback at runtime and using that to inform
compile-time optimizations. This is similar to the approach taken by the
current breed of JavaScript engines[7],[8]; most Java virtual
machines[9]; Rubinius[10], MacRuby[11], and other Ruby implementations;
Psyco[12]; and others.

We explicitly reject any suggestion that our ideas are original. We have
sought to reuse the published work of other researchers wherever
possible. If we have done any original work, it is by accident. We have
tried, as much as possible, to take good ideas from all corners of the
academic and industrial community. A partial list of the research papers
that have informed Unladen Swallow is available on the Unladen Swallow
wiki[13].

The key observation about optimizing dynamic languages is that they are
only dynamic in theory; in practice, each individual function or snippet
of code is relatively static, using a stable set of types and child
functions. The current CPython bytecode interpreter assumes the worst
about the code it is running, that at any moment the user might override
the len() function or pass a never-before-seen type into a function. In
practice this never happens, but user code pays for that support.
Unladen Swallow takes advantage of the relatively static nature of user
code to improve performance.

At a high level, the Unladen Swallow JIT compiler works by translating a
function's CPython bytecode to platform-specific machine code, using
data collected at runtime, as well as classical compiler optimizations,
to improve the quality of the generated machine code. Because we only
want to spend resources compiling Python code that will actually benefit
the runtime of the program, an online heuristic is used to assess how
hot a given function is. Once the hotness value for a function crosses a
given threshold, it is selected for compilation and optimization. Until
a function is judged hot, however, it runs in the standard CPython eval
loop, which in Unladen Swallow has been instrumented to record
interesting data about each bytecode executed. This runtime data is used
to reduce the flexibility of the generated machine code, allowing us to
optimize for the common case. For example, we collect data on

-   Whether a branch was taken/not taken. If a branch is never taken, we
    will not compile it to machine code.
-   Types used by operators. If we find that a + b is only ever adding
    integers, the generated machine code for that snippet will not
    support adding floats.
-   Functions called at each callsite. If we find that a particular
    foo() callsite is always calling the same foo function, we can
    optimize the call or inline it away.

Refer to[14] for a complete list of data points gathered and how they
are used.
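The hotness model described above can be illustrated with a minimal
sketch. The threshold, the increment, and the helper names here are
purely illustrative; they are not Unladen Swallow's actual values or
API:

```python
# Illustrative hotness model: run a function in the interpreter until
# its hotness counter crosses a threshold, then compile it to native
# code and use that for all subsequent calls.
HOTNESS_THRESHOLD = 100000  # illustrative value only


class Function:
    def __init__(self, code):
        self.code = code
        self.hotness = 0
        self.machine_code = None  # filled in once the function is hot


def call(func, interpret, compile_to_native):
    """Dispatch one call: native code if compiled, else interpret
    while accumulating hotness."""
    if func.machine_code is not None:
        return func.machine_code()
    func.hotness += 10  # a production model would also credit loop backedges
    if func.hotness > HOTNESS_THRESHOLD:
        func.machine_code = compile_to_native(func.code)
        return func.machine_code()
    return interpret(func.code)
```

A production model would also credit loop backedges toward the counter,
so that a long-running loop is compiled even if its enclosing function
is called only once.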

However, if by chance the historically-untaken branch is now taken, or
some integer-optimized a + b snippet receives two strings, we must
support this. We cannot change Python semantics. Each of these sections
of optimized machine code is preceded by a guard, which checks whether
the simplifying assumptions we made when optimizing still hold. If the
assumptions are still valid, we run the optimized machine code; if they
are not, we revert back to the interpreter and pick up where we left
off.
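The guard mechanism can be sketched in pure Python. This is schematic
only: real guards are emitted as machine-code checks, and a failed guard
re-enters the interpreter mid-function rather than calling a generic
helper, but the control flow is the same:

```python
# Sketch of guard-and-bail-out: the specialized fast path runs only
# while the assumption recorded at compile time (here: both operands
# are ints) still holds; otherwise we fall back to the generic path,
# preserving full Python semantics.

def generic_add(a, b):
    # Stand-in for the interpreter's fully general BINARY_ADD.
    return a + b


def make_specialized_add():
    def specialized_add(a, b):
        # Guard: check the simplifying assumption made when optimizing.
        if type(a) is int and type(b) is int:
            return a + b  # fast path, compiled for ints only
        # Guard failed: revert to the generic path.
        return generic_add(a, b)
    return specialized_add
```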

We have chosen to reuse a set of existing compiler libraries called LLVM
[15] for code generation and code optimization. This has saved our small
team from needing to understand and debug code generation on multiple
machine instruction sets and from needing to implement a large set of
classical compiler optimizations. The project would not have been
possible without such code reuse. We have found LLVM easy to modify and
its community receptive to our suggestions and modifications.

In somewhat more depth, Unladen Swallow's JIT works by compiling CPython
bytecode to LLVM's own intermediate representation (IR)[16], taking into
account any runtime data from the CPython eval loop. We then run a set
of LLVM's built-in optimization passes, producing a smaller, optimized
version of the original LLVM IR. LLVM then lowers the IR to
platform-specific machine code, performing register allocation,
instruction scheduling, and any necessary relocations. This arrangement
of the compilation pipeline allows the LLVM-based JIT to be easily
omitted from a compiled python binary by passing --without-llvm to
./configure; various use cases for this flag are discussed later.
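The three-stage pipeline described above can be pictured schematically.
The helpers below are toy stand-ins operating on lists and strings, not
Unladen Swallow's or LLVM's actual APIs:

```python
# Toy model of the compilation pipeline: bytecode -> LLVM IR ->
# optimized IR -> machine code, with runtime feedback informing the
# optimization passes.

def bytecode_to_llvm_ir(bytecode, feedback):
    # Stage 1: translate each bytecode op to an IR "instruction",
    # annotated with whatever runtime feedback was recorded for it.
    return [("ir", op, feedback.get(op)) for op in bytecode]


def run_llvm_passes(ir):
    # Stage 2: optimization passes, e.g. dropping code that the
    # feedback says was never executed.
    return [instr for instr in ir if instr[2] != "never_taken"]


def lower_to_machine_code(ir):
    # Stage 3: lowering to platform-specific "machine code".
    return "asm:" + ",".join(op for _, op, _ in ir)


def jit_compile(bytecode, feedback):
    return lower_to_machine_code(
        run_llvm_passes(bytecode_to_llvm_ir(bytecode, feedback)))
```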

For a complete detailing of how Unladen Swallow works, consult the
Unladen Swallow documentation[17],[18].

Unladen Swallow has focused on improving the performance of
single-threaded, pure-Python code. We have not made an effort to remove
CPython's global interpreter lock (GIL); we feel this is separate from
our work, and due to its sensitivity, is best done in a mainline
development branch. We considered making GIL-removal a part of Unladen
Swallow, but were concerned by the possibility of introducing subtle
bugs when porting our work from CPython 2.6 to 3.x.

A JIT compiler is an extremely versatile tool, and we have by no means
exhausted its full potential. We have tried to create a sufficiently
flexible framework that the wider CPython development community can
build upon it for years to come, extracting increased performance in
each subsequent release.

Alternatives

There are a number of alternative strategies for improving Python
performance which we considered, but found unsatisfactory.

-   Cython, Shedskin: Cython[19] and Shedskin[20] are both static
    compilers for Python. We view these as useful-but-limited
    workarounds for CPython's historically-poor performance. Shedskin
    does not support the full Python standard library[21], while Cython
    requires manual Cython-specific annotations for optimum performance.

    Static compilers like these are useful for writing extension modules
    without worrying about reference counting, but because they are
    static, ahead-of-time compilers, they cannot optimize the full range
    of code under consideration by a just-in-time compiler informed by
    runtime data.

-   IronPython: IronPython[22] is Python on Microsoft's .Net platform.
    It is not actively tested on Mono[23], meaning that it is
    essentially Windows-only, making it unsuitable as a general CPython
    replacement.

-   Jython: Jython[24] is a complete implementation of Python 2.5, but
    is significantly slower than Unladen Swallow (3-5x on measured
    benchmarks) and has no support for CPython extension modules[25],
    which would make migration of large applications prohibitively
    expensive.

-   Psyco: Psyco[26] is a specializing JIT compiler for CPython,
    implemented as an extension module. It primarily improves
    performance for numerical code. Pros: exists; makes some code
    faster. Cons: 32-bit only, with no plans for 64-bit support;
    supports x86 only; very difficult to maintain; incompatible with
    SSE2 optimized code due to alignment issues.

-   PyPy: PyPy[27] has good performance on numerical code, but is slower
    than Unladen Swallow on some workloads. Migration of large
    applications from CPython to PyPy would be prohibitively expensive:
    PyPy's JIT compiler supports only 32-bit x86 code generation;
    important modules, such as MySQLdb and pycrypto, do not build
    against PyPy; PyPy does not offer an embedding API, much less the
    same API as CPython.

-   PyV8: PyV8[28] is an alpha-stage experimental Python-to-JavaScript
    compiler that runs on top of V8. PyV8 does not implement the whole
    Python language, and has no support for CPython extension modules.

-   WPython: WPython[29] is a wordcode-based reimplementation of
    CPython's interpreter loop. While it provides a modest improvement
    to interpreter performance[30], it is not a substitute for a
    just-in-time compiler. An interpreter will never be as fast as
    optimized machine code. We view WPython and similar interpreter
    enhancements as complementary to our work, rather than as
    competitors.

Performance

Benchmarks

Unladen Swallow has developed a fairly large suite of benchmarks,
ranging from synthetic microbenchmarks designed to test a single feature
up through whole-application macrobenchmarks. The inspiration for these
benchmarks has come variously from third-party contributors (in the case
of the html5lib benchmark), Google's own internal workloads
(slowspitfire, pickle, unpickle), as well as tools and libraries in
heavy use throughout the wider Python community (django, 2to3,
spambayes). These benchmarks are run through a single interface called
perf.py that takes care of collecting memory usage information, graphing
performance, and running statistics on the benchmark results to ensure
significance.

The full list of available benchmarks is available on the Unladen
Swallow wiki [31], including instructions on downloading and running the
benchmarks for yourself. All our benchmarks are open-source; none are
Google-proprietary. We believe this collection of benchmarks serves as a
useful tool to benchmark any complete Python implementation, and indeed,
PyPy is already using these benchmarks for their own performance testing
[32],[33]. We welcome this, and we seek additional workloads for the
benchmark suite from the Python community.

We have focused our efforts on collecting macrobenchmarks and benchmarks
that simulate real applications as well as possible, when running a
whole application is not feasible. Along a different axis, our benchmark
collection originally focused on the kinds of workloads seen by Google's
Python code (webapps, text processing), though we have since expanded
the collection to include workloads Google cares nothing about. We have
so far shied away from heavily numerical workloads, since NumPy[34]
already does an excellent job on such code and so improving numerical
performance was not an initial high priority for the team; we have begun
to incorporate such benchmarks into the collection [35] and have started
work on optimizing numerical Python code.

Beyond these benchmarks, there are also a variety of workloads we are
explicitly not interested in benchmarking. Unladen Swallow is focused on
improving the performance of pure-Python code, so the performance of
extension modules like NumPy is uninteresting since NumPy's core
routines are implemented in C. Similarly, workloads that involve a lot
of IO like GUIs, databases or socket-heavy applications would, we feel,
fail to accurately measure interpreter or code generation optimizations.
That said, there's certainly room to improve the performance of
C-language extension modules in the standard library, and as such, we
have added benchmarks for the cPickle and re modules.

Performance vs CPython 2.6.4

The charts below compare the arithmetic mean of multiple benchmark
iterations for CPython 2.6.4 and Unladen Swallow. perf.py gathers more
data than this, and indeed, arithmetic mean is not the whole story; we
reproduce only the mean for the sake of conciseness. We include the t
score from the Student's two-tailed T-test[36] at the 95% confidence
interval to indicate the significance of the result. Most benchmarks are
run for 100 iterations, though some longer-running whole-application
benchmarks are run for fewer iterations.
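The significance test can be reproduced with the standard library. This
is a plain two-sample Student's t statistic with pooled variance and
equal sample sizes, not necessarily perf.py's exact implementation:

```python
import math
import statistics


def t_score(sample_a, sample_b):
    """Two-sample Student's t statistic for equal-sized samples.

    Large |t| means the difference between the two benchmark runs is
    unlikely to be noise; |t| near zero means it is insignificant.
    """
    n = len(sample_a)
    assert n == len(sample_b), "equal sample sizes assumed"
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 denominator), pooled into a standard
    # error for the difference of means.
    std_err = math.sqrt(
        (statistics.variance(sample_a) + statistics.variance(sample_b)) / n)
    return (mean_a - mean_b) / std_err
```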

A description of each of these benchmarks is available on the Unladen
Swallow wiki[37].

Command:

    ./perf.py -r -b default,apps ../a/python ../b/python

32-bit; gcc 4.0.3; Ubuntu Dapper; Intel Core2 Duo 6600 @ 2.4GHz; 2
cores; 4MB L2 cache; 4GB RAM

  ---------------------------------------------------------------------------------------------
  Benchmark      CPython   Unladen       Change    Significance    Timeline
                 2.6.4     Swallow r988                            
  -------------- --------- ------------- --------- --------------- ----------------------------
  2to3           25.13 s   24.87 s       1.01x     t=8.94          http://tinyurl.com/yamhrpg
                                         faster                    

  django         1.08 s    0.80 s        1.35x     t=315.59        http://tinyurl.com/y9mrn8s
                                         faster                    

  html5lib       14.29 s   13.20 s       1.08x     t=2.17          http://tinyurl.com/y8tyslu
                                         faster                    

  nbody          0.51 s    0.28 s        1.84x     t=78.007        http://tinyurl.com/y989qhg
                                         faster                    

  rietveld       0.75 s    0.55 s        1.37x     Insignificant   http://tinyurl.com/ye7mqd3
                                         faster                    

  slowpickle     0.75 s    0.55 s        1.37x     t=20.78         http://tinyurl.com/ybrsfnd
                                         faster                    

  slowspitfire   0.83 s    0.61 s        1.36x     t=2124.66       http://tinyurl.com/yfknhaw
                                         faster                    

  slowunpickle   0.33 s    0.26 s        1.26x     t=15.12         http://tinyurl.com/yzlakoo
                                         faster                    

  spambayes      0.31 s    0.34 s        1.10x     Insignificant   http://tinyurl.com/yem62ub
                                         slower                    
  ---------------------------------------------------------------------------------------------

64-bit; gcc 4.2.4; Ubuntu Hardy; AMD Opteron 8214 HE @ 2.2 GHz; 4 cores;
1MB L2 cache; 8GB RAM

  ---------------------------------------------------------------------------------------------
  Benchmark      CPython   Unladen       Change    Significance    Timeline
                 2.6.4     Swallow r988                            
  -------------- --------- ------------- --------- --------------- ----------------------------
  2to3           31.98 s   30.41 s       1.05x     t=8.35          http://tinyurl.com/ybcrl3b
                                         faster                    

  django         1.22 s    0.94 s        1.30x     t=106.68        http://tinyurl.com/ybwqll6
                                         faster                    

  html5lib       18.97 s   17.79 s       1.06x     t=2.78          http://tinyurl.com/yzlyqvk
                                         faster                    

  nbody          0.77 s    0.27 s        2.86x     t=133.49        http://tinyurl.com/yeyqhbg
                                         faster                    

  rietveld       0.74 s    0.80 s        1.08x     t=-2.45         http://tinyurl.com/yzjc6ff
                                         slower                    

  slowpickle     0.91 s    0.62 s        1.48x     t=28.04         http://tinyurl.com/yf7en6k
                                         faster                    

  slowspitfire   1.01 s    0.72 s        1.40x     t=98.70         http://tinyurl.com/yc8pe2o
                                         faster                    

  slowunpickle   0.51 s    0.34 s        1.51x     t=32.65         http://tinyurl.com/yjufu4j
                                         faster                    

  spambayes      0.43 s    0.45 s        1.06x     Insignificant   http://tinyurl.com/yztbjfp
                                         slower                    
  ---------------------------------------------------------------------------------------------

Many of these benchmarks take a hit under Unladen Swallow because the
current version blocks execution to compile Python functions down to
machine code. This leads to the behaviour seen in the timeline graphs
for the html5lib and rietveld benchmarks, for example, and slows down
the overall performance of 2to3. We have an active development branch to
fix this problem ([38],[39]), but working within the strictures of
CPython's current threading system has complicated the process and
required far more care and time than originally anticipated. We view
this issue as critical to final merger into the py3k branch.

We have obviously not met our initial goal of a 5x performance
improvement. A performance retrospective follows, which addresses why we
failed to meet our initial performance goal. We maintain a list of
yet-to-be-implemented performance work[40].

Memory Usage

The following table shows maximum memory usage (in kilobytes) for each
of Unladen Swallow's default benchmarks for both CPython 2.6.4 and
Unladen Swallow r988, as well as a timeline of memory usage across the
lifetime of the benchmark. We include tables for both 32- and 64-bit
binaries. Memory usage was measured on Linux 2.6 systems by summing the
Private_ sections from the kernel's /proc/$pid/smaps pseudo-files[41].

Command:

    ./perf.py -r --track_memory -b default,apps ../a/python ../b/python

32-bit

  ----------------------------------------------------------------------------------
  Benchmark      CPython     Unladen Swallow   Change   Timeline
                 2.6.4       r988                       
  -------------- ----------- ----------------- -------- ----------------------------
  2to3           26396 kb    46896 kb          1.77x    http://tinyurl.com/yhr2h4z

  django         10028 kb    27740 kb          2.76x    http://tinyurl.com/yhan8vs

  html5lib       150028 kb   173924 kb         1.15x    http://tinyurl.com/ybt44en

  nbody          3020 kb     16036 kb          5.31x    http://tinyurl.com/ya8hltw

  rietveld       15008 kb    46400 kb          3.09x    http://tinyurl.com/yhd5dra

  slowpickle     4608 kb     16656 kb          3.61x    http://tinyurl.com/ybukyvo

  slowspitfire   85776 kb    97620 kb          1.13x    http://tinyurl.com/y9vj35z

  slowunpickle   3448 kb     13744 kb          3.98x    http://tinyurl.com/yexh4d5

  spambayes      7352 kb     46480 kb          6.32x    http://tinyurl.com/yem62ub
  ----------------------------------------------------------------------------------

64-bit

  ----------------------------------------------------------------------------------
  Benchmark      CPython     Unladen Swallow   Change   Timeline
                 2.6.4       r988                       
  -------------- ----------- ----------------- -------- ----------------------------
  2to3           51596 kb    82340 kb          1.59x    http://tinyurl.com/yljg6rs

  django         16020 kb    38908 kb          2.43x    http://tinyurl.com/ylqsebh

  html5lib       259232 kb   324968 kb         1.25x    http://tinyurl.com/yha6oee

  nbody          4296 kb     23012 kb          5.35x    http://tinyurl.com/yztozza

  rietveld       24140 kb    73960 kb          3.06x    http://tinyurl.com/ybg2nq7

  slowpickle     4928 kb     23300 kb          4.73x    http://tinyurl.com/yk5tpbr

  slowspitfire   133276 kb   148676 kb         1.11x    http://tinyurl.com/y8bz2xe

  slowunpickle   4896 kb     16948 kb          3.46x    http://tinyurl.com/ygywwoc

  spambayes      10728 kb    84992 kb          7.92x    http://tinyurl.com/yhjban5
  ----------------------------------------------------------------------------------

The increased memory usage comes from a) LLVM code generation, analysis
and optimization libraries; b) native code; c) memory usage issues or
leaks in LLVM; d) data structures needed to optimize and generate
machine code; e) as-yet uncategorized other sources.

While we have made significant progress in reducing memory usage since
the initial naive JIT implementation[42], there is obviously more to do.
We believe that there are still memory savings to be made without
sacrificing performance. We have tended to focus on raw performance, and
we have not yet made a concerted push to reduce memory usage. We view
reducing memory usage as a blocking issue for final merger into the py3k
branch. We seek guidance from the community on an acceptable level of
increased memory usage.

Start-up Time

Statically linking LLVM's code generation, analysis and optimization
libraries increases the time needed to start the Python binary. C++
static initializers used by LLVM also increase start-up time, as does
importing the collection of pre-compiled C runtime routines we want to
inline to Python code.

Results from Unladen Swallow's startup benchmarks:

    $ ./perf.py -r -b startup /tmp/cpy-26/bin/python /tmp/unladen/bin/python

    ### normal_startup ###
    Min: 0.219186 -> 0.352075: 1.6063x slower
    Avg: 0.227228 -> 0.364384: 1.6036x slower
    Significant (t=-51.879098, a=0.95)
    Stddev: 0.00762 -> 0.02532: 3.3227x larger
    Timeline: http://tinyurl.com/yfe8z3r

    ### startup_nosite ###
    Min: 0.105949 -> 0.264912: 2.5004x slower
    Avg: 0.107574 -> 0.267505: 2.4867x slower
    Significant (t=-703.557403, a=0.95)
    Stddev: 0.00214 -> 0.00240: 1.1209x larger
    Timeline: http://tinyurl.com/yajn8fa

    ### bzr_startup ###
    Min: 0.067990 -> 0.097985: 1.4412x slower
    Avg: 0.084322 -> 0.111348: 1.3205x slower
    Significant (t=-37.432534, a=0.95)
    Stddev: 0.00793 -> 0.00643: 1.2330x smaller
    Timeline: http://tinyurl.com/ybdm537

    ### hg_startup ###
    Min: 0.016997 -> 0.024997: 1.4707x slower
    Avg: 0.026990 -> 0.036772: 1.3625x slower
    Significant (t=-53.104502, a=0.95)
    Stddev: 0.00406 -> 0.00417: 1.0273x larger
    Timeline: http://tinyurl.com/ycout8m

bzr_startup and hg_startup measure how long it takes Bazaar and
Mercurial, respectively, to display their help screens. startup_nosite
runs python -S many times; usage of the -S option is rare, but we feel
this gives a good indication of where increased startup time is coming
from.

Unladen Swallow has made headway toward optimizing startup time, but
there is still more work to do and further optimizations to implement.
Improving start-up time is a high-priority item[43] in Unladen Swallow's
merger punchlist.

Binary Size

Statically linking LLVM's code generation, analysis and optimization
libraries significantly increases the size of the python binary. The
tables below report stripped on-disk binary sizes; the binaries are
stripped to better correspond with the configurations used by system
package managers. We feel this is the most realistic measure of any
change in binary size.

  ---------------------------------------------------------------------
  Binary size   CPython 2.6.4   CPython 3.1.1   Unladen Swallow r1041
  ------------- --------------- --------------- -----------------------
  32-bit        1.3M            1.4M            12M

  64-bit        1.6M            1.6M            12M
  ---------------------------------------------------------------------

The increased binary size is caused by statically linking LLVM's code
generation, analysis and optimization libraries into the python binary.
This can be straightforwardly addressed by modifying LLVM to better
support shared linking and then using that, instead of the current
static linking. For the moment, though, static linking provides an
accurate look at the cost of linking against LLVM.

Even when statically linking, we believe there is still headroom to
improve on-disk binary size by narrowing Unladen Swallow's dependencies
on LLVM. This issue is actively being addressed[44].

Performance Retrospective

Our initial goal for Unladen Swallow was a 5x performance improvement
over CPython 2.6. We did not hit that goal, nor, to put it bluntly, did
we come close. Why did the project fall short, and can an LLVM-based JIT
ever hit that goal?

Why did Unladen Swallow not achieve its 5x goal? The primary reason was
that LLVM required more work than we had initially anticipated. Based on
the fact that Apple was shipping products based on LLVM[45], and other
high-level languages had successfully implemented LLVM-based JITs
([46],[47],[48]), we had assumed that LLVM's JIT was relatively free of
show-stopper bugs.

That turned out to be incorrect. We had to turn our attention away from
performance to fix a number of critical bugs in LLVM's JIT
infrastructure (for example,[49],[50]) as well as a number of
nice-to-have enhancements that would enable further optimizations along
various axes (for example,[51], [52],[53]). LLVM's static code
generation facilities, tools and optimization passes are stable and
stress-tested, but the just-in-time infrastructure was relatively
untested and buggy. We have fixed this.

(Our hypothesis is that we hit these problems -- problems other projects
had avoided -- because of the complexity and thoroughness of CPython's
standard library test suite.)

We also diverted engineering effort away from performance and into
support tools such as gdb and oProfile. gdb did not work well with JIT
compilers at all, and LLVM previously had no integration with oProfile.
Having JIT-aware debuggers and profilers has been very valuable to the
project, and we do not regret channeling our time in these directions.
See the Debugging and Profiling sections for more information.

Can an LLVM-based CPython JIT ever hit the 5x performance target? The
benchmark results for JIT-based JavaScript implementations suggest that
5x is indeed possible, as do the results PyPy's JIT has delivered for
numeric workloads. The experience of Self-92[54] is also instructive.

Can LLVM deliver this? We believe that we have only begun to scratch the
surface of what our LLVM-based JIT can deliver. The optimizations we
have incorporated into this system thus far have borne significant fruit
(for example, [55],[56], [57]). Our experience to date is that the
limiting factor on Unladen Swallow's performance is the engineering
cycles needed to implement the literature. We have found LLVM easy to
work with and to modify, and its built-in optimizations have greatly
simplified the task of implementing Python-level optimizations.

An overview of further performance opportunities is discussed in the
Future Work section.

Correctness and Compatibility

Unladen Swallow's correctness test suite includes CPython's test suite
(under Lib/test/), as well as a number of important third-party
applications and libraries[58]. A full list of these applications and
libraries is reproduced below. Any dependencies needed by these
packages, such as zope.interface[59], are also tested indirectly as a
part of testing the primary package, thus widening the corpus of tested
third-party Python code.

-   2to3
-   Cheetah
-   cvs2svn
-   Django
-   Nose
-   NumPy
-   PyCrypto
-   pyOpenSSL
-   PyXML
-   Setuptools
-   SQLAlchemy
-   SWIG
-   SymPy
-   Twisted
-   ZODB

These applications pass all relevant tests when run under Unladen
Swallow. Note that some tests that failed against our baseline of
CPython 2.6.4 were disabled, as were tests that made assumptions about
CPython internals such as exact bytecode numbers or bytecode format. Any
package with disabled tests includes a README.unladen file that details
the changes (for example, [60]).

In addition, Unladen Swallow is tested automatically against an array of
internal Google Python libraries and applications. These include
Google's internal Python bindings for BigTable[61], the Mondrian code
review application[62], and Google's Python standard library, among
others. The changes needed to run these projects under Unladen Swallow
have consistently broken into one of three camps:

-   Adding CPython 2.6 C API compatibility. Since Google still primarily
    uses CPython 2.4 internally, we have needed to convert uses of int
    to Py_ssize_t and similar API changes.
-   Fixing or disabling explicit, incorrect tests of the CPython version
    number.
-   Conditionally disabling code that worked around or depended on bugs
    in CPython 2.4 that have since been fixed.

Testing against this wide range of public and proprietary applications
and libraries has been instrumental in ensuring the correctness of
Unladen Swallow. Testing has exposed bugs that we have duly corrected.
Our automated regression testing regime has given us high confidence in
our changes as we have moved forward.

In addition to third-party testing, we have added further tests to
CPython's test suite for corner cases of the language or implementation
that we felt were untested or underspecified (for example,[63], [64]).
These have been especially important when implementing optimizations,
helping make sure we have not accidentally broken the darker corners of
Python.

We have also constructed a test suite focused solely on the LLVM-based
JIT compiler and the optimizations implemented for it[65]. Because of
the complexity and subtlety inherent in writing an optimizing compiler,
we have attempted to exhaustively enumerate the constructs, scenarios
and corner cases we are compiling and optimizing. The JIT tests also
include tests for things like the JIT hotness model, making the model
easier for future CPython developers to maintain and improve.
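The idea behind a hotness model can be sketched in pure Python; the threshold value and names below are purely illustrative, not Unladen Swallow's actual implementation (which gathers richer runtime signals than a bare call count).

```python
# A toy call-count hotness model: a function is flagged for compilation
# once it has been called often enough. A real JIT would emit machine
# code at that point; here we only record the decision.
HOTNESS_THRESHOLD = 10  # illustrative value

def with_hotness(func):
    state = {"calls": 0, "compiled": False}

    def wrapper(*args, **kwargs):
        state["calls"] += 1
        if not state["compiled"] and state["calls"] >= HOTNESS_THRESHOLD:
            state["compiled"] = True  # stand-in for invoking the compiler
        return func(*args, **kwargs)

    wrapper.state = state
    return wrapper
```

A test for such a model checks the transition point directly: the function stays interpreted below the threshold and is marked for compilation exactly when it crosses it.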

We have recently begun using fuzz testing[66] to stress-test the
compiler. We have used both pyfuzz[67] and Fusil[68] in the past, and we
recommend they be introduced as an automated part of the CPython testing
process.
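The shape of such fuzzing can be sketched as follows; the structure and names are illustrative only, not the actual pyfuzz or Fusil APIs. The harness generates random programs and checks that compiling and running them never crashes the interpreter (Python-level exceptions are acceptable; a segfault is the bug being hunted).

```python
import random

# Build a random arithmetic expression of bounded depth.
def random_expression(rng, depth=0):
    if depth > 3 or rng.random() < 0.3:
        return str(rng.randint(-10, 10))
    op = rng.choice(["+", "-", "*"])
    return "(%s %s %s)" % (random_expression(rng, depth + 1), op,
                           random_expression(rng, depth + 1))

# Compile and run each generated program; exceptions are fine, crashes
# are not. Returns the number of programs the interpreter survived.
def fuzz(iterations=100, seed=0):
    rng = random.Random(seed)
    survived = 0
    for _ in range(iterations):
        expr = random_expression(rng)
        try:
            eval(compile(expr, "<fuzz>", "eval"))
        except Exception:
            pass
        survived += 1
    return survived
```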

Known Incompatibilities

The only application or library known to work with CPython 2.6.4 but
not with Unladen Swallow is Psyco[69]. We are aware of some
libraries such as PyGame[70] that work well with CPython 2.6.4, but
suffer some degradation due to changes made in Unladen Swallow. We are
tracking this issue [71] and are working to resolve these instances of
degradation.

While Unladen Swallow is source-compatible with CPython 2.6.4, it is not
binary compatible. C extension modules compiled against one will need to
be recompiled to work with the other.

The merger of Unladen Swallow should have minimal impact on long-lived
CPython optimization branches like WPython. WPython[72] and Unladen
Swallow are largely orthogonal, and there is no technical reason why
both could not be merged into CPython. The changes needed to make
WPython compatible with a JIT-enhanced version of CPython should be
minimal [73]. The same should be true for other CPython optimization
projects (for example, [74]).

Invasive forks of CPython such as Stackless Python[75] are more
challenging to support. Since Stackless is highly unlikely to be merged
into CPython[76] and an increased maintenance burden is part and parcel
of any fork, we consider compatibility with Stackless to be relatively
low-priority. JIT-compiled stack frames use the C stack, so Stackless
should be able to treat them the same as it treats calls through
extension modules. If that turns out to be unacceptable, Stackless could
either remove the JIT compiler or improve JIT code generation to better
support heap-based stack frames[77], [78].

Platform Support

Unladen Swallow is inherently limited by the platform support provided
by LLVM, especially LLVM's JIT compilation system[79]. LLVM's JIT has
the best support on x86 and x86-64 systems, and these are the platforms
where Unladen Swallow has received the most testing. We are confident in
LLVM/Unladen Swallow's support for x86 and x86-64 hardware. PPC and ARM
support exists, but is not widely used and may be buggy (for
example, [80], [81], [82]).

Unladen Swallow is known to work on the following operating systems:
Linux, Darwin, Windows. Unladen Swallow has received the most testing on
Linux and Darwin, though it still builds and passes its tests on
Windows.

In order to support hardware and software platforms where LLVM's JIT
does not work, Unladen Swallow provides a ./configure --without-llvm
option. This flag carves out any part of Unladen Swallow that depends on
LLVM, yielding a Python binary that works and passes its tests, but has
no performance advantages. This configuration is recommended for
hardware unsupported by LLVM, or systems that care more about memory
usage than performance.

Impact on CPython Development

Experimenting with Changes to Python or CPython Bytecode

Unladen Swallow's JIT compiler operates on CPython bytecode, and as
such, it is immune to Python language changes that affect only the
parser.

We recommend that changes to the CPython bytecode compiler or the
semantics of individual bytecodes be prototyped in the interpreter loop
first, then be ported to the JIT compiler once the semantics are clear.
To make this easier, Unladen Swallow includes a --without-llvm
configure-time option that strips out the JIT compiler and all
associated infrastructure. This leaves the current burden of
experimentation unchanged so that developers can prototype in the
current low-barrier-to-entry interpreter loop.
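The bytecode level the JIT consumes is directly inspectable with the standard dis module, which is the natural first stop when prototyping a bytecode change in the interpreter loop. A minimal illustration (shown with the modern dis API; the 2.6-era module prints the same information via dis.dis):

```python
import dis

def add(a, b):
    return a + b

# The JIT compiles the same instruction stream the interpreter loop
# executes, so dis shows exactly what a prototyped bytecode change
# must preserve before any porting to the JIT compiler begins.
opnames = [ins.opname for ins in dis.get_instructions(add)]
```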

Unladen Swallow began implementing its JIT compiler by doing
straightforward, naive translations from bytecode implementations into
LLVM API calls. We found this process to be easily understood, and we
recommend the same approach for CPython. We include several sample
changes from the Unladen Swallow repository here as examples of this
style of development: [83], [84], [85], [86].

Debugging

The Unladen Swallow team implemented changes to gdb to make it easier to
use gdb to debug JIT-compiled Python code. These changes were released
in gdb 7.0 [87]. They make it possible for gdb to identify and unwind
past JIT-generated call stack frames. This allows gdb to continue to
function as before for CPython development if one is changing, for
example, the list type or builtin functions.

Example backtrace after our changes, where baz, bar and foo are
JIT-compiled:

    Program received signal SIGSEGV, Segmentation fault.
    0x00002aaaabe7d1a8 in baz ()
    (gdb) bt
    #0 0x00002aaaabe7d1a8 in baz ()
    #1 0x00002aaaabe7d12c in bar ()
    #2 0x00002aaaabe7d0aa in foo ()
    #3 0x00002aaaabe7d02c in main ()
    #4 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70, F=0x14024e0, ArgValues=...)
    at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
    #5 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
    (this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0)
    at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
    #6 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8,
    envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208

Previously, the JIT-compiled frames would have caused gdb to unwind
incorrectly, generating lots of obviously-incorrect
#6 0x00002aaaabe7d0aa in ?? ()-style stack frames.

Highlights:

-   gdb 7.0 is able to correctly parse JIT-compiled stack frames,
    allowing full use of gdb on non-JIT-compiled functions, that is, the
    vast majority of the CPython codebase.
-   Disassembling inside a JIT-compiled stack frame automatically prints
    the full list of instructions making up that function. This is an
    advance over the state of gdb before our work: developers needed to
    guess the starting address of the function and manually disassemble
    the assembly code.
-   A flexible underlying mechanism allows CPython to add more and more
    information, eventually reaching parity with gdb's C/C++ support
    for JIT-compiled machine code.

Lowlights:

-   gdb cannot print local variables or tell you what line you're
    currently executing inside a JIT-compiled function. Nor can it step
    through JIT-compiled code, except for one instruction at a time.
-   Not yet integrated with Apple's gdb or Microsoft's Visual Studio
    debuggers.

The Unladen Swallow team is working with Apple to get these changes
incorporated into their future gdb releases.

Profiling

Unladen Swallow integrates with oProfile 0.9.4 and newer[88] to support
assembly-level profiling on Linux systems. This means that oProfile will
correctly symbolize JIT-compiled functions in its reports.

Example report, where the #u#-prefixed symbol names are JIT-compiled
Python functions:

    $ opreport -l ./python | less
    CPU: Core 2, speed 1600 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples % image name symbol name
    79589 4.2329 python PyString_FromFormatV
    62971 3.3491 python PyEval_EvalCodeEx
    62713 3.3354 python tupledealloc
    57071 3.0353 python _PyEval_CallFunction
    50009 2.6597 24532.jo #u#force_unicode
    47468 2.5246 python PyUnicodeUCS2_Decode
    45829 2.4374 python PyFrame_New
    45173 2.4025 python lookdict_string
    43082 2.2913 python PyType_IsSubtype
    39763 2.1148 24532.jo #u#render5
    38145 2.0287 python _PyType_Lookup
    37643 2.0020 python PyObject_GC_UnTrack
    37105 1.9734 python frame_dealloc
    36849 1.9598 python PyEval_EvalFrame
    35630 1.8950 24532.jo #u#resolve
    33313 1.7717 python PyObject_IsInstance
    33208 1.7662 python PyDict_GetItem
    33168 1.7640 python PyTuple_New
    30458 1.6199 python PyCFunction_NewEx

This support is functional, but as-yet unpolished. Unladen Swallow
maintains a punchlist of items we feel are important to improve in our
oProfile integration to make it more useful to core CPython
developers[89].

Highlights:

-   Symbolization of JITted frames working in oProfile on Linux.

Lowlights:

-   No work yet invested in improving symbolization of JIT-compiled
    frames for Apple's Shark[90] or Microsoft's Visual Studio profiling
    tools.
-   Some polishing still desired for oProfile output.

We recommend using oProfile 0.9.5 (or newer) on x86-64 platforms to
avoid a bug in oProfile 0.9.4 that has since been fixed; oProfile
0.9.4 works fine on 32-bit platforms, however.

Given the ease of integrating oProfile with LLVM[91] and Unladen
Swallow[92], other profiling tools should be easy as well, provided they
support a similar JIT interface[93].

We have documented the process for using oProfile to profile Unladen
Swallow [94]. This document will be merged into CPython's Doc/ tree in
the merge.

Addition of C++ to CPython

In order to use LLVM, Unladen Swallow has introduced C++ into the core
CPython tree and build process. This is an unavoidable part of depending
on LLVM; though LLVM offers a C API[95], it is limited and does not
expose the functionality needed by CPython. Because of this, we have
implemented the internal details of the Unladen Swallow JIT and its
supporting infrastructure in C++. We do not propose converting the
entire CPython codebase to C++.

Highlights:

-   Easy use of LLVM's full, powerful code generation and related APIs.
-   Convenient, abstract data structures simplify code.
-   C++ is limited to relatively small corners of the CPython codebase.
-   C++ can be disabled via ./configure --without-llvm, which even omits
    the dependency on libstdc++.

Lowlights:

-   Developers must know two related languages, C and C++, to work on
    the full range of CPython's internals.
-   A C++ style guide will need to be developed and enforced. PEP 7 will
    be extended[96] to encompass C++ by taking the relevant parts of the
    C++ style guides from Unladen Swallow[97], LLVM [98] and Google[99].
-   Different C++ compilers emit different ABIs; this can cause problems
    if CPython is compiled with one C++ compiler and extension modules
    are compiled with a different C++ compiler.

Managing LLVM Releases, C++ API Changes

LLVM is released regularly every six months. This means that LLVM may be
released two or three times during the course of development of a
CPython 3.x release. Each LLVM release brings newer and more powerful
optimizations, improved platform support and more sophisticated code
generation.

LLVM releases usually include incompatible changes to the LLVM C++ API;
the release notes for LLVM 2.6[100] include a list of
intentionally-introduced incompatibilities. Unladen Swallow has tracked
LLVM trunk closely over the course of development. Our experience has
been that LLVM API changes are obvious and easily or mechanically
remedied. We include two such changes from the Unladen Swallow tree as
references here: [101], [102].

Due to API incompatibilities, we recommend that an LLVM-based CPython
target compatibility with a single version of LLVM at a time. This will
lower the overhead on the core development team. Pegging to an LLVM
version should not be a problem from a packaging perspective, because
pre-built LLVM packages generally become available via standard system
package managers fairly quickly following an LLVM release, and failing
that, llvm.org itself includes binary releases.

Unladen Swallow has historically included a copy of the LLVM and Clang
source trees in the Unladen Swallow tree; this was done to allow us to
closely track LLVM trunk as we made patches to it. We do not recommend
this model of development for CPython. CPython releases should be based
on official LLVM releases. Pre-built LLVM packages are available from
MacPorts[103] for Darwin, and from most major Linux distributions
([104], [105], [106]). LLVM itself provides additional binaries, such as
for MinGW[107].

LLVM is currently intended to be statically linked; this means that
binary releases of CPython will include the relevant parts (not all!) of
LLVM. This will increase the binary size, as noted above. To simplify
downstream package management, we will modify LLVM to better support
shared linking. This issue will block final merger[108].

Unladen Swallow has tasked a full-time engineer with fixing any
remaining critical issues in LLVM before LLVM's 2.7 release. We consider
it essential that CPython 3.x be able to depend on a released version of
LLVM, rather than closely tracking LLVM trunk as Unladen Swallow has
done. We believe we will finish this work[109] before the release of
LLVM 2.7, expected in May 2010.

Building CPython

In addition to a runtime dependency on LLVM, Unladen Swallow includes a
build-time dependency on Clang[110], an LLVM-based C/C++ compiler. We
use this to compile parts of the C-language Python runtime to LLVM's
intermediate representation; this allows us to perform cross-language
inlining, yielding increased performance. Clang is not required to run
Unladen Swallow. Clang binary packages are available from most major
Linux distributions (for example, [111]).

We examined the impact of Unladen Swallow on the time needed to build
Python, including configure, full builds and incremental builds after
touching a single C source file.

  --------------------------------------------------------------------
  ./configure   CPython 2.6.4   CPython 3.1.1   Unladen Swallow r988
  ------------- --------------- --------------- ----------------------
  Run 1         0m20.795s       0m16.558s       0m15.477s

  Run 2         0m15.255s       0m16.349s       0m15.391s

  Run 3         0m15.228s       0m16.299s       0m15.528s
  --------------------------------------------------------------------

  --------------------------------------------------------------------
  Full make     CPython 2.6.4   CPython 3.1.1   Unladen Swallow r988
  ------------- --------------- --------------- ----------------------
  Run 1         1m30.776s       1m22.367s       1m54.053s

  Run 2         1m21.374s       1m22.064s       1m49.448s

  Run 3         1m22.047s       1m23.645s       1m49.305s
  --------------------------------------------------------------------

Full builds take a hit due to a) additional .cc files needed for LLVM
interaction, b) statically linking LLVM into libpython, c) compiling
parts of the Python runtime to LLVM IR to enable cross-language
inlining.

Incremental builds are also somewhat slower than mainline CPython. The
table below shows incremental rebuild times after touching
Objects/listobject.c.

  ---------------------------------------------------------------------
  Incr make     CPython 2.6.4   CPython 3.1.1   Unladen Swallow r1024
  ------------- --------------- --------------- -----------------------
  Run 1         0m1.854s        0m1.456s        0m6.680s

  Run 2         0m1.437s        0m1.442s        0m5.310s

  Run 3         0m1.440s        0m1.425s        0m7.639s
  ---------------------------------------------------------------------

As with full builds, this extra time comes from statically linking LLVM
into libpython. If libpython were linked shared against LLVM, this
overhead would go down.

Proposed Merge Plan

We propose focusing our efforts on eventual merger with CPython's 3.x
line of development. The BDFL has indicated that 2.7 is to be the final
release of CPython's 2.x line of development[112], and since 2.7 alpha 1
has already been released (PEP 373), we have missed the window. Python 3 is
the future, and that is where we will target our performance efforts.

We recommend the following plan for merger of Unladen Swallow into the
CPython source tree:

-   Creation of a branch in the CPython SVN repository to work in, call
    it py3k-jit as a strawman. This will be a branch of the CPython py3k
    branch.
-   We will keep this branch closely integrated to py3k. The further we
    deviate, the harder our work will be.
-   Any JIT-related patches will go into the py3k-jit branch.
-   Non-JIT-related patches will go into the py3k branch (once reviewed
    and approved) and be merged back into the py3k-jit branch.
-   Potentially-contentious issues, such as the introduction of new
    command line flags or environment variables, will be discussed on
    python-dev.

Because Google uses CPython 2.x internally, Unladen Swallow is based on
CPython 2.6. We would need to port our compiler to Python 3; this would
be done as patches are applied to the py3k-jit branch, so that the
branch remains a consistent implementation of Python 3 at all times.

We believe this approach will be minimally disruptive to the 3.2 or 3.3
release process while we iron out any remaining issues blocking final
merger into py3k. Unladen Swallow maintains a punchlist of known issues
needed before final merger[113], which includes all problems mentioned
in this PEP; we trust the CPython community will have its own concerns.
This punchlist is not static; other issues may emerge in the future that
will block final merger into the py3k branch.

Changes will be committed directly to the py3k-jit branch, with only
large, tricky or controversial changes sent for pre-commit code review.

Contingency Plans

There is a chance that we will not be able to reduce memory usage or
startup time to a level satisfactory to the CPython community. Our
primary contingency plan for this situation is to shift from an online
just-in-time compilation strategy to an offline ahead-of-time strategy
using an instrumented CPython interpreter loop to obtain feedback. This
is the same model used by gcc's feedback-directed optimizations
(-fprofile-generate)[114] and Microsoft Visual Studio's profile-guided
optimizations[115]; we will refer to this as "feedback-directed
optimization" here, or FDO.

We believe that an FDO compiler for Python would be inferior to a JIT
compiler. FDO requires a high-quality, representative benchmark suite,
which is a relative rarity in both open- and closed-source development.
A JIT compiler can dynamically find and optimize the hot spots in any
application -- benchmark suite or no -- allowing it to adapt to changes
in application bottlenecks without human intervention.

If an ahead-of-time FDO compiler is required, it should be able to
leverage a large percentage of the code and infrastructure already
developed for Unladen Swallow's JIT compiler. Indeed, these two
compilation strategies could exist side by side.

Future Work

A JIT compiler is an extremely flexible tool, and we have by no means
exhausted its full potential. Unladen Swallow maintains a list of
yet-to-be-implemented performance optimizations[116] that the team has
not yet had time to fully implement. Examples:

-   Python/Python inlining[117]. Our compiler currently performs no
    inlining between pure-Python functions. Work on this is
    ongoing[118].
-   Unboxing[119]. Unboxing is critical for numerical performance. PyPy
    in particular has demonstrated the value of unboxing to heavily
    numeric workloads.
-   Recompilation, adaptation. Unladen Swallow currently only compiles a
    Python function once, based on its usage pattern up to that point.
    If the usage pattern changes, limitations in LLVM[120] prevent us
    from recompiling the function to better serve the new usage pattern.
-   JIT-compile regular expressions. Modern JavaScript engines reuse
    their JIT compilation infrastructure to boost regex
    performance[121]. Unladen Swallow has developed benchmarks for
    Python regular expression performance ([122], [123], [124]), but
    work on regex performance is still at an early stage[125].
-   Trace compilation[126], [127]. Based on the results of PyPy and
    TraceMonkey[128], we believe that a CPython JIT should incorporate
    trace compilation to some degree. We initially avoided a
    purely-tracing JIT compiler in favor of a simpler,
    function-at-a-time compiler. However, this function-at-a-time
    compiler has laid the groundwork for a future tracing compiler
    implemented in the same terms.
-   Profile generation/reuse. The runtime data gathered by the JIT could
    be persisted to disk and reused by subsequent JIT compilations, or
    by external tools such as Cython[129] or a feedback-enhanced code
    coverage tool.
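The unboxing item above can be made concrete in pure Python: a list of floats stores a pointer to a separate heap object per element, while the standard array module packs raw machine doubles contiguously, which is essentially the representation an unboxing JIT could use inside hot numeric loops.

```python
import sys
from array import array

n = 1000

# Boxed: each float is a full heap object (type pointer, reference
# count, payload), and the list stores pointers to those objects.
boxed = [float(i) for i in range(n)]
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

# Unboxed: raw 8-byte doubles packed end to end, with no per-element
# object header.
unboxed = array("d", (float(i) for i in range(n)))
unboxed_bytes = sys.getsizeof(unboxed)
```

Beyond the memory savings, unboxed values can be kept in machine registers across loop iterations, which is where the numeric speedups demonstrated by projects like PyPy come from.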

This list is by no means exhaustive. There is a vast literature on
optimizations for dynamic languages that could and should be implemented
in terms of Unladen Swallow's LLVM-based JIT compiler[130].

Unladen Swallow Community

We would like to thank the community of developers who have contributed
to Unladen Swallow, in particular: James Abbatiello, Joerg Blank, Eric
Christopher, Alex Gaynor, Chris Lattner, Nick Lewycky, Evan Phoenix and
Thomas Wouters.

Licensing

All work on Unladen Swallow is licensed to the Python Software
Foundation (PSF) under the terms of the Python Software Foundation
License v2[131] under the umbrella of Google's blanket Contributor
License Agreement with the PSF.

LLVM is licensed[132] under the University of Illinois/NCSA Open Source
License[133], a liberal, OSI-approved license. The University of
Illinois Urbana-Champaign is the sole copyright holder for LLVM.

References

Copyright

This document has been placed in the public domain.

[1] http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

[2] http://en.wikipedia.org/wiki/Dead_Parrot_sketch

[3] http://code.google.com/p/unladen-swallow/

[4] http://code.google.com/p/unladen-swallow/issues/list?q=label:Merger

[5] http://en.wikipedia.org/wiki/Just-in-time_compilation

[6] http://research.sun.com/self/papers/urs-thesis.html

[7] http://code.google.com/p/v8/

[8] http://webkit.org/blog/214/introducing-squirrelfish-extreme/

[9] http://en.wikipedia.org/wiki/HotSpot

[10] http://rubini.us/

[11] http://www.macruby.org/

[12] http://psyco.sourceforge.net/

[13] http://code.google.com/p/unladen-swallow/wiki/RelevantPapers

[14] http://code.google.com/p/unladen-swallow/source/browse/trunk/Python/llvm_notes.txt

[15] http://llvm.org/

[16] http://llvm.org/docs/LangRef.html

[17] http://code.google.com/p/unladen-swallow/wiki/ProjectPlan

[18] http://code.google.com/p/unladen-swallow/source/browse/trunk/Python/llvm_notes.txt

[19] http://www.cython.org/

[20] http://shed-skin.blogspot.com/

[21] http://shedskin.googlecode.com/files/shedskin-tutorial-0.3.html

[22] http://ironpython.net/

[23] http://www.mono-project.com/

[24] http://www.jython.org/

[25] http://wiki.python.org/jython/JythonFaq/GeneralInfo

[26] http://psyco.sourceforge.net/

[27] http://codespeak.net/pypy/dist/pypy/doc/

[28] http://code.google.com/p/pyv8/

[29] http://code.google.com/p/wpython/

[30] http://www.mail-archive.com/python-dev@python.org/msg45143.html

[31] http://code.google.com/p/unladen-swallow/wiki/Benchmarks

[32] http://codespeak.net:8099/plotsummary.html

[33] http://code.google.com/p/unladen-swallow/issues/detail?id=120

[34] http://numpy.scipy.org/

[35] http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py

[36] http://en.wikipedia.org/wiki/Student's_t-test

[37] http://code.google.com/p/unladen-swallow/wiki/Benchmarks

[38] http://code.google.com/p/unladen-swallow/source/browse/branches/background-thread

[39] http://code.google.com/p/unladen-swallow/issues/detail?id=40

[40] http://code.google.com/p/unladen-swallow/issues/list?q=label:Performance

[41] http://bmaurer.blogspot.com/2006/03/memory-usage-with-smaps.html

[42] http://code.google.com/p/unladen-swallow/issues/detail?id=68

[43] http://code.google.com/p/unladen-swallow/issues/detail?id=64

[44] http://code.google.com/p/unladen-swallow/issues/detail?id=118

[45] http://llvm.org/Users.html

[46] http://rubini.us/

[47] http://www.macruby.org/

[48] http://www.ffconsultancy.com/ocaml/hlvm/

[49] http://llvm.org/PR5201

[50] http://llvm.org/viewvc/llvm-project?view=rev&revision=76828

[51] http://llvm.org/viewvc/llvm-project?rev=85182&view=rev

[52] http://llvm.org/viewvc/llvm-project?rev=91611&view=rev

[53] http://llvm.org/PR5735

[54] http://research.sun.com/self/papers/urs-thesis.html

[55] http://code.google.com/p/unladen-swallow/issues/detail?id=73

[56] http://code.google.com/p/unladen-swallow/issues/detail?id=88

[57] http://code.google.com/p/unladen-swallow/issues/detail?id=67

[58] http://code.google.com/p/unladen-swallow/wiki/Testing

[59] http://www.zope.org/Products/ZopeInterface

[60] http://code.google.com/p/unladen-swallow/source/browse/tests/lib/sqlalchemy/README.unladen

[61] http://en.wikipedia.org/wiki/BigTable

[62] http://www.niallkennedy.com/blog/2006/11/google-mondrian.html

[63] http://code.google.com/p/unladen-swallow/source/detail?r=888

[64] http://code.google.com/p/unladen-swallow/source/diff?spec=svn576&r=576&format=side&path=/trunk/Lib/test/test_trace.py

[65] http://code.google.com/p/unladen-swallow/source/browse/trunk/Lib/test/test_llvm.py

[66] http://en.wikipedia.org/wiki/Fuzz_testing

[67] http://bitbucket.org/ebo/pyfuzz/overview/

[68] http://lwn.net/Articles/322826/

[69] http://psyco.sourceforge.net/

[70] http://www.pygame.org/

[71] http://code.google.com/p/unladen-swallow/issues/detail?id=40

[72] http://code.google.com/p/wpython/

[73] http://www.mail-archive.com/python-dev@python.org/msg44962.html

[74] http://portal.acm.org/citation.cfm?id=1534530.1534550

[75] http://www.stackless.com/

[76] https://mail.python.org/pipermail/python-dev/2004-June/045165.html

[77] http://www.nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt

[78] http://old.nabble.com/LLVM-and-coroutines-microthreads-td23080883.html

[79] http://llvm.org/docs/GettingStarted.html#hardware

[80] http://llvm.org/PR4816

[81] http://llvm.org/PR5201

[82] http://llvm.org/PR6065

[83] http://code.google.com/p/unladen-swallow/source/detail?r=359

[84] http://code.google.com/p/unladen-swallow/source/detail?r=376

[85] http://code.google.com/p/unladen-swallow/source/detail?r=417

[86] http://code.google.com/p/unladen-swallow/source/detail?r=517

[87] http://www.gnu.org/software/gdb/download/ANNOUNCEMENT

[88] http://oprofile.sourceforge.net/news/

[89] http://code.google.com/p/unladen-swallow/issues/detail?id=63

[90] http://developer.apple.com/tools/sharkoptimize.html

[91] http://llvm.org/viewvc/llvm-project?view=rev&revision=75279

[92] http://code.google.com/p/unladen-swallow/source/detail?r=986

[93] http://oprofile.sourceforge.net/doc/devel/jit-interface.html

[94] http://code.google.com/p/unladen-swallow/wiki/UsingOProfile

[95] http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/

[96] http://www.mail-archive.com/python-dev@python.org/msg45544.html

[97] http://code.google.com/p/unladen-swallow/wiki/StyleGuide

[98] http://llvm.org/docs/CodingStandards.html

[99] http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

[100] http://llvm.org/releases/2.6/docs/ReleaseNotes.html#whatsnew

[101] http://code.google.com/p/unladen-swallow/source/detail?r=820

[102] http://code.google.com/p/unladen-swallow/source/detail?r=532

[103] http://trac.macports.org/browser/trunk/dports/lang/llvm/Portfile

[104] http://packages.ubuntu.com/karmic/llvm

[105] http://packages.debian.org/unstable/devel/llvm

[106] http://koji.fedoraproject.org/koji/buildinfo?buildID=134384

[107] http://llvm.org/releases/download.html

[108] http://code.google.com/p/unladen-swallow/issues/detail?id=130

[109] http://code.google.com/p/unladen-swallow/issues/detail?id=131

[110] http://clang.llvm.org/

[111] http://packages.debian.org/sid/clang

[112] https://mail.python.org/pipermail/python-dev/2010-January/095682.html

[113] http://code.google.com/p/unladen-swallow/issues/list?q=label:Merger

[114] http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

[115] http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx

[116] http://code.google.com/p/unladen-swallow/issues/list?q=label:Performance

[117] http://en.wikipedia.org/wiki/Inline_expansion

[118] http://code.google.com/p/unladen-swallow/issues/detail?id=86

[119] http://en.wikipedia.org/wiki/Object_type_(object-oriented_programming)

[120] http://code.google.com/p/unladen-swallow/issues/detail?id=41

[121] http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Regular_Expressions

[122] http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_compile.py

[123] http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_v8.py

[124] http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_regex_effbot.py

[125] http://code.google.com/p/unladen-swallow/issues/detail?id=13

[126] http://www.ics.uci.edu/~franz/Site/pubs-pdf/C44Prepub.pdf

[127] http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-07-12.pdf

[128] https://wiki.mozilla.org/JavaScript:TraceMonkey

[129] http://www.cython.org/

[130] http://code.google.com/p/unladen-swallow/wiki/RelevantPapers

[131] http://www.python.org/psf/license/

[132] http://llvm.org/docs/DeveloperPolicy.html#clp

[133] http://www.opensource.org/licenses/UoI-NCSA.php