PEP: 399 Title: Pure Python/C Accelerator Module Compatibility
Requirements Version: $Revision: 88219 $ Last-Modified: $Date:
2011-01-27 13:47:00 -0800 (Thu, 27 Jan 2011) $ Author: Brett Cannon
<brett@python.org> Status: Final Type: Informational Content-Type:
text/x-rst Created: 04-Apr-2011 Python-Version: 3.3 Post-History:
04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011, 01-Jan-2013

Abstract

The Python standard library under CPython contains various instances of
modules implemented in both pure Python and C (either entirely or
partially). This PEP requires that in these instances that the C code
must pass the test suite used for the pure Python code so as to act as
much as a drop-in replacement as reasonably possible (C- and VM-specific
tests are exempt). It is also required that new C-based modules lacking
a pure Python equivalent implementation get special permission to be
added to the standard library.

Rationale

Python has grown beyond the CPython virtual machine (VM). IronPython,
Jython, and PyPy are all currently viable alternatives to the CPython
VM. The VM ecosystem that has sprung up around the Python programming
language has led to Python being used in many different areas where
CPython cannot be used, e.g., Jython allowing Python to be used in Java
applications.

A problem all of the VMs other than CPython face is handling modules
from the standard library that are implemented (to some extent) in C.
Since other VMs do not typically support the entire C API of CPython
they are unable to use the code used to create the module. Oftentimes
this leads these other VMs to either re-implement the modules in pure
Python or in the programming language used to implement the VM itself
(e.g., in C# for IronPython). This duplication of effort between
CPython, PyPy, Jython, and IronPython is extremely unfortunate as
implementing a module at least in pure Python would help mitigate this
duplicate effort.

The purpose of this PEP is to minimize this duplicate effort by
mandating that all new modules added to Python's standard library must
have a pure Python implementation unless special dispensation is given.
This makes sure that a module in the stdlib is available to all VMs and
not just to CPython (pre-existing modules that do not meet this
requirement are exempt, although there is nothing preventing someone
from adding in a pure Python implementation retroactively).

Re-implementing parts (or all) of a module in C (in the case of CPython)
is still allowed for performance reasons, but any such accelerated code
must pass the same test suite (sans VM- or C-specific tests) to verify
semantics and prevent divergence. To accomplish this, the test suite for
the module must have comprehensive coverage of the pure Python
implementation before the acceleration code may be added.

Details

Starting in Python 3.3, any modules added to the standard library must
have a pure Python implementation. This rule can only be ignored if the
Python development team grants a special exemption for the module.
Typically the exemption will be granted only when a module wraps a
specific C-based library (e.g., sqlite3). In granting an exemption it
will be recognized that the module will be considered exclusive to
CPython and not part of Python's standard library that other VMs are
expected to support. Usage of ctypes to provide an API for a C library
will continue to be frowned upon as ctypes lacks compiler guarantees
that C code typically relies upon to prevent certain errors from
occurring (e.g., API changes).

Even though a pure Python implementation is mandated by this PEP, it
does not preclude the use of a companion acceleration module. If an
acceleration module is provided it is to be named the same as the module
it is accelerating with an underscore attached as a prefix, e.g.,
_warnings for warnings. The common pattern to access the accelerated
code from the pure Python implementation is to import it with an
import *, e.g., from _warnings import *. This is typically done at the
end of the module to allow it to overwrite specific Python objects with
their accelerated equivalents. This kind of import can also be done
before the end of the module when needed, e.g., an accelerated base
class is provided but is then subclassed by Python code. This PEP does
not mandate that pre-existing modules in the stdlib that lack a pure
Python equivalent gain such a module. But if people do volunteer to
provide and maintain a pure Python equivalent (e.g., the PyPy team
volunteering their pure Python implementation of the csv module and
maintaining it) then such code will be accepted. In those instances the
C version is considered the reference implementation in terms of
expected semantics.

Any new accelerated code must act as a drop-in replacement as close to
the pure Python implementation as reasonable. Technical details of the
VM providing the accelerated code are allowed to differ as necessary,
e.g., a class being a type when implemented in C. To verify that the
Python and equivalent C code operate as similarly as possible, both code
bases must be tested using the same tests which apply to the pure Python
code (tests specific to the C code or any VM do not follow under this
requirement). The test suite is expected to be extensive in order to
verify expected semantics.

Acting as a drop-in replacement also dictates that no public API be
provided in accelerated code that does not exist in the pure Python
code. Without this requirement people could accidentally come to rely on
a detail in the accelerated code which is not made available to other
VMs that use the pure Python implementation. To help verify that the
contract of semantic equivalence is being met, a module must be tested
both with and without its accelerated code as thoroughly as possible.

As an example, to write tests which exercise both the pure Python and C
accelerated versions of a module, a basic idiom can be followed:

    from test.support import import_fresh_module
    import unittest

    c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
    py_heapq = import_fresh_module('heapq', blocked=['_heapq'])


    class ExampleTest:

        def test_example(self):
            self.assertTrue(hasattr(self.module, 'heapify'))


    class PyExampleTest(ExampleTest, unittest.TestCase):
        module = py_heapq


    @unittest.skipUnless(c_heapq, 'requires the C _heapq module')
    class CExampleTest(ExampleTest, unittest.TestCase):
        module = c_heapq


    if __name__ == '__main__':
        unittest.main()

The test module defines a base class (ExampleTest) with test methods
that access the heapq module through a self.heapq class attribute, and
two subclasses that set this attribute to either the Python or the C
version of the module. Note that only the two subclasses inherit from
unittest.TestCase -- this prevents the ExampleTest class from being
detected as a TestCase subclass by unittest test discovery. A skipUnless
decorator can be added to the class that tests the C code in order to
have these tests skipped when the C module is not available.

If this test were to provide extensive coverage for heapq.heappop() in
the pure Python implementation then the accelerated C code would be
allowed to be added to CPython's standard library. If it did not, then
the test suite would need to be updated until proper coverage was
provided before the accelerated C code could be added.

To also help with compatibility, C code should use abstract APIs on
objects to prevent accidental dependence on specific types. For
instance, if a function accepts a sequence then the C code should
default to using PyObject_GetItem() instead of something like
PyList_GetItem(). C code is allowed to have a fast path if the proper
PyList_CheckExact() is used, but otherwise APIs should work with any
object that duck types to the proper interface instead of a specific
type.

Copyright

This document has been placed in the public domain.