PEP: 3121 Title: Extension Module Initialization and Finalization
Version: $Revision$ Last-Modified: $Date$ Author: Martin von Löwis
<martin@v.loewis.de> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 27-Apr-2007 Python-Version: 3.0 Post-History:

PyInit_modulename and PyModuleDef

Abstract

Extension module initialization currently has a few deficiencies. There
is no cleanup for modules, the entry point name might give naming
conflicts, the entry functions don't follow the usual calling
convention, and multiple interpreters are not supported well. This PEP
addresses these issues.

Problems

Module Finalization

Currently, extension modules are initialized usually once and then
"live" forever. The only exception is when Py_Finalize() is called: then
the initialization routine is invoked a second time. This is bad from a
resource management point of view: memory and other resources might get
allocated each time initialization is called, but there is no way to
reclaim them. As a result, there is currently no way to completely
release all resources Python has allocated.

Entry point name conflicts

The entry point is currently called init<module>. This might conflict
with other symbols also called init<something>. In particular,
initsocket is known to have conflicted in the past (this specific
problem got resolved as a side effect of renaming the module to
_socket).

Entry point signature

The entry point is currently a procedure (returning void). This deviates
from the usual calling conventions; callers can find out whether there
was an error during initialization only by checking PyErr_Occurred. The
entry point should return a PyObject*, which will be the module created,
or NULL in case of an exception.

Multiple Interpreters

Currently, extension modules share their state across all interpreters.
This allows for undesirable information leakage across interpreters: one
script could permanently corrupt objects in an extension module,
possibly breaking all scripts in other interpreters.

Specification

The module initialization routines change their signature to:

    PyObject *PyInit_<modulename>()

The initialization routine will be invoked once per interpreter, when
the module is imported. It should return a new module object each time.

In order to store per-module state in C variables, each module object
will contain a block of memory that is interpreted only by the module.
The amount of memory used for the module is specified at the point of
creation of the module.

In addition to the initialization function, a module may implement a
number of additional callback functions, which are invoked when the
module's tp_traverse, tp_clear, and tp_free functions are invoked, and
when the module is reloaded.

The entire module definition is combined in a struct PyModuleDef:

    struct PyModuleDef{
      PyModuleDef_Base m_base;  /* To be filled out by the interpreter */
      Py_ssize_t m_size; /* Size of per-module data */
      PyMethodDef *m_methods;
      inquiry m_reload;
      traverseproc m_traverse;
      inquiry m_clear;
      freefunc m_free;
    };

Creation of a module is changed to expect an optional PyModuleDef*. The
module state will be null-initialized.

Each module method will be passed the module object as the first
parameter. To access the module data, a function:

    void* PyModule_GetState(PyObject*);

will be provided. In addition, to lookup a module more efficiently than
going through sys.modules, a function:

    PyObject* PyState_FindModule(struct PyModuleDef*);

will be provided. This lookup function will use an index located in the
m_base field, to find the module by index, not by name.

As all Python objects should be controlled through the Python memory
management, usage of "static" type objects is discouraged, unless the
type object itself has no memory-managed state. To simplify definition
of heap types, a new method:

    PyTypeObject* PyType_Copy(PyTypeObject*);

is added.

Example

xxmodule.c would be changed to remove the initxx function, and add the
following code instead:

    struct xxstate{
      PyObject *ErrorObject;
      PyObject *Xxo_Type;
    };

    #define xxstate(o) ((struct xxstate*)PyModule_GetState(o))

    static int xx_traverse(PyObject *m, visitproc v,
                           void *arg)
    {
      Py_VISIT(xxstate(m)->ErrorObject);
      Py_VISIT(xxstate(m)->Xxo_Type);
      return 0;
    }

    static int xx_clear(PyObject *m)
    {
      Py_CLEAR(xxstate(m)->ErrorObject);
      Py_CLEAR(xxstate(m)->Xxo_Type);
      return 0;
    }

    static struct PyModuleDef xxmodule = {
      {}, /* m_base */
      sizeof(struct xxstate),
      &xx_methods,
      0,  /* m_reload */
      xx_traverse,
      xx_clear,
      0,  /* m_free - not needed, since all is done in m_clear */
    }

    PyObject*
    PyInit_xx()
    {
      PyObject *res = PyModule_New("xx", &xxmodule);
      if (!res) return NULL;
      xxstate(res)->ErrorObject = PyErr_NewException("xx.error", NULL, NULL);
      if (!xxstate(res)->ErrorObject) {
        Py_DECREF(res);
        return NULL;
      }
      xxstate(res)->XxoType = PyType_Copy(&Xxo_Type);
      if (!xxstate(res)->Xxo_Type) {
        Py_DECREF(res);
        return NULL;
      }
      return res;
    }

Discussion

Tim Peters reports in[1] that PythonLabs considered such a feature at
one point, and lists the following additional hooks which aren't
currently supported in this PEP:

-   when the module object is deleted from sys.modules
-   when Py_Finalize is called
-   when Python exits
-   when the Python DLL is unloaded (Windows only)

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] Tim Peters, reporting earlier conversation about such a feature
https://mail.python.org/pipermail/python-3000/2006-April/000726.html