PEP: 534 Title: Improved Errors for Missing Standard Library Modules
Author: Tomáš Orsava <tomas.n@orsava.cz>, Petr Viktorin
<encukou@gmail.com>, Alyssa Coghlan <ncoghlan@gmail.com> Status:
Deferred Type: Standards Track Content-Type: text/x-rst Created:
05-Sep-2016 Post-History:

Abstract

Python is often being built or distributed without its full standard
library. However, there is as of yet no standard, user friendly way of
properly informing the user about the failure to import such missing
standard library modules.

This PEP proposes a mechanism for identifying expected standard library
modules and providing more informative error messages to users when
attempts to import standard library modules fail.

PEP Deferral

The PEP authors aren't actively working on this PEP, so if improving
these error messages is an idea that you're interested in pursuing,
please get in touch! (e.g. by posting to the python-dev mailing list).

The key piece of open work is determining how to get the autoconf and
Visual Studio build processes to populate the sysconfig metadata file
with the lists of expected and optional standard library modules.

Motivation

There are several use cases for including only a subset of Python's
standard library. However, there is so far no user-friendly mechanism
for informing the user why a stdlib module is missing and how to remedy
the situation appropriately.

CPython

When one of Python's standard library modules (such as _sqlite3) cannot
be compiled during a CPython build because of missing dependencies (e.g.
SQLite header files), the module is simply skipped. If you then install
this compiled Python and use it to try to import one of the missing
modules, Python will fail with a ModuleNotFoundError.

For example, after deliberately removing sqlite-devel from the local
system:

    $ ./python -c "import sqlite3"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
        from sqlite3.dbapi2 import *
      File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
        from _sqlite3 import *
    ModuleNotFoundError: No module named '_sqlite3'

This can confuse users who may not understand why a cleanly built Python
is missing standard library modules.

Linux and other distributions

Many Linux and other distributions are already separating out parts of
the standard library to standalone packages. Among the most commonly
excluded modules are the tkinter module, since it draws in a dependency
on the graphical environment, idlelib, since it depends on tkinter (and
most Linux desktop environments provide their own default code editor),
and the test package, as it only serves to test Python internally and is
about as big as the rest of the standard library put together.

The methods of omission of these modules differ. For example, Debian
patches the file Lib/tkinter/__init__.py to envelop the line
import _tkinter in a try-except block and upon encountering an
ImportError it simply adds the following to the error message:
please install the python3-tk package [1]. Fedora and other
distributions simply don't include the omitted modules, potentially
leaving users baffled as to where to find them.

An example from Fedora 29:

    $ python3 -c "import tkinter"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'tkinter'

Specification

APIs to list expected standard library modules

To allow for easier identification of which module names are expected to
be resolved in the standard library, the sysconfig module will be
extended with two additional functions:

-   sysconfig.get_stdlib_modules(), which will provide a list of the
    names of all top level Python standard library modules (including
    private modules)
-   sysconfig.get_optional_modules(), which will list optional public
    top level standard library module names

The results of sysconfig.get_optional_modules() and the existing
sys.builtin_module_names will both be subsets of the full list provided
by the new sysconfig.get_stdlib_modules() function.

These added lists will be generated during the Python build process and
saved in the _sysconfigdata-*.py file along with other sysconfig values.

Possible reasons for modules being in the "optional" list will be:

-   the module relies on an optional build dependency (e.g. _sqlite3,
    tkinter, idlelib)
-   the module is private for other reasons and hence may not be present
    on all implementations (e.g. _freeze_importlib, _collections_abc)
-   the module is platform specific and hence may not be present in all
    installations (e.g. winreg)
-   the test package may also be freely omitted from Python runtime
    installations, as it is intended for use in testing Python
    implementations, not as a runtime library for Python projects to use
    (the public API offering testing utilities is unittest)

(Note: the ensurepip, venv, and distutils modules are all considered
mandatory modules in this PEP, even though not all redistributors
currently adhere to that practice)

Changes to the default sys.excepthook implementation

The default implementation of the sys.excepthook function will then be
modified to dispense an appropriate message when it detects a failure to
import a module identified by one of the two new sysconfig functions as
belonging to the Python standard library.

Revised error message for a module that relies on an optional build
dependency or is otherwise considered optional when Python is installed:

    $ ./python -c "import sqlite3"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
        from sqlite3.dbapi2 import *
      File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
        from _sqlite3 import *
    ModuleNotFoundError: Optional standard library module '_sqlite3' was not found

Revised error message for a submodule of an optional top level package
when the entire top level package is missing:

    $ ./python -c "import test.regrtest"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: Optional standard library module 'test' was not found

Revised error message for a submodule of an optional top level package
when the top level package is present:

    $ ./python -c "import test.regrtest"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No submodule named 'test.regrtest' in optional standard library module 'test'

Revised error message for a module that is always expected to be
available:

    $ ./python -c "import ensurepip"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: Standard library module 'ensurepip' was not found

Revised error message for a missing submodule of a standard library
package when the top level package is present:

    $ ./python -c "import encodings.mbcs"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No submodule named 'encodings.mbcs' in standard library module 'encodings'

These revised error messages make it clear that the missing modules are
expected to be available from the standard library, but are not
available for some reason, rather than being an indicator of a missing
third party dependency in the current environment.

Design Discussion

Modifying sys.excepthook

The sys.excepthook function gets called when a raised exception is
uncaught and the program is about to exit or (in an interactive session)
the control is being returned to the prompt. This makes it a perfect
place for customized error messages, as it will not influence caught
errors and thus not slow down normal execution of Python scripts.

Public API to query expected standard library module names

The inclusion of the functions sysconfig.get_stdlib_modules() and
sysconfig.get_optional_modules() will provide a long sought-after way of
easily listing the names of Python standard library modules [2], which
will (among other benefits) make it easier for code analysis, profiling,
and error reporting tools to offer runtime --ignore-stdlib flags.

Only including top level module names

This PEP proposes that only top level module and package names be
reported by the new query APIs. This is sufficient information to
generate the proposed error messages, reduces the number of required
entries by an order of magnitude, and simplifies the process of
generating the related metadata during the build process.

If this is eventually found to be overly limiting, a new
include_submodules flag could be added to the query APIs. However, this
is not part of the initial proposal, as the benefits of doing so aren't
currently seen as justifying the extra complexity.

There is one known consequence of this restriction, which is that the
new default excepthook implementation will report incorrect submodules
names the same way that it reports genuinely missing standard library
submodules:

    $ ./python -c "import unittest.muck"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No submodule named 'unittest.muck' in standard library module 'unittest'

Listing private top level module names as optional standard library modules

Many of the modules that have an optional external build dependency are
written as hybrid modules, where there is a shared Python wrapper around
an implementation dependent interface to the underlying external
library. In other cases, a private top level module may simply be a
CPython implementation detail, and other implementations may not provide
that module at all.

To report import errors involving these modules appropriately, the new
default excepthook implementation needs them to be reported by the new
query APIs.

Deeming packaging related modules to be mandatory

Some redistributors aren't entirely keen on installing the Python
specific packaging related modules (distutils, ensurepip, venv) by
default, preferring that developers use their platform specific tooling
instead.

This approach causes interoperability problems for developers working on
cross-platform projects and educators attempting to write platform
independent setup instructions, so this PEP takes the view that these
modules should be considered mandatory, and left out of the list of
optional modules.

Deferred Ideas

The ideas in this section are concepts that this PEP would potentially
help enable, but they're considered out of scope for the initial
proposal.

Platform dependent modules

Some standard library modules may be missing because they're only
provided on particular platforms. For example, the winreg module is only
available on Windows:

    $ python3 -c "import winreg"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'winreg'

In the current proposal, these platform dependent modules will simply be
included with all the other optional modules rather than attempting to
expose the platform dependency information in a more structured way.

However, the platform dependence is at least tracked at the level of
"Windows", "Unix", "Linux", and "FreeBSD" for the benefit of the
documentation, so it seems plausible that it could potentially be
exposed programmatically as well.

Emitting a warning when __main__ shadows a standard library module

Given the new query APIs, the new default excepthook implementation
could potentially detect when __main__.__file__ or
__main__.__spec__.name match a standard library module, and emit a
suitable warning.

However, actually doing anything along this lines should review more
cases where uses actually encounter this problem, and the various
options for potentially offering more information to assist in debugging
the situation, rather than needing to be incorporated right now.

Recommendation for Downstream Distributors

By patching site.py[3] to provide their own implementation of the
sys.excepthook function, Python distributors can display tailor-made
error messages for any uncaught exceptions, including informing the user
of a proper, distro-specific way to install missing standard library
modules upon encountering a ModuleNotFoundError.

Some downstream distributors are already using this method of patching
sys.excepthook to integrate with platform crash reporting mechanisms.

Backwards Compatibility

No problems with backwards compatibility are expected. Distributions
that are already patching Python modules to provide custom handling of
missing dependencies can continue to do so unhindered.

Reference and Example Implementation

TBD. The finer details will depend on what's practical given the
capabilities of the CPython build system (other implementations should
then be able to use the generated CPython data, rather than having to
regenerate it themselves).

Notes and References

Ideas leading up to this PEP were discussed on the python-dev mailing
list and subsequently on python-ideas.

Copyright

This document has been placed in the public domain.

[1] http://bazaar.launchpad.net/~doko/python/pkg3.5-debian/view/head:/patches/tkinter-import.diff

[2] http://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules

[3] Or sitecustomize.py for organizations with their own custom Python
variant.