PEP: 582 Title: Python local packages directory Version: $Revision$
Last-Modified: $Date$ Author: Kushal Das <mail@kushaldas.in>, Steve
Dower <steve.dower@python.org>, Donald Stufft <donald@stufft.io>, Alyssa
Coghlan <ncoghlan@gmail.com> Discussions-To:
https://discuss.python.org/t/pep-582-python-local-packages-directory/963/
Status: Rejected Type: Standards Track Topic: Packaging Content-Type:
text/x-rst Created: 16-May-2018 Python-Version: 3.12 Post-History:
01-Mar-2019, Resolution:
https://discuss.python.org/t/pep-582-python-local-packages-directory/963/430

Abstract

This PEP proposes extending the existing mechanism for setting up
sys.path to include a new __pypackages__ directory, in addition to the
existing locations. The new directory will be added at the start of
sys.path, after the current working directory and just before the system
site-packages, to give packages installed there priority over other
locations.

This is similar to the existing mechanism of adding the current
directory (or the directory the script is located in), but by using a
subdirectory, additional libraries are kept separate from the user's
work.

Motivation

New Python programmers can benefit from being taught the value of
isolating an individual project's dependencies from their system
environment. However, the existing mechanism for doing this, virtual
environments, is known to be complex and error-prone for beginners to
understand. Explaining virtual environments is often a distraction when
trying to get a group of beginners set up - differences in platform and
shell environments require individual assistance, and the need for
activation in every new shell session makes it easy for students to make
mistakes when coming back to work after a break. This proposal offers a
lightweight solution that gives isolation without the user needing to
understand more advanced concepts.

Furthermore, standalone Python applications usually need 3rd party
libraries to function. Typically, they are either designed to be run
from a virtual environment, where the dependencies are installed into
the environment alongside the application, or they bundle their
dependencies in a subdirectory, and modify sys.path at application
startup. Virtual environments, while a common and effective solution
(used, for example, by the pipx tool), are somewhat awkward to set up
and manage, and are not relocatable. On the other hand, manual
manipulation of sys.path is boilerplate that developers need to get
right, and (being a runtime behaviour) it is not understood by tools
like linters and type checkers. The __pypackages__ proposal formalises
the idea of a "bundled dependencies" location, avoiding the boilerplate
and providing a standard location that development tools can be taught
to recognise.

It should be noted that in general, Python libraries cannot be simply
copied between machines, platforms, or even necessarily between Python
versions. This proposal does nothing to change that fact, and while it
is tempting to assume that bundling a script and its __pypackages__ is a
mechanism for distributing applications, this is explicitly not a goal
of this proposal. Developers remain responsible for the portability of
their code.

Rationale

While sys.path can be manipulated at runtime, the default value is
important, as it establishes a common baseline that users and tools can
agree on. The current default does not include a location that could be
viewed as "private to the current project", and yet that is a useful
concept.

This is similar to the npm node_modules directory, which is popular in
the Javascript community, and something that developers familiar with
that ecosystem often ask for from Python.

Specification

This PEP proposes to add a new step in the process of calculating
sys.path at startup.

When the interactive interpreter starts, if a __pypackages__ directory
is found in the current working directory, then it will be included in
sys.path after the entry for current working directory and just before
the system site-packages.

When the interpreter runs a script, Python will try to find
__pypackages__ in the same directory as the script. If found (along with
the current Python version directory inside), then it will be used,
otherwise Python will behave as it does currently.

The behaviour should work exactly the same as the way the existing
mechanism for adding the current working directory or script directory
to sys.path works. For example, __pypackages__ will be ignored if the -P
option or the PYTHONSAFEPATH environment variable is set.

In order to be recognised, the __pypackages__ directory must be laid out
according to a new localpackages scheme in the sysconfig module.
Specifically, both of the purelib and platlib directories must be
present, using the following code to determine the locations of those
directories:

    scheme = "localpackages"
    purelib = sysconfig.get_path("purelib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
    platlib = sysconfig.get_path("platlib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})

These two locations will be added to sys.path, other directories or
files in the __pypackages__ directory will be silently ignored. The
paths will be based on Python versions.

Note

There is a possible option of having a separate new API, it is
documented at issue #3013.

Example

The following shows an example project directory structure, and
different ways the Python executable and any script will behave. The
example is for Unix-like systems - on Windows the subdirectories will be
different.

    foo
        __pypackages__
            lib
                python3.10
                           site-packages
                                         bottle
        myscript.py

    /> python foo/myscript.py
    sys.path[0] == 'foo'
    sys.path[1] == 'foo/__pypackages__/lib/python3.10/site-packages/'


    cd foo

    foo> /usr/bin/ansible
        #! /usr/bin/env python3
    foo> python /usr/bin/ansible

    foo> python myscript.py

    foo> python
    sys.path[0] == '.'
    sys.path[1] == './__pypackages__/lib/python3.10/site-packages'

    foo> python -m bottle

We have a project directory called foo and it has a __pypackages__
inside of it. We have bottle installed in that
__pypackages__/lib/python3.10/site-packages/, and have a myscript.py
file inside of the project directory. We have used whatever tool we
generally use to install bottle in that location.

For invoking a script, Python will try to find a __pypackages__ inside
of the directory that the script resides[1], /usr/bin. The same will
happen in case of the last example, where we are executing
/usr/bin/ansible from inside of the foo directory. In both cases, it
will not use the __pypackages__ in the current working directory.

Similarly, if we invoke myscript.py from the first example, it will use
the __pypackages__ directory that was in the foo directory.

If we go inside of the foo directory and start the Python executable
(the interpreter), it will find the __pypackages__ directory inside of
the current working directory and use it in the sys.path. The same
happens if we try to use the -m and use a module. In our example, bottle
module will be found inside of the __pypackages__ directory.

The above two examples are only cases where __pypackages__ from current
working directory is used.

In another example scenario, a trainer of a Python class can say "Today
we are going to learn how to use Twisted! To start, please checkout our
example project, go to that directory, and then run a given command to
install Twisted."

That will install Twisted into a directory separate from python3.
There's no need to discuss virtual environments, global versus user
installs, etc. as the install will be local by default. The trainer can
then just keep telling them to use python3 without any activation step,
etc.

Relationship to virtual environments

At its heart, this proposal is simply to modify the calculation of the
default value of sys.path, and does not relate at all to the virtual
environment mechanism. However, __pypackages__ can be viewed as
providing an isolation capability, and in that sense, it "competes" with
virtual environments.

However, there are significant differences:

  -   Virtual environments are isolated from the system environment,
      whereas __pypackages__ simply adds to the system environment.
  -   Virtual environments include a full "installation scheme", with
      directories for binaries, C header files, etc., whereas
      __pypackages__ is solely for Python library code.
  -   Virtual environments work most smoothly when "activated". This
      proposal needs no activation.

This proposal should be seen as independent of virtual environments, not
competing with them. At best, some use cases currently only served by
virtual environments can also be served (possibly better) by
__pypackages__.

It should be noted that libraries installed in __pypackages__ will be
visible in a virtual environment. This arguably breaks the isolation of
virtual environments, but it is no different in principle to the
presence of the current directory on sys.path (or mechanisms like the
PYTHONPATH environment variable). The only difference is in degree, as
the expectation is that people will more commonly install packages in
__pypackages__. The alternative would be to explicitly detect virtual
environments and disable __pypackages__ in that case - however that
would break scripts with bundled dependencies. The PEP authors believe
that developers using virtual environments should be experienced enough
to understand the issue and anticipate and avoid any problems.

Security Considerations

In theory, it is possible to add a library to the __pypackages__
directory that overrides a stdlib module or an installed 3rd party
library. For the __pypackages__ associated with a script, this is
assumed not to be a significant issue, as it is unlikely that anyone
would be able to write to __pypackages__ unless they also had the
ability to write to the script itself.

For a __pypackages__ directory in the current working directory, the
interactive interpreter could be affected. However, this is not
significantly different than the existing issue of someone having a
math.py module in their current directory, and while (just like that
case) it can cause user confusion, it does not introduce any new
security implications.

When running a script, any __pypackages__ directory in the current
working directory is ignored. This is the same approach Python uses for
adding the current working directory to sys.path and ensures that it is
not possible to change the behaviour of a script by modifying files in
the current directory.

Also, a __pypackages__ directory is only recognised in the current (or
script) directory. The interpreter will not scan for __pypackages__ in
parent directories. Doing so would open up the risk of security issues
if directory permissions on parents differ. In particular, scripts in
the bin directory or __pypackages__ (the scripts location in sysconfig
terms) have no special access to the libraries installed in
__pypackages__. Putting executable scripts in a bin directory is not
supported by this proposal.

How to Teach This

The original motivation for this proposal was to make it easier to teach
Python to beginners. To that end, it needs to be easy to explain, and
simple to use.

At the most basic level, this is similar to the existing mechanism where
the script directory is added to sys.path and can be taught in a similar
manner. However, for its intended use of "lightweight isolation", it
would likely be taught in terms of "things you put in a __pypackages__
directory are private to your script". The experience of the PEP authors
suggests that this would be significantly easier to teach than the
current alternative of introducing virtual environments.

Impact on Tools

As the intended use of the feature is to install 3rd party libraries in
the new directory, it is important that tools, particularly installers,
understand how to manage __pypackages__.

It is hoped that tools will introduce a dedicated "pypackages"
installation mode that is guaranteed to match the expected layout in all
cases. However, the question of how best to support the __pypackages__
layout is ultimately left to individual tool maintainers to consider and
decide on.

Tools that locate packages without actually running Python code (IDEs,
linters, type checkers, etc.) would need updating to recognise
__pypackages__. In the absence of such updates, the __pypackages__
directory would work similarly to directories currently added to
sys.path at runtime (i.e., the tool would probably ignore it).

Backwards Compatibility

The directory name __pypackages__ was chosen because it is unlikely to
be in common use. It is true that users who have chosen to use that name
for their own purposes will be impacted, but at the time this PEP was
written, this was viewed as a relatively low risk.

Unfortunately, in the time this PEP has been under discussion, a number
of tools have chosen to implement variations on what is being proposed
here, which are not all compatible with the final form of the PEP. As a
result, the risk of clashes is now higher than originally anticipated.

It would be possible to mitigate this by choosing a different name,
hopefully as uncommon as __pypackages__ originally was. But
realistically, any compatibility issues can be viewed as simply the
consequences of people trying to implement draft proposals, without
making the effort to track changes in the proposal. As such, it seems
reasonable to retain the __pypackages__ name, and put the burden of
addressing the compatibility issue on the tools that implemented the
draft version.

Impact on other Python implementations

Other Python implementations will need to replicate the new behavior of
the interpreter bootstrap, including locating the __pypackages__
directory and adding it the sys.path just before site packages, if it is
present. This is no different to any other Python change.

Reference Implementation

Here is a small script which will enable the implementation for Cpython
& in PyPy.

Rejected Ideas

-   Alternative names, such as __pylocal__ and python_modules.
    Ultimately, the name is arbitrary and the chosen name is good
    enough.
-   Additional features of virtual environments. This proposal is not a
    replacement for virtual environments, and such features are
    therefore out of scope.
-   We will not scan any parent directory to find __pypackages__. If we
    want to execute scripts inside of the ~/bin/ directory, then the
    __pypackages__ directory must be inside of the ~/bin/ directory.
    Doing any such scan for __pypackages__ (for the interpreter or a
    script) will have security implications and also increase startup
    time.
-   Raise an error if unexpected files or directories are present in
    __pypackages__. This is considered too strict, particularly as
    transitional approaches like pip install --prefix can create
    additional files in __pypackages__.
-   Using a different sysconfig scheme, or a dedicated pypackages
    scheme. While this is attractive in theory, it makes transition
    harder, as there will be no readily-available way of installing to
    __pypackages__ until tools implement explicit support. And while the
    PEP authors hope and assume that such support would be added, having
    the proposal dependent on such support in order to be usable seems
    like an unacceptable risk.

Copyright

This document has been placed in the public domain.

[1] In the case of symlinks, it is the directory where the actual script
resides, not the symlink pointing to the script