PEP: 519 Title: Adding a file system path protocol Version: $Revision$
Last-Modified: $Date$ Author: Brett Cannon <brett@python.org>, Koos
Zevenhoven <k7hoven@gmail.com> Status: Final Type: Standards Track
Content-Type: text/x-rst Created: 11-May-2016 Python-Version: 3.6
Post-History: 11-May-2016, 12-May-2016, 13-May-2016 Resolution:
https://mail.python.org/pipermail/python-dev/2016-May/144646.html

Abstract

This PEP proposes a protocol for classes which represent a file system
path to be able to provide a str or bytes representation. Changes to
Python's standard library are also proposed to utilize this protocol
where appropriate to facilitate the use of path objects where
historically only str and/or bytes file system paths are accepted. The
goal is to facilitate the migration of users towards rich path objects
while providing an easy way to work with code expecting str or bytes.

Rationale

Historically in Python, file system paths have been represented as
strings or bytes. This choice of representation has stemmed from C's own
decision to represent file system paths as const char *[1]. While that
is a totally serviceable format to use for file system paths, it's not
necessarily optimal. At issue is the fact that while all file system
paths can be represented as strings or bytes, not all strings or bytes
represent a file system path. This can lead to issues where any e.g.
string duck-types to a file system path whether it actually represents a
path or not.

To help elevate the representation of file system paths from their
representation as strings and bytes to a richer object representation,
the pathlib module[2] was provisionally introduced in Python 3.4 through
PEP 428. While considered by some as an improvement over strings and
bytes for file system paths, it has suffered from a lack of adoption.
Typically the key issue listed for the low adoption rate has been the
lack of support in the standard library. This lack of support required
users of pathlib to manually convert path objects to strings by calling
str(path) which many found error-prone.

One issue in converting path objects to strings comes from the fact that
the only generic way to get a string representation of the path was to
pass the object to str(). This can pose a problem when done blindly as
nearly all Python objects have some string representation whether they
are a path or not, e.g. str(None) will give a result that
builtins.open()[3] will happily use to create a new file.

Exacerbating this whole situation is the DirEntry object[4]. While path
objects have a representation that can be extracted using str(),
DirEntry objects expose a path attribute instead. Having no common
interface between path objects, DirEntry, and any other third-party path
library has become an issue. A solution that allows any
path-representing object to declare that it is a path and a way to
extract a low-level representation that all path objects could support
is desired.

This PEP then proposes to introduce a new protocol to be followed by
objects which represent file system paths. Providing a protocol allows
for explicit signaling of what objects represent file system paths as
well as a way to extract a lower-level representation that can be used
with older APIs which only support strings or bytes.

Discussions regarding path objects that led to this PEP can be found in
multiple threads on the python-ideas mailing list archive [5] for the
months of March and April 2016 and on the python-dev mailing list
archives[6] during April 2016.

Proposal

This proposal is split into two parts. One part is the proposal of a
protocol for objects to declare and provide support for exposing a file
system path representation. The other part deals with changes to
Python's standard library to support the new protocol. These changes
will also lead to the pathlib module dropping its provisional status.

Protocol

The following abstract base class defines the protocol for an object to
be considered a path object:

    import abc
    import typing as t


    class PathLike(abc.ABC):

        """Abstract base class for implementing the file system path protocol."""

        @abc.abstractmethod
        def __fspath__(self) -> t.Union[str, bytes]:
            """Return the file system path representation of the object."""
            raise NotImplementedError

Objects representing file system paths will implement the __fspath__()
method which will return the str or bytes representation of the path.
The str representation is the preferred low-level path representation as
it is human-readable and what people historically represent paths as.

Standard library changes

It is expected that most APIs in Python's standard library that
currently accept a file system path will be updated appropriately to
accept path objects (whether that requires code or simply an update to
documentation will vary). The modules mentioned below, though, deserve
specific details as they have either fundamental changes that empower
the ability to use path objects, or entail additions/removal of APIs.

builtins

open()[7] will be updated to accept path objects as well as continue to
accept str and bytes.

os

The fspath() function will be added with the following semantics:

    import typing as t


    def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
        """Return the string representation of the path.

        If str or bytes is passed in, it is returned unchanged. If __fspath__()
        returns something other than str or bytes then TypeError is raised. If
        this function is given something that is not str, bytes, or os.PathLike
        then TypeError is raised.
        """
        if isinstance(path, (str, bytes)):
            return path

        # Work from the object's type to match method resolution of other magic
        # methods.
        path_type = type(path)
        try:
            path = path_type.__fspath__(path)
        except AttributeError:
            if hasattr(path_type, '__fspath__'):
                raise
        else:
            if isinstance(path, (str, bytes)):
                return path
            else:
                raise TypeError("expected __fspath__() to return str or bytes, "
                                "not " + type(path).__name__)

        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__)

The os.fsencode()[8] and os.fsdecode()[9] functions will be updated to
accept path objects. As both functions coerce their arguments to bytes
and str, respectively, they will be updated to call __fspath__() if
present to convert the path object to a str or bytes representation, and
then perform their appropriate coercion operations as if the return
value from __fspath__() had been the original argument to the coercion
function in question.

The addition of os.fspath(), the updates to os.fsencode()/os.fsdecode(),
and the current semantics of pathlib.PurePath provide the semantics
necessary to get the path representation one prefers. For a path object,
pathlib.PurePath/Path can be used. To obtain the str or bytes
representation without any coercion, then os.fspath() can be used. If a
str is desired and the encoding of bytes should be assumed to be the
default file system encoding, then os.fsdecode() should be used. If a
bytes representation is desired and any strings should be encoded using
the default file system encoding, then os.fsencode() is used. This PEP
recommends using path objects when possible and falling back to string
paths as necessary and using bytes as a last resort.

Another way to view this is as a hierarchy of file system path
representations (highest- to lowest-level): path → str → bytes. The
functions and classes under discussion can all accept objects on the
same level of the hierarchy, but they vary in whether they promote or
demote objects to another level. The pathlib.PurePath class can promote
a str to a path object. The os.fspath() function can demote a path
object to a str or bytes instance, depending on what __fspath__()
returns. The os.fsdecode() function will demote a path object to a
string or promote a bytes object to a str. The os.fsencode() function
will demote a path or string object to bytes. There is no function that
provides a way to demote a path object directly to bytes while bypassing
string demotion.

The DirEntry object[10] will gain an __fspath__() method. It will return
the same value as currently found on the path attribute of DirEntry
instances.

The Protocol ABC will be added to the os module under the name
os.PathLike.

os.path

The various path-manipulation functions of os.path[11] will be updated
to accept path objects. For polymorphic functions that accept both bytes
and strings, they will be updated to simply use os.fspath().

During the discussions leading up to this PEP it was suggested that
os.path not be updated using an "explicit is better than implicit"
argument. The thinking was that since __fspath__() is polymorphic itself
it may be better to have code working with os.path extract the path
representation from path objects explicitly. There is also the
consideration that adding support this deep into the low-level OS APIs
will lead to code magically supporting path objects without requiring
any documentation updated, leading to potential complaints when it
doesn't work, unbeknownst to the project author.

But it is the view of this PEP that "practicality beats purity" in this
instance. To help facilitate the transition to supporting path objects,
it is better to make the transition as easy as possible than to worry
about unexpected/undocumented duck typing support for path objects by
projects.

There has also been the suggestion that os.path functions could be used
in a tight loop and the overhead of checking or calling __fspath__()
would be too costly. In this scenario only path-consuming APIs would be
directly updated and path-manipulating APIs like the ones in os.path
would go unmodified. This would require library authors to update their
code to support path objects if they performed any path manipulations,
but if the library code passed the path straight through then the
library wouldn't need to be updated. It is the view of this PEP and
Guido, though, that this is an unnecessary worry and that performance
will still be acceptable.

pathlib

The constructor for pathlib.PurePath and pathlib.Path will be updated to
accept PathLike objects. Both PurePath and Path will continue to not
accept bytes path representations, and so if __fspath__() returns bytes
it will raise an exception.

The path attribute will be removed as this PEP makes it redundant (it
has not been included in any released version of Python and so is not a
backwards-compatibility concern).

C API

The C API will gain an equivalent function to os.fspath():

    /*
        Return the file system path representation of the object.

        If the object is str or bytes, then allow it to pass through with
        an incremented refcount. If the object defines __fspath__(), then
        return the result of that method. All other types raise a TypeError.
    */
    PyObject *
    PyOS_FSPath(PyObject *path)
    {
        _Py_IDENTIFIER(__fspath__);
        PyObject *func = NULL;
        PyObject *path_repr = NULL;

        if (PyUnicode_Check(path) || PyBytes_Check(path)) {
            Py_INCREF(path);
            return path;
        }

        func = _PyObject_LookupSpecial(path, &PyId___fspath__);
        if (NULL == func) {
            return PyErr_Format(PyExc_TypeError,
                                "expected str, bytes or os.PathLike object, "
                                "not %S",
                                path->ob_type);
        }

        path_repr = PyObject_CallFunctionObjArgs(func, NULL);
        Py_DECREF(func);
        if (!PyUnicode_Check(path_repr) && !PyBytes_Check(path_repr)) {
            Py_DECREF(path_repr);
            return PyErr_Format(PyExc_TypeError,
                                "expected __fspath__() to return str or bytes, "
                                "not %S",
                                path_repr->ob_type);
        }

        return path_repr;
    }

Backwards compatibility

There are no explicit backwards-compatibility concerns. Unless an object
incidentally already defines a __fspath__() method there is no reason to
expect the pre-existing code to break or expect to have its semantics
implicitly changed.

Libraries wishing to support path objects and a version of Python prior
to Python 3.6 and the existence of os.fspath() can use the idiom of
path.__fspath__() if hasattr(path, "__fspath__") else path.

Implementation

This is the task list for what this PEP proposes to be changed in Python
3.6:

1.  Remove the path attribute from pathlib (done)
2.  Remove the provisional status of pathlib (done)
3.  Add os.PathLike (code and docs done)
4.  Add PyOS_FSPath() (code and docs done)
5.  Add os.fspath() (done)
6.  Update os.fsencode() (done)
7.  Update os.fsdecode() (done)
8.  Update pathlib.PurePath and pathlib.Path (done)
    1.  Add __fspath__()
    2.  Add os.PathLike support to the constructors
9.  Add __fspath__() to DirEntry (done)
10. Update builtins.open() (done)
11. Update os.path (done)
12. Add a glossary entry for "path-like" (done)
13. Update "What's New" (done)

Rejected Ideas

Other names for the protocol's method

Various names were proposed during discussions leading to this PEP,
including __path__, __pathname__, and __fspathname__. In the end people
seemed to gravitate towards __fspath__ for being unambiguous without
being unnecessarily long.

Separate str/bytes methods

At one point it was suggested that __fspath__() only return strings and
another method named __fspathb__() be introduced to return bytes. The
thinking is that by making __fspath__() not be polymorphic it could make
dealing with the potential string or bytes representations easier. But
the general consensus was that returning bytes will more than likely be
rare and that the various functions in the os module are the better
abstraction to promote over direct calls to __fspath__().

Providing a path attribute

To help deal with the issue of pathlib.PurePath not inheriting from str,
originally it was proposed to introduce a path attribute to mirror what
os.DirEntry provides. In the end, though, it was determined that a
protocol would provide the same result while not directly exposing an
API that most people will never need to interact with directly.

Have __fspath__() only return strings

Much of the discussion that led to this PEP revolved around whether
__fspath__() should be polymorphic and return bytes as well as str or
only return str. The general sentiment for this view was that bytes are
difficult to work with due to their inherent lack of information about
their encoding and PEP 383 makes it possible to represent all file
system paths using str with the surrogateescape handler. Thus, it would
be better to forcibly promote the use of str as the low-level path
representation for high-level path objects.

In the end, it was decided that using bytes to represent paths is simply
not going to go away and thus they should be supported to some degree.
The hope is that people will gravitate towards path objects like pathlib
and that will move people away from operating directly with bytes.

A generic string encoding mechanism

At one point there was a discussion of developing a generic mechanism to
extract a string representation of an object that had semantic meaning
(__str__() does not necessarily return anything of semantic significance
beyond what may be helpful for debugging). In the end, it was deemed to
lack a motivating need beyond the one this PEP is trying to solve in a
specific fashion.

Have __fspath__ be an attribute

It was briefly considered to have __fspath__ be an attribute instead of
a method. This was rejected for two reasons. One, historically protocols
have been implemented as "magic methods" and not "magic methods and
attributes". Two, there is no guarantee that the lower-level
representation of a path object will be pre-computed, potentially
misleading users that there was no expensive computation behind the
scenes in case the attribute was implemented as a property.

This also indirectly ties into the idea of introducing a path attribute
to accomplish the same thing. This idea has an added issue, though, of
accidentally having any object with a path attribute meet the protocol's
duck typing. Introducing a new magic method for the protocol helpfully
avoids any accidental opting into the protocol.

Provide specific type hinting support

There was some consideration to providing a generic typing.PathLike
class which would allow for e.g. typing.PathLike[str] to specify a type
hint for a path object which returned a string representation. While
potentially beneficial, the usefulness was deemed too small to bother
adding the type hint class.

This also removed any desire to have a class in the typing module which
represented the union of all acceptable path-representing types as that
can be represented with typing.Union[str, bytes, os.PathLike] easily
enough and the hope is users will slowly gravitate to path objects only.

Provide os.fspathb()

It was suggested that to mirror the structure of e.g.
os.getcwd()/os.getcwdb(), that os.fspath() only return str and that
another function named os.fspathb() be introduced that only returned
bytes. This was rejected as the purposes of the *b() functions are tied
to querying the file system where there is a need to get the raw bytes
back. As this PEP does not work directly with data on a file system (but
which may be), the view was taken this distinction is unnecessary. It's
also believed that the need for only bytes will not be common enough to
need to support in such a specific manner as os.fsencode() will provide
similar functionality.

Call __fspath__() off of the instance

An earlier draft of this PEP had os.fspath() calling path.__fspath__()
instead of type(path).__fspath__(path). The changed to be consistent
with how other magic methods in Python are resolved.

Acknowledgements

Thanks to everyone who participated in the various discussions related
to this PEP that spanned both python-ideas and python-dev. Special
thanks to Stephen Turnbull for direct feedback on early drafts of this
PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not
only feedback on early drafts of this PEP but also helping to drive the
overall discussion on this topic across the two mailing lists.

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] open() documentation for the C standard library
(http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html)

[2] The pathlib module
(https://docs.python.org/3/library/pathlib.html#module-pathlib)

[3] The builtins.open() function
(https://docs.python.org/3/library/functions.html#open)

[4] The os.DirEntry class
(https://docs.python.org/3/library/os.html#os.DirEntry)

[5] The python-ideas mailing list archive
(https://mail.python.org/pipermail/python-ideas/)

[6] The python-dev mailing list archive
(https://mail.python.org/pipermail/python-dev/)

[7] The builtins.open() function
(https://docs.python.org/3/library/functions.html#open)

[8] The os.fsencode() function
(https://docs.python.org/3/library/os.html#os.fsencode)

[9] The os.fsdecode() function
(https://docs.python.org/3/library/os.html#os.fsdecode)

[10] The os.DirEntry class
(https://docs.python.org/3/library/os.html#os.DirEntry)

[11] The os.path module
(https://docs.python.org/3/library/os.path.html#module-os.path)