PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jul-2012
Python-Version: 3.4
Post-History: 05-Oct-2012
Resolution: https://mail.python.org/pipermail/python-dev/2013-November/130424.html

Abstract

This PEP proposes the inclusion of a third-party module, pathlib, in the
standard library. The inclusion is proposed under the provisional label,
as described in PEP 411. Therefore, API changes can be made, either as
part of the PEP process, or after acceptance in the standard library
(and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users perform on them.

Related work

An object-oriented API for filesystem paths has already been proposed
and rejected in PEP 355. Several third-party implementations of the idea
of object-oriented filesystem paths exist in the wild:

-   The historical path.py module by Jason Orendorff, Jason R. Coombs
    and others, which provides a str-subclassing Path class;
-   Twisted's slightly specialized FilePath class;
-   An AlternativePathClass proposal, subclassing tuple rather than str;
-   Unipath, a variation on the str-subclassing approach with two public
    classes, an AbstractPath class for operations which don't do I/O and
    a Path class for all common operations.

This proposal attempts to learn from these previous attempts and the
rejection of PEP 355.

Implementation

The implementation of this proposal is tracked in the pep428 branch of
pathlib's Mercurial repository.

Why an object-oriented API

The rationale for representing filesystem paths using dedicated classes is
the same as for other kinds of stateless objects, such as dates, times
or IP addresses. Python has been slowly moving away from strictly
replicating the C language's APIs to providing better, more helpful
abstractions around all kinds of common functionality. Even if this PEP
isn't accepted, it is likely that another form of filesystem handling
abstraction will be adopted one day into the standard library.

Indeed, many people will prefer handling dates and times using the
high-level objects provided by the datetime module, rather than using
numeric timestamps and the time module API. Moreover, using a dedicated
class makes it possible to enable desirable behaviours by default, for
example the case insensitivity of Windows paths.

Proposal

Class hierarchy

The pathlib module implements a simple hierarchy of classes:

                    +----------+
                    |          |
           ---------| PurePath |--------
           |        |          |       |
           |        +----------+       |
           |             |             |
           |             |             |
           v             |             v
    +---------------+    |    +-----------------+
    |               |    |    |                 |
    | PurePosixPath |    |    | PureWindowsPath |
    |               |    |    |                 |
    +---------------+    |    +-----------------+
           |             v             |
           |          +------+         |
           |          |      |         |
           |   -------| Path |------   |
           |   |      |      |     |   |
           |   |      +------+     |   |
           |   |                   |   |
           |   |                   |   |
           v   v                   v   v
      +-----------+           +-------------+
      |           |           |             |
      | PosixPath |           | WindowsPath |
      |           |           |             |
      +-----------+           +-------------+

This hierarchy divides path classes along two dimensions:

-   a path class can be either pure or concrete: pure classes support
    only operations that don't need to do any actual I/O, which are most
    path manipulation operations; concrete classes support all the
    operations of pure classes, plus operations that do I/O.
-   a path class is of a given flavour according to the kind of
    operating system paths it represents. pathlib implements two
    flavours: Windows paths for the filesystem semantics embodied in
    Windows systems, POSIX paths for other systems.

Any pure class can be instantiated on any system: for example, you can
manipulate PurePosixPath objects under Windows, PureWindowsPath objects
under Unix, and so on. However, concrete classes can only be
instantiated on a matching system: indeed, it would be error-prone to
start doing I/O with WindowsPath objects under Unix, or vice-versa.

Furthermore, there are two base classes which also act as
system-dependent factories: PurePath will instantiate either a
PurePosixPath or a PureWindowsPath depending on the operating system.
Similarly, Path will instantiate either a PosixPath or a WindowsPath.

It is expected that, in most uses, using the Path class is adequate,
which is why it has the shortest name of all.
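
The factory behaviour described above can be sketched as follows. This
is only an illustration against the pathlib module as shipped in the
standard library; the variable names are made up for the example, and
the exact exception raised for a foreign concrete class is an assumption
(current CPython releases raise NotImplementedError or a subclass of
it).

```python
import os
from pathlib import (Path, PosixPath, PurePath, PurePosixPath,
                     PureWindowsPath, WindowsPath)

on_windows = os.name == 'nt'

# PurePath and Path pick the flavour that matches the running system.
native_pure = PureWindowsPath if on_windows else PurePosixPath
native_concrete = WindowsPath if on_windows else PosixPath
generic_pure = PurePath('docs', 'Makefile')
generic_concrete = Path('docs', 'Makefile')

# Pure classes of the *other* flavour can still be instantiated.
foreign_pure = (PurePosixPath('/etc') if on_windows
                else PureWindowsPath('c:/etc'))

# Concrete classes of the other flavour refuse to be instantiated.
foreign_concrete = PosixPath if on_windows else WindowsPath
try:
    foreign_concrete('x')
except NotImplementedError:
    foreign_refused = True
else:
    foreign_refused = False
```

Code that only manipulates path strings can therefore be tested on any
platform by naming the pure flavour explicitly.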

No confusion with builtins

In this proposal, the path classes do not derive from a builtin type.
This contrasts with some other Path class proposals which were derived
from str. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the parts attribute).

The key reason for not inheriting from str is to prevent accidental
operations between a string that represents a path and a string that
doesn't, e.g. path + an_accident. Since operations with a string will
not necessarily produce a valid or expected filesystem path, not
subclassing str follows the principle that "explicit is better than
implicit". A blog post by a Python core developer goes into more detail
on the reasons behind this specific design decision.

Immutability

Path objects are immutable, which makes them hashable and also prevents
a class of programming errors.
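
A minimal sketch of what immutability and hashability allow in practice
(the names below are illustrative; the file size is simply the figure
used in the stat() example of this document):

```python
from pathlib import PurePosixPath

# Equal paths hash equally, so different spellings collapse in a set.
spellings = {PurePosixPath('a/b'), PurePosixPath('a//b'),
             PurePosixPath('a/./b')}

# Paths can serve as dictionary keys.
sizes = {PurePosixPath('setup.py'): 928}

# Attributes such as name are read-only; derive a new path instead.
p = PurePosixPath('docs/conf.py')
try:
    p.name = 'index.rst'
except AttributeError:
    mutated = False
else:
    mutated = True
renamed = p.with_name('index.rst')
```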

Sane behaviour

Little of the functionality from os.path is reused. Many os.path
functions are tied by backwards compatibility to confusing or plain
wrong behaviour (for example, the fact that os.path.abspath() simplifies
".." path components without resolving symlinks first).
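
The difference can be illustrated with a short sketch. It uses
posixpath.normpath(), the purely lexical normalization step of the
os.path family, and a hypothetical path whose /srv/current component is
assumed to be a symlink:

```python
import posixpath
from pathlib import PurePosixPath

raw = '/srv/current/../data'

# os.path-style normalization removes ".." purely lexically, which is
# wrong whenever /srv/current is a symlink to a directory elsewhere.
lexical = posixpath.normpath(raw)

# A pure path keeps ".." untouched; only resolve() on a concrete path,
# which consults the filesystem, may remove it.
preserved = str(PurePosixPath(raw))
```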

Comparisons

Paths of the same flavour are comparable and orderable, whether pure or
not:

    >>> PurePosixPath('a') == PurePosixPath('b')
    False
    >>> PurePosixPath('a') < PurePosixPath('b')
    True
    >>> PurePosixPath('a') == PosixPath('a')
    True

Comparing and ordering Windows path objects is case-insensitive:

    >>> PureWindowsPath('a') == PureWindowsPath('A')
    True

Paths of different flavours always compare unequal, and cannot be
ordered:

    >>> PurePosixPath('a') == PureWindowsPath('a')
    False
    >>> PurePosixPath('a') < PureWindowsPath('a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: PurePosixPath() < PureWindowsPath()

Paths compare unequal to, and are not orderable with, instances of
builtin types (such as str) and any other types.
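
As a rough illustration of these rules (variable names are
illustrative only), same-flavour paths can be sorted and used in sets,
while mixing them with strings either compares unequal or fails:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same-flavour paths sort like any other orderable objects.
ordered = sorted([PurePosixPath('b'), PurePosixPath('a/c'),
                  PurePosixPath('a')])

# Windows paths compare and hash case-insensitively.
same_on_windows = PureWindowsPath('Setup.PY') in {PureWindowsPath('setup.py')}

# A path never equals a plain string, and ordering against one fails.
equals_str = PurePosixPath('a') == 'a'
try:
    PurePosixPath('a') < 'b'
except TypeError:
    orderable_with_str = False
else:
    orderable_with_str = True
```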

Useful notations

The API tries to provide useful notations all the while avoiding magic.
Some examples:

    >>> p = Path('/home/antoine/pathlib/setup.py')
    >>> p.name
    'setup.py'
    >>> p.suffix
    '.py'
    >>> p.root
    '/'
    >>> p.parts
    ('/', 'home', 'antoine', 'pathlib', 'setup.py')
    >>> p.relative_to('/home/antoine')
    PosixPath('pathlib/setup.py')
    >>> p.exists()
    True

Pure paths API

The philosophy of the PurePath API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like os.path does.

Definitions

First, a couple of conventions:

-   All paths can have a drive and a root. For POSIX paths, the drive is
    always empty.
-   A relative path has neither drive nor root.
-   A POSIX path is absolute if it has a root. A Windows path is
    absolute if it has both a drive and a root. A Windows UNC path (e.g.
    \\host\share\myfile.txt) always has a drive and a root (here,
    \\host\share and \, respectively).
-   A path which has either a drive or a root is said to be anchored.
    Its anchor is the concatenation of the drive and root. Under POSIX,
    "anchored" is the same as "absolute".
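
The conventions above can be checked with a small sketch. The describe()
helper is hypothetical and exists only for this example; it relies on
the is_absolute() method of the standard-library module, which is the
counterpart of the relative/absolute distinction defined here.

```python
from pathlib import PurePosixPath, PureWindowsPath

def describe(path):
    """Return (drive, root, anchor, is_absolute) for a pure path."""
    return (path.drive, path.root, path.anchor, path.is_absolute())

fully_anchored = describe(PureWindowsPath('c:/foo'))  # drive and root
drive_only = describe(PureWindowsPath('c:foo'))       # anchored, not absolute
root_only = describe(PureWindowsPath('/foo'))         # anchored, not absolute
unc = describe(PureWindowsPath('//host/share/myfile.txt'))
posix = describe(PurePosixPath('/foo'))               # drive is always empty
relative = describe(PurePosixPath('foo'))             # neither drive nor root
```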

Construction

We will present construction and joining together since they expose
similar semantics.

The simplest way to construct a path is to pass it its string
representation:

    >>> PurePath('setup.py')
    PurePosixPath('setup.py')

Extraneous path separators and "." components are eliminated:

    >>> PurePath('a///b/c/./d/')
    PurePosixPath('a/b/c/d')

If you pass several arguments, they will be automatically joined:

    >>> PurePath('docs', 'Makefile')
    PurePosixPath('docs/Makefile')

Joining semantics are similar to os.path.join, in that anchored paths
ignore the information from the previously joined components:

    >>> PurePath('/etc', '/usr', 'bin')
    PurePosixPath('/usr/bin')

However, with Windows paths, the drive is retained as necessary:

    >>> PureWindowsPath('c:/foo', '/Windows')
    PureWindowsPath('c:/Windows')
    >>> PureWindowsPath('c:/foo', 'd:')
    PureWindowsPath('d:')

Also, path separators are normalized to the platform default:

    >>> PureWindowsPath('a/b') == PureWindowsPath('a\\b')
    True

Extraneous path separators and "." components are eliminated, but not
".." components:

    >>> PurePosixPath('a//b/./c/')
    PurePosixPath('a/b/c')
    >>> PurePosixPath('a/../b')
    PurePosixPath('a/../b')

Multiple leading slashes are treated differently depending on the path
flavour. They are always retained on Windows paths (because of the UNC
notation):

    >>> PureWindowsPath('//some/path')
    PureWindowsPath('//some/path/')

On POSIX, they are collapsed except if there are exactly two leading
slashes, which is a special case in the POSIX specification on pathname
resolution (this is also necessary for Cygwin compatibility):

    >>> PurePosixPath('///some/path')
    PurePosixPath('/some/path')
    >>> PurePosixPath('//some/path')
    PurePosixPath('//some/path')

Calling the constructor without any argument creates a path object
pointing to the logical "current directory" (without looking up its
absolute path, which is the job of the cwd() classmethod on concrete
paths):

    >>> PurePosixPath()
    PurePosixPath('.')

Representing

To represent a path (e.g. to pass it to third-party libraries), just
call str() on it:

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> str(p)
    '/home/antoine/pathlib/setup.py'
    >>> p = PureWindowsPath('c:/windows')
    >>> str(p)
    'c:\\windows'

To force the string representation with forward slashes, use the
as_posix() method:

    >>> p.as_posix()
    'c:/windows'

To get the bytes representation (which might be useful under Unix
systems), call bytes() on it, which internally uses os.fsencode():

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'

To represent the path as a file: URI, call the as_uri() method:

    >>> p = PurePosixPath('/etc/passwd')
    >>> p.as_uri()
    'file:///etc/passwd'
    >>> p = PureWindowsPath('c:/Windows')
    >>> p.as_uri()
    'file:///c:/Windows'

The repr() of a path always uses forward slashes, even under Windows,
for readability and to remind users that forward slashes are ok:

    >>> p = PureWindowsPath('c:/Windows')
    >>> p
    PureWindowsPath('c:/Windows')

Properties

Several simple properties are provided on every path (each can be
empty):

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.drive
    'c:'
    >>> p.root
    '\\'
    >>> p.anchor
    'c:\\'
    >>> p.name
    'pathlib.tar.gz'
    >>> p.stem
    'pathlib.tar'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']

Deriving new paths

Joining

A path can be joined with another using the / operator:

    >>> p = PurePosixPath('foo')
    >>> p / 'bar'
    PurePosixPath('foo/bar')
    >>> p / PurePosixPath('bar')
    PurePosixPath('foo/bar')
    >>> 'bar' / p
    PurePosixPath('bar/foo')

As with the constructor, multiple path components can be specified,
either collapsed or separately:

    >>> p / 'bar/xyzzy'
    PurePosixPath('foo/bar/xyzzy')
    >>> p / 'bar' / 'xyzzy'
    PurePosixPath('foo/bar/xyzzy')

A joinpath() method is also provided, with the same behaviour:

    >>> p.joinpath('Python')
    PurePosixPath('foo/Python')

Changing the path's final component

The with_name() method returns a new path, with the name changed:

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_name('setup.py')
    PureWindowsPath('c:/Downloads/setup.py')

It fails with a ValueError if the path doesn't have an actual name:

    >>> p = PureWindowsPath('c:/')
    >>> p.with_name('setup.py')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 875, in with_name
        raise ValueError("%r has an empty name" % (self,))
    ValueError: PureWindowsPath('c:/') has an empty name
    >>> p.name
    ''

The with_suffix() method returns a new path with the suffix changed.
However, if the path has no suffix, the new suffix is added:

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_suffix('.bz2')
    PureWindowsPath('c:/Downloads/pathlib.tar.bz2')
    >>> p = PureWindowsPath('README')
    >>> p.with_suffix('.bz2')
    PureWindowsPath('README.bz2')
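
The following sketch combines these methods with the stem and suffix
properties to derive sibling file names; the file names are made up for
the example:

```python
from pathlib import PurePosixPath

archive = PurePosixPath('dist/pathlib.tar.gz')

# Only the last suffix is replaced.
recompressed = archive.with_suffix('.bz2')

# Combining with_name() with stem and suffix builds a sibling name.
backup = archive.with_name(archive.stem + '.bak' + archive.suffix)

# A path without a name cannot be renamed this way.
try:
    PurePosixPath('/').with_name('setup.py')
except ValueError:
    root_has_name = False
else:
    root_has_name = True
```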

Making the path relative

The relative_to() method computes the relative difference of a path to
another:

    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
    PurePosixPath('bin/python')

ValueError is raised if the method cannot return a meaningful value:

    >>> PurePosixPath('/usr/bin/python').relative_to('/etc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 926, in relative_to
        .format(str(self), str(formatted)))
    ValueError: '/usr/bin/python' does not start with '/etc'
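
Because failure is signalled with ValueError, callers that merely want
to test containment can wrap the call. The relative_or_none() helper
below is hypothetical and only sketches one way to do so:

```python
from pathlib import PurePosixPath

def relative_or_none(path, base):
    """Return path relative to base, or None when path is not under base."""
    try:
        return path.relative_to(base)
    except ValueError:
        return None

inside = relative_or_none(PurePosixPath('/usr/bin/python'), '/usr')
outside = relative_or_none(PurePosixPath('/usr/bin/python'), '/etc')
```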

Sequence-like access

The parts property returns a tuple providing read-only sequence access
to a path's components:

    >>> p = PurePosixPath('/etc/init.d')
    >>> p.parts
    ('/', 'etc', 'init.d')

Windows paths handle the drive and the root as a single path component:

    >>> p = PureWindowsPath('c:/setup.py')
    >>> p.parts
    ('c:\\', 'setup.py')

(separating them would be wrong, since C: is not the parent of C:\\).

The parent property returns the logical parent of the path:

    >>> p = PureWindowsPath('c:/python33/bin/python.exe')
    >>> p.parent
    PureWindowsPath('c:/python33/bin')

The parents property returns an immutable sequence of the path's logical
ancestors:

    >>> p = PureWindowsPath('c:/python33/bin/python.exe')
    >>> len(p.parents)
    3
    >>> p.parents[0]
    PureWindowsPath('c:/python33/bin')
    >>> p.parents[1]
    PureWindowsPath('c:/python33')
    >>> p.parents[2]
    PureWindowsPath('c:/')
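
Since parents behaves as a sequence, it can be iterated and searched. A
minimal sketch follows; is_inside() is a hypothetical helper, and the
check is purely logical (no symlinks are resolved):

```python
from pathlib import PurePosixPath

def is_inside(path, directory):
    """Return True if directory is one of path's logical ancestors."""
    return directory in path.parents

p = PurePosixPath('/usr/lib/python3/site.py')

# Ancestors are listed from the closest one up to the anchor.
ancestors = [str(parent) for parent in p.parents]
```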

Querying

is_relative() returns True if the path is relative (see definition
above), False otherwise.

is_reserved() returns True if a Windows path is a reserved path such as
CON or NUL. It always returns False for POSIX paths.

match() matches the path against a glob pattern. It operates on
individual parts and matches from the right:

    >>> p = PurePosixPath('/usr/bin')
    >>> p.match('/usr/b*')
    True
    >>> p.match('usr/b*')
    True
    >>> p.match('b*')
    True
    >>> p.match('/u*')
    False
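
As a sketch of how match() might be used to filter a collection of
paths (the paths and variable names are invented for the example):

```python
from pathlib import PurePosixPath

paths = [PurePosixPath(s) for s in (
    '/usr/foo/bar.py', '/usr/foo/bar.txt', '/usr/baz/qux.py', 'bar.py')]

# A relative pattern is matched against the rightmost parts only.
any_python = [str(p) for p in paths if p.match('*.py')]

# An anchored pattern must match the whole path.
foo_python = [str(p) for p in paths if p.match('/usr/foo/*.py')]
```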

This behaviour respects the following expectations:

-   A simple pattern such as "*.py" matches arbitrarily long paths as
    long as the last part matches, e.g. "/usr/foo/bar.py".
-   Longer patterns can be used as well for more complex matching, e.g.
    "/usr/foo/*.py" matches "/usr/foo/bar.py".

Concrete paths API

In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or
mutate information.

Constructing

The classmethod cwd() creates a path object pointing to the current
working directory in absolute form:

    >>> Path.cwd()
    PosixPath('/home/antoine/pathlib')

File metadata

The stat() method returns the file's stat() result; similarly, lstat()
returns the file's lstat() result (which is different iff the file is a
symbolic link):

    >>> p.stat()
    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)

Higher-level methods help examine the kind of the file:

    >>> p.exists()
    True
    >>> p.is_file()
    True
    >>> p.is_dir()
    False
    >>> p.is_symlink()
    False
    >>> p.is_socket()
    False
    >>> p.is_fifo()
    False
    >>> p.is_block_device()
    False
    >>> p.is_char_device()
    False
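
A self-contained sketch of these queries, run against a temporary
directory so that it does not depend on any existing file (the file
name and contents are arbitrary):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    script = base / 'setup.py'
    with script.open('wb') as f:
        f.write(b'print("hello")\n')

    # stat() exposes the raw os.stat() result ...
    size = script.stat().st_size
    # ... while the is_*() helpers answer the common questions directly.
    kinds = (script.exists(), script.is_file(), script.is_dir(),
             base.is_dir(), (base / 'missing').exists())
```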

The file owner and group names (rather than numeric ids) are queried
through corresponding methods:

    >>> p = Path('/etc/shadow')
    >>> p.owner()
    'root'
    >>> p.group()
    'shadow'

Path resolution

The resolve() method makes a path absolute, resolving any symlink on the
way (like the POSIX realpath() call). It is the only operation which
will remove ".." path components. On Windows, this method will also take
care to return the canonical path (with the right casing).
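
The following sketch contrasts resolve() with lexical normalization. It
assumes a POSIX-like system on which the process may create symbolic
links; the directory layout is invented for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    (base / 'real' / 'sub').mkdir(parents=True)

    # link -> real/sub
    link = base / 'link'
    link.symlink_to(base / 'real' / 'sub', target_is_directory=True)

    via_link = link / '..'
    # resolve() follows the symlink first, then applies "..".
    resolved = via_link.resolve()
    # Lexical normalization strips "link/.." without looking at the disk.
    lexical = Path(os.path.normpath(str(via_link)))
```

Here resolve() yields the real directory's parent, whereas the lexical
result points at the directory that merely contains the link.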

Directory walking

Simple (non-recursive) directory access is done by calling the iterdir()
method, which returns an iterator over the child paths:

    >>> p = Path('docs')
    >>> for child in p.iterdir(): child
    ...
    PosixPath('docs/conf.py')
    PosixPath('docs/_templates')
    PosixPath('docs/make.bat')
    PosixPath('docs/index.rst')
    PosixPath('docs/_build')
    PosixPath('docs/_static')
    PosixPath('docs/Makefile')

This allows simple filtering through list comprehensions:

    >>> p = Path('.')
    >>> [child for child in p.iterdir() if child.is_dir()]
    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]

Simple and recursive globbing is also provided:

    >>> for child in p.glob('**/*.py'): child
    ...
    PosixPath('test_pathlib.py')
    PosixPath('setup.py')
    PosixPath('pathlib.py')
    PosixPath('docs/conf.py')
    PosixPath('build/lib/pathlib.py')

File opening

The open() method provides a file opening API similar to the builtin
open() function:

    >>> p = Path('setup.py')
    >>> with p.open() as f: f.readline()
    ...
    '#!/usr/bin/env python3\n'

Filesystem modification

Several common filesystem operations are provided as methods: touch(),
mkdir(), rename(), replace(), unlink(), rmdir(), chmod(), lchmod(),
symlink_to(). More operations could be provided, for example some of the
functionality of the shutil module.
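
A minimal sketch of a few of these methods working together, confined
to a temporary directory (the file and directory names are made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / 'build'
    build.mkdir()
    draft = build / 'draft.txt'
    final = build / 'final.txt'

    draft.touch()          # create an empty file
    draft.rename(final)    # move it to its final name
    after_rename = (draft.exists(), final.exists())

    final.unlink()         # remove the file ...
    build.rmdir()          # ... then the now-empty directory
    after_cleanup = build.exists()
```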

Detailed documentation of the proposed API can be found at the pathlib
docs.

Discussion

Division operator

The division operator came out first in a poll about the path joining
operator. Initial versions of pathlib used square brackets (i.e.
__getitem__) instead.

joinpath()

The joinpath() method was initially called join(), but several people
objected that it could be confused with str.join() which has different
semantics. Therefore, it was renamed to joinpath().

Case-sensitivity

Windows users consider filesystem paths to be case-insensitive and
expect path objects to observe that characteristic, even though in some
rare situations some foreign filesystem mounts may be case-sensitive
under Windows.

In the words of one commenter,

  "If glob("*.py") failed to find SETUP.PY on Windows, that would be a
  usability disaster".

  -- Paul Moore in
  https://mail.python.org/pipermail/python-dev/2013-April/125254.html

Copyright

This document has been placed into the public domain.