PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 20-Mar-2020

Abstract

This is a proposal to add two new methods, removeprefix() and
removesuffix(), to the APIs of Python's various string objects. These
methods would remove a prefix or suffix (respectively) from a string, if
present, and would be added to Unicode str objects, binary bytes and
bytearray objects, and collections.UserString.

Rationale

There have been repeated issues on Python-Ideas[1][2],
Python-Dev[3][4][5][6], the Bug Tracker, and StackOverflow[7], related
to user confusion about the existing str.lstrip and str.rstrip methods.
These users are typically expecting the behavior of removeprefix and
removesuffix, but they are surprised that the parameter for lstrip is
interpreted as a set of characters, not a substring. The recurrence of
this confusion is evidence that these methods are useful. The new
methods also give a clear place to redirect users who want this behavior.
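
The confusion described above can be reproduced in a few lines. This is
an illustrative sketch only: the host name is a made-up example, and it
assumes Python 3.9 or later, where removeprefix() is available.

```python
url = "www.wikipedia.org"  # hypothetical input chosen for illustration

# lstrip treats "www." as the character set {'w', '.'} and strips every
# leading character found in that set, including the "w" of "wikipedia".
stripped = url.lstrip("www.")

# removeprefix removes the exact substring "www." once, if it is present.
removed = url.removeprefix("www.")

print(stripped)  # ikipedia.org
print(removed)   # wikipedia.org
```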

As another testimonial for the usefulness of these methods, several
users on Python-Ideas[8] reported frequently including similar functions
in their code for productivity. These implementations often contained
subtle mistakes in the handling of the empty string, so a
well-tested built-in method would be useful.

The existing solutions for creating the desired behavior are to either
implement the methods as in the Specification below, or to use regular
expressions as in the expression re.sub('^' + re.escape(prefix), '', s),
which is less discoverable, requires a module import, and results in
less readable code.
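
As a rough comparison of the two workarounds mentioned above, the
following sketch wraps each in a helper. The names remove_prefix_re and
remove_prefix_slice and the sample inputs are hypothetical, introduced
only for this illustration.

```python
import re


def remove_prefix_re(s: str, prefix: str) -> str:
    # re.escape keeps characters such as "." from acting as regex syntax;
    # "^" (without re.MULTILINE) matches only at the start of the string.
    return re.sub('^' + re.escape(prefix), '', s)


def remove_prefix_slice(s: str, prefix: str) -> str:
    # Hand-written equivalent based on startswith and slicing.
    return s[len(prefix):] if s.startswith(prefix) else s


samples = [
    ("context.abs", "context."),
    ("abs", "context."),
    ("a.b", "a."),
    ("", "x"),
    ("x", ""),
]
results = [(remove_prefix_re(s, p), remove_prefix_slice(s, p))
           for s, p in samples]
print(results)
```

Both helpers agree on these inputs, but each has to be written, imported,
or remembered by the user, which is the discoverability cost the text
describes.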

Specification

The builtin str class will gain two new methods which will behave as
follows when type(self) is type(prefix) is type(suffix) is str:

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

    def removesuffix(self: str, suffix: str, /) -> str:
        # suffix='' should not call self[:-0].
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self[:]

When the arguments are instances of str subclasses, the methods should
behave as though those arguments were first coerced to base str objects,
and the return value should always be a base str.
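
Two details of this specification are easy to miss: the guard against an
empty suffix, and the return type for subclasses. The sketch below
assumes Python 3.9 or later; naive_removesuffix and the Tag subclass are
hypothetical names used only to illustrate the points.

```python
def naive_removesuffix(s: str, suffix: str) -> str:
    # Hypothetical buggy helper: it omits the "suffix and" guard.
    if s.endswith(suffix):
        return s[:-len(suffix)]
    return s


class Tag(str):
    """Hypothetical str subclass, used only for this illustration."""


text = "spam"

# Every string ends with '', and "spam"[:-0] is "spam"[:0], i.e. ''.
buggy = naive_removesuffix(text, "")

# The built-in method leaves the string unchanged for an empty suffix.
correct = text.removesuffix("")

# Subclass instances are accepted, but the result is a base str.
result = Tag("spam.py").removesuffix(Tag(".py"))

print(repr(buggy), repr(correct), type(result).__name__)
```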

Methods with the corresponding semantics will be added to the builtin
bytes and bytearray objects. If b is either a bytes or bytearray object,
then b.removeprefix() and b.removesuffix() will accept any bytes-like
object as an argument. The two methods will also be added to
collections.UserString, with similar behavior.
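
A brief sketch of the non-str variants follows, assuming Python 3.9 or
later. The sample values are invented for illustration, and the comment
on the UserString return type reflects CPython's implementation rather
than anything stated in the text above.

```python
from collections import UserString

data = b"HTTP/1.1 200 OK"
# A bytes method accepting a bytearray (a bytes-like object) as argument.
rest = data.removeprefix(bytearray(b"HTTP/1.1 "))

buf = bytearray(b"payload\r\n")
# A bytearray method accepting a memoryview as argument.
trimmed = buf.removesuffix(memoryview(b"\r\n"))

# UserString gains the same methods; in CPython the result is again a
# UserString wrapping the shortened text.
us = UserString("test_parser").removeprefix("test_")

print(rest, trimmed, us)
```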

Motivating examples from the Python standard library

The examples below demonstrate how the proposed methods can make code
one or more of the following:

1.  Less fragile:

    The code will not depend on the user to count the length of a
    literal.

2.  More performant:

    The code does not require a call to the Python built-in len function
    nor to the more expensive str.replace() method.

3.  More descriptive:

    The methods give a higher-level API for code readability as opposed
    to the traditional method of string slicing.

find_recursionlimit.py

-   Current:

        if test_func_name.startswith("test_"):
            print(test_func_name[5:])
        else:
            print(test_func_name)

-   Improved:

        print(test_func_name.removeprefix("test_"))

deccheck.py

This is an interesting case because the author chose to use the
str.replace method in a situation where only a prefix was intended to be
removed.

-   Current:

        if funcname.startswith("context."):
            self.funcname = funcname.replace("context.", "")
            self.contextfunc = True
        else:
            self.funcname = funcname
            self.contextfunc = False

-   Improved:

        if funcname.startswith("context."):
            self.funcname = funcname.removeprefix("context.")
            self.contextfunc = True
        else:
            self.funcname = funcname
            self.contextfunc = False

-   Arguably further improved:

        self.contextfunc = funcname.startswith("context.")
        self.funcname = funcname.removeprefix("context.")
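
The difference between the two approaches only shows up when the prefix
text also occurs later in the string. A minimal sketch, assuming Python
3.9 or later and using a made-up function name chosen to expose the
difference:

```python
funcname = "context.context.abs"  # hypothetical input

# str.replace removes every occurrence of the substring.
via_replace = funcname.replace("context.", "")

# removeprefix removes only the single leading occurrence.
via_removeprefix = funcname.removeprefix("context.")

print(via_replace)       # abs
print(via_removeprefix)  # context.abs
```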

cookiejar.py

-   Current:

        def strip_quotes(text):
            if text.startswith('"'):
                text = text[1:]
            if text.endswith('"'):
                text = text[:-1]
            return text

-   Improved:

        def strip_quotes(text):
            return text.removeprefix('"').removesuffix('"')

test_i18n.py

-   Current:

        creationDate = header['POT-Creation-Date']

        # peel off the escaped newline at the end of string
        if creationDate.endswith('\\n'):
            creationDate = creationDate[:-len('\\n')]

-   Improved:

        creationDate = header['POT-Creation-Date'].removesuffix('\\n')

There were many other such examples in the stdlib.

Rejected Ideas

Expand the lstrip and rstrip APIs

Because lstrip takes a string as its argument, it could be viewed as
taking an iterable of length-1 strings. The API could, therefore, be
generalized to accept any iterable of strings, which would be
successively removed as prefixes. While this behavior would be
consistent, it would not be obvious to users that they must call
'foobar'.lstrip(('foo',)) for the common use case of a single prefix.

Remove multiple copies of a prefix

This is the behavior that would be consistent with the aforementioned
expansion of the lstrip/rstrip API -- repeatedly applying the function
until the argument is unchanged. This behavior is attainable from the
proposed behavior via the following:

    >>> s = 'Foo' * 100 + 'Bar'
    >>> prefix = 'Foo'
    >>> while s.startswith(prefix): s = s.removeprefix(prefix)
    >>> s
    'Bar'

Raising an exception when not found

There was a suggestion that s.removeprefix(pre) should raise an
exception if not s.startswith(pre). However, this does not match the
behavior and feel of other string methods. A required=False keyword
argument could be added, but this would violate the KISS principle.
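
Code that does want an error for a missing prefix can layer that check
on top of the proposed method. The helper below is a hypothetical
sketch, not part of this proposal; its name and its choice of ValueError
are assumptions made for illustration. It assumes Python 3.9 or later.

```python
def removeprefix_strict(s: str, prefix: str) -> str:
    """Hypothetical helper: like removeprefix, but insists on a match."""
    if not s.startswith(prefix):
        raise ValueError(f"{s!r} does not start with {prefix!r}")
    return s.removeprefix(prefix)


print(removeprefix_strict("test_parse", "test_"))  # parse
```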

Accepting a tuple of affixes

It could be convenient to write code like that in
test_concurrent_futures.py as
name.removesuffix(('Mixin', 'Tests', 'Test')), so there was a
suggestion that the new methods be able to take a tuple of strings as an
argument, similar to the startswith() API. Within the tuple, only the
first matching affix would be removed. This was rejected on the
following grounds:

-   This behavior can be surprising or visually confusing, especially
    when one prefix is empty or is a substring of another prefix, as in
    'FooBar'.removeprefix(('', 'Foo')) == 'FooBar' or
    'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'.
-   The API for str.replace() only accepts a single pair of replacement
    strings, but has stood the test of time by refusing the temptation
    to guess in the face of ambiguous multiple replacements.
-   There may be a compelling use case for such a feature in the future,
    but generalization before the basic feature sees real-world use
    would be easy to get permanently wrong.
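
The surprising results in the first point can be reproduced with a small
model of the rejected behavior. The function remove_first_prefix is
hypothetical; it only sketches the "first matching affix wins" rule
described above and is not an API of any Python version.

```python
def remove_first_prefix(s: str, prefixes: tuple) -> str:
    # Hypothetical model of the rejected tuple behavior: remove only the
    # first prefix in the tuple that matches, then stop.
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


# The empty prefix matches first, so nothing is removed.
a = remove_first_prefix("FooBar", ("", "Foo"))

# 'Foo' matches before the longer 'FooBar ' is ever tried.
b = remove_first_prefix("FooBar text", ("Foo", "FooBar "))

print(repr(a), repr(b))
```

In both cases the result depends on the order of the tuple, which is the
kind of ambiguity the rejection is concerned with.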

Alternative Method Names

Several alternative method names have been proposed. Some are listed
below, along with commentary for why they should be rejected in favor of
removeprefix (the same arguments hold for removesuffix).

-   ltrim, trimprefix, etc.:

    In other languages (e.g. JavaScript, Java, Go, PHP), "trim" does
    what the strip methods do in Python.

-   lstrip(string=...):

    This would avoid adding a new method, but for different behavior,
    it's better to have two different methods than one method with a
    keyword argument that selects the behavior.

-   remove_prefix:

    All of the other methods of the string API, e.g. str.startswith(),
    use lowercase rather than lower_case_with_underscores.

-   removeleft, leftremove, or lremove:

    The explicitness of "prefix" is preferred.

-   cutprefix, deleteprefix, withoutprefix, dropprefix, etc.:

    Many of these might have been acceptable, but "remove" is
    unambiguous and matches how one would describe the "remove the
    prefix" behavior in English.

-   stripprefix:

    Users may benefit from remembering that "strip" means working with
    sets of characters, while other methods work with substrings, so
    re-using "strip" here should be avoided.

How to Teach This

Among the uses for the partition(), startswith(), and split() string
methods or the enumerate() or zip() built-in functions, a common theme
is that if a beginner finds themselves manually indexing or slicing a
string, then they should consider whether there is a higher-level method
that better communicates what the code should do rather than merely how
the code should do it. The proposed removeprefix() and removesuffix()
methods expand the high-level string "toolbox" and further allow for
this sort of skepticism toward manual slicing.

The main opportunity for user confusion will be the conflation of
lstrip/rstrip with removeprefix/removesuffix. It may therefore be
helpful to emphasize (as the documentation will) the following
differences between the methods:

-   (l/r)strip:
    -   The argument is interpreted as a character set.
    -   The characters are repeatedly removed from the appropriate end
        of the string.
-   remove(prefix/suffix):
    -   The argument is interpreted as an unbroken substring.
    -   At most one copy of the prefix/suffix is removed.
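
When teaching the distinction, a side-by-side comparison on a string
with a repeated prefix may help. This is a sketch with an invented
sample string, assuming Python 3.9 or later.

```python
s = "abcabc-data"  # hypothetical sample with a repeated prefix

# Character-set semantics: order is irrelevant and removal repeats
# until a character outside the set is reached.
strip_abc = s.lstrip("abc")
strip_cba = s.lstrip("cba")

# Substring semantics: the exact text must match and is removed once.
remove_abc = s.removeprefix("abc")
remove_cba = s.removeprefix("cba")

print(strip_abc, strip_cba)    # -data -data
print(remove_abc, remove_cba)  # abc-data abcabc-data
```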

Reference Implementation

See the pull request on GitHub[9].

History of Major revisions

-   Version 3: Removed tuple behavior.
-   Version 2: Changed name to removeprefix/removesuffix; added support
    for tuples as arguments.
-   Version 1: Initial draft with cutprefix/cutsuffix.

References

[1] [Python-Ideas] "New explicit methods to trim strings"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)

[2] "Re: [Python-ideas] adding a trim convenience function"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/SJ7CKPZSKB5RWT7H3YNXOJUQ7QLD2R3X/#C2W5T7RCFSHU5XI72HG53A6R3J3SN4MV)

[3] "Re: [Python-Dev] strip behavior provides inconsistent results with
certain strings"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C/#XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C)

[4] [Python-Dev] "correction of a bug"
(https://mail.python.org/archives/list/python-dev@python.org/thread/AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X/#AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X)

[5] [Python-Dev] "str.lstrip bug?"
(https://mail.python.org/archives/list/python-dev@python.org/thread/OJDKRIESKGTQFNLX6KZSGKU57UXNZYAN/#CYZUFFJ2Q5ZZKMJIQBZVZR4NSLK5ZPIH)

[6] [Python-Dev] "strip behavior provides inconsistent results with
certain strings"
(https://mail.python.org/archives/list/python-dev@python.org/thread/ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG/#ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG)

[7] Comment listing Bug Tracker and StackOverflow issues
(https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)

[8] [Python-Ideas] "New explicit methods to trim strings"
(https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)

[9] GitHub pull request with implementation
(https://github.com/python/cpython/pull/18939)

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.