PEP: 3126 Title: Remove Implicit String Concatenation Version:
$Revision$ Last-Modified: $Date$ Author: Jim J. Jewett
<JimJJewett@gmail.com>, Raymond Hettinger <python@rcn.com> Status:
Rejected Type: Standards Track Content-Type: text/x-rst Created:
29-Apr-2007 Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007

Rejection Notice

This PEP is rejected. There wasn't enough support in favor, the feature
to be removed isn't all that harmful, and there are some use cases that
would become harder.

Abstract

Python inherited many of its parsing rules from C. While this has been
generally useful, there are some individual rules which are less useful
for python, and should be eliminated.

This PEP proposes to eliminate implicit string concatenation based only
on the adjacency of literals.

Instead of:

    "abc" "def" == "abcdef"

authors will need to be explicit, and either add the strings:

    "abc" + "def" == "abcdef"

or join them:

    "".join(["abc", "def"]) == "abcdef"

Motivation

One goal for Python 3000 should be to simplify the language by removing
unnecessary features. Implicit string concatenation should be dropped in
favor of existing techniques. This will simplify the grammar and
simplify a user's mental picture of Python. The latter is important for
letting the language "fit in your head". A large group of current users
do not even know about implicit concatenation. Of those who do know
about it, a large portion never use it or habitually avoid it. Of those
who both know about it and use it, very few could state with confidence
the implicit operator precedence and under what circumstances it is
computed when the definition is compiled versus when it is run.

History or Future

Many Python parsing rules are intentionally compatible with C. This is a
useful default, but Special Cases need to be justified based on their
utility in Python. We should no longer assume that python programmers
will also be familiar with C, so compatibility between languages should
be treated as a tie-breaker, rather than a justification.

In C, implicit concatenation is the only way to join strings without
using a (run-time) function call to store into a variable. In Python,
the strings can be joined (and still recognized as immutable) using more
standard Python idioms, such + or "".join.

Problem

Implicit String concatenation leads to tuples and lists which are
shorter than they appear; this is turn can lead to confusing, or even
silent, errors. For example, given a function which accepts several
parameters, but offers a default value for some of them:

    def f(fmt, *args):
        print fmt % args

This looks like a valid call, but isn't:

    >>> f("User %s got a message %s",
          "Bob"
          "Time for dinner")

    Traceback (most recent call last):
      File "<pyshell#8>", line 2, in <module>
        "Bob"
      File "<pyshell#3>", line 2, in f
        print fmt % args
    TypeError: not enough arguments for format string

Calls to this function can silently do the wrong thing:

    def g(arg1, arg2=None):
        ...

    # silently transformed into the possibly very different
    # g("arg1 on this linearg2 on this line", None)
    g("arg1 on this line"
      "arg2 on this line")

To quote Jason Orendorff [#Orendorff]

  Oh. I just realized this happens a lot out here. Where I work, we use
  scons, and each SConscript has a long list of filenames:

      sourceFiles = [
          'foo.c'
          'bar.c',
          #...many lines omitted...
          'q1000x.c']

  It's a common mistake to leave off a comma, and then scons complains
  that it can't find 'foo.cbar.c'. This is pretty bewildering behavior
  even if you are a Python programmer, and not everyone here is.

Solution

In Python, strings are objects and they support the __add__ operator, so
it is possible to write:

    "abc" + "def"

Because these are literals, this addition can still be optimized away by
the compiler; the CPython compiler already does so. [1]

Other existing alternatives include multiline (triple-quoted) strings,
and the join method:

    """This string
       extends across
       multiple lines, but you may want to use something like
       Textwrap.dedent
       to clear out the leading spaces
       and/or reformat.
    """


    >>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
    True

    >>> " ".join(["space", "string", "joiner"]) == "space string joiner"
    True

    >>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
    """multiple
    lines""")
    True

Concerns

Operator Precedence

Guido indicated[2] that this change should be handled by PEP, because
there were a few edge cases with other string operators, such as the %.
(Assuming that str % stays -- it may be eliminated in favor of PEP 3101
-- Advanced String Formatting. [3])

The resolution is to use parentheses to enforce precedence -- the same
solution that can be used today:

    # Clearest, works today, continues to work, optimization is
    # already possible.
    ("abc %s def" + "ghi") % var

    # Already works today; precedence makes the optimization more
    # difficult to recognize, but does not change the semantics.
    "abc" + "def %s ghi" % var

as opposed to:

    # Already fails because modulus (%) is higher precedence than
    # addition (+)
    ("abc %s def" + "ghi" % var)

    # Works today only because adjacency is higher precedence than
    # modulus.  This will no longer be available.
    "abc %s" "def" % var

    # So the 2-to-3 translator can automatically replace it with the
    # (already valid):
    ("abc %s" + "def") % var

Long Commands

  ... build up (what I consider to be) readable SQL queries[4]:

      rows = self.executesql("select cities.city, state, country"
                             "    from cities, venues, events, addresses"
                             "    where cities.city like %s"
                             "      and events.active = 1"
                             "      and venues.address = addresses.id"
                             "      and addresses.city = cities.id"
                             "      and events.venue = venues.id",
                             (city,))

Alternatives again include triple-quoted strings, +, and .join:

    query="""select cities.city, state, country
                 from cities, venues, events, addresses
                 where cities.city like %s
                   and events.active = 1"
                   and venues.address = addresses.id
                   and addresses.city = cities.id
                   and events.venue = venues.id"""

    query=( "select cities.city, state, country"
          + "    from cities, venues, events, addresses"
          + "    where cities.city like %s"
          + "      and events.active = 1"
          + "      and venues.address = addresses.id"
          + "      and addresses.city = cities.id"
          + "      and events.venue = venues.id"
          )

    query="\n".join(["select cities.city, state, country",
                     "    from cities, venues, events, addresses",
                     "    where cities.city like %s",
                     "      and events.active = 1",
                     "      and venues.address = addresses.id",
                     "      and addresses.city = cities.id",
                     "      and events.venue = venues.id"])

    # And yes, you *could* inline any of the above querystrings
    # the same way the original was inlined.
    rows = self.executesql(query, (city,))

Regular Expressions

Complex regular expressions are sometimes stated in terms of several
implicitly concatenated strings with each regex component on a different
line and followed by a comment. The plus operator can be inserted here
but it does make the regex harder to read. One alternative is to use the
re.VERBOSE option. Another alternative is to build-up the regex with a
series of += lines:

    # Existing idiom which relies on implicit concatenation
    r = ('a{20}'  # Twenty A's
         'b{5}'   # Followed by Five B's
         )

    # Mechanical replacement
    r = ('a{20}'  +# Twenty A's
         'b{5}'   # Followed by Five B's
         )

    # already works today
    r = '''a{20}  # Twenty A's
           b{5}   # Followed by Five B's
        '''                 # Compiled with the re.VERBOSE flag

    # already works today
    r = 'a{20}'   # Twenty A's
    r += 'b{5}'   # Followed by Five B's

Internationalization

Some internationalization tools -- notably xgettext -- have already been
special-cased for implicit concatenation, but not for Python's explicit
concatenation.[5]

These tools will fail to extract the (already legal):

    _("some string" +
      " and more of it")

but often have a special case for:

    _("some string"
      " and more of it")

It should also be possible to just use an overly long line (xgettext
limits messages to 2048 characters[6], which is less than Python's
enforced limit) or triple-quoted strings, but these solutions sacrifice
some readability in the code:

    # Lines over a certain length are unpleasant.
    _("some string and more of it")

    # Changing whitespace is not ideal.
    _("""Some string
         and more of it""")
    _("""Some string
    and more of it""")
    _("Some string \
    and more of it")

I do not see a good short-term resolution for this.

Transition

The proposed new constructs are already legal in current Python, and can
be used immediately.

The 2 to 3 translator can be made to mechanically change:

    "str1" "str2"
    ("line1"  #comment
     "line2")

into:

    ("str1" + "str2")
    ("line1"   +#comments
     "line2")

If users want to use one of the other idioms, they can; as these idioms
are all already legal in python 2, the edits can be made to the original
source, rather than patching up the translator.

Open Issues

Is there a better way to support external text extraction tools, or at
least xgettext[7] in particular?

References

Copyright

  This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
https://mail.python.org/pipermail/python-3000/2007-April/006563.html

[2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
https://mail.python.org/pipermail/python-3000/2007-April/006563.html

[3] ps to question Re: Need help completing ABC pep, van Rossum
https://mail.python.org/pipermail/python-3000/2007-April/006737.html

[4] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
https://mail.python.org/pipermail/python-3000/2007-May/007261.html

[5] (email Subject) PEP 30XZ: Simplified Parsing
https://mail.python.org/pipermail/python-3000/2007-May/007305.html

[6] Unix man page for xgettext -- Notes section
http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext

[7] GNU gettext manual http://www.gnu.org/software/gettext/