PEP: 3140
Title: str(container) should call str(item), not repr(item)
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytman <phd@phdru.name>, Jim J. Jewett <jimjjewett@gmail.com>
Discussions-To: python-3000@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 27-May-2008
Post-History: 28-May-2008

Rejection

Guido said this would cause too much disturbance too close to beta.
See [1].

Abstract

This document discusses the advantages and disadvantages of the current
implementation of str(container). It also discusses the pros and cons of
a different approach - to call str(item) instead of repr(item).

Motivation

Currently str(container) calls repr on items. Arguments for it:

-   containers refuse to guess what the user wants to see on
    str(container) - surroundings, delimiters, and so on;
-   repr(item) usually displays type information - apostrophes around
    strings, class names, etc.

Arguments against:

-   it's illogical; str() is expected to call __str__ if it exists, not
    __repr__;
-   there is no standard way to print a container's content by calling
    the items' __str__, which is inconvenient in cases where __str__ and
    __repr__ return different results;
-   repr(item) sometimes does the wrong thing (for example, it
    hex-escapes non-ASCII strings).

This PEP proposes to change how str(container) works. It is proposed to
mimic how repr(container) works except for one detail: call str on items
instead of repr. This allows a user to choose which results she wants to
get, those from item.__repr__ or those from item.__str__.

Current situation

Most container types (tuples, lists, dicts, sets, etc.) do not implement
a __str__ method, so str(container) calls container.__repr__, and
container.__repr__, once called, has no way of knowing it was called
from str and always calls repr on the container's items.
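The fallback described above can be checked directly. The following is a minimal sketch, assuming CPython 3; the variable names are illustrative only and not part of the PEP:

```python
# Built-in containers define their own __repr__ but inherit __str__
# from object, so str(container) ends up calling container.__repr__.
builtin_containers = (tuple, list, dict, set, frozenset)

defines_str = {cls.__name__: '__str__' in vars(cls) for cls in builtin_containers}
defines_repr = {cls.__name__: '__repr__' in vars(cls) for cls in builtin_containers}

items = [42, '42']
same_output = str(items) == repr(items)  # both render items with repr
```

Under these assumptions none of the listed types defines __str__ itself, which is why str and repr of a container produce the same text.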

This behaviour has advantages and disadvantages. One advantage is that
most items are represented with type information - strings are
surrounded by apostrophes, instances may have both class name and
instance data:

    >>> print([42, '42'])
    [42, '42']
    >>> print([Decimal('42'), datetime.now()])
    [Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]

The disadvantage is that __repr__ often returns technical data (like
'<object at address>') or an unreadable string (a hex-escaped string if
the input is a non-ASCII string):

    >>> print(['тест'])
    ['\xd4\xc5\xd3\xd4']

One of the motivations for PEP 3138 is that neither repr nor str will
allow the sensible printing of dicts whose keys are non-ASCII text
strings. Now that Unicode identifiers are allowed, this includes Python's
own attribute dicts. It also affects JSON serialization (and forced the
json library to jump through some hoops).

PEP 3138 proposes to fix this by breaking the "repr is safe ASCII"
invariant, and changing the way repr (which is used for persistence)
outputs some objects, with system-dependent failures.

Changing how str(container) works would allow easy debugging in the
normal case, and retain the safety of ASCII-only for the
machine-readable case. The only downside is that str(x) and repr(x)
would more often be different -- but only in those cases where the
current almost-the-same version is insufficient.
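The split between a readable str and an ASCII-safe repr might look roughly like this in Python 3. This is only a sketch: the built-in ascii() stands in for an ASCII-only repr, and the join expression stands in for the proposed str(container); neither is how the interpreter implements str today.

```python
words = ['тест']

# Machine-readable form: ASCII only, non-ASCII characters are escaped.
machine_readable = ascii(words)

# Human-readable form: what the proposed str(container) would produce,
# emulated here by calling str on each item by hand.
human_readable = "[" + ", ".join(str(word) for word in words) + "]"
```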

It also seems illogical that str(container) calls repr on items instead
of str. It's only logical to expect the following code:

    class Test:
        def __str__(self):
            return "STR"

        def __repr__(self):
            return "REPR"


    test = Test()
    print(test)
    print(repr(test))
    print([test])
    print(str([test]))

to print:

    STR
    REPR
    [STR]
    [STR]

where it actually prints:

    STR
    REPR
    [REPR]
    [REPR]

It is especially illogical that print in Python 2 uses str when it is
called on what seems to be a tuple:

    >>> print Decimal('42'), datetime.now()
    42 2008-05-27 20:16:22.534285

where on an actual tuple it prints:

    >>> print((Decimal('42'), datetime.now()))
    (Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))
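The same contrast can be reproduced in Python 3, where print is a function. The sketch below uses a fixed timestamp (an assumption made so the result is reproducible) and builds the strings that print would write instead of printing them:

```python
from datetime import datetime
from decimal import Decimal

when = datetime(2008, 5, 27, 20, 16, 22)  # fixed value for reproducibility
items = (Decimal('42'), when)

# print(*items) writes str() of each argument, separated by spaces.
as_arguments = " ".join(str(item) for item in items)

# print(items) writes str(items), which renders each item with repr().
as_tuple = str(items)
```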

A different approach - call str(item)

For example, with numbers it is often only the value that people care
about.

    >>> print Decimal('3')
    3

But putting the value in a list forces users to read the type
information, exactly as if repr had been called for the benefit of a
machine:

    >>> print [Decimal('3')]
    [Decimal("3")]

After this change, the type information would not clutter the str
output:

    >>> print "%s" % [Decimal('3')]
    [3]
    >>> str([Decimal('3')])  # ==
    [3]

But it would still be available if desired:

    >>> print "%r" % [Decimal('3')]
    [Decimal("3")]
    >>> repr([Decimal('3')])  # ==
    [Decimal("3")]
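For comparison, here is how the explicit conversions behave in Python 3 without the proposed change. This is a small sketch using the "!s" and "!r" conversion flags of str.format; the variable names are illustrative:

```python
from decimal import Decimal

items = [Decimal('3')]

# Without the proposed change, both conversions give the same text,
# because the list renders its items with repr either way.
via_str = "{!s}".format(items)
via_repr = "{!r}".format(items)
```

Under the proposal only via_str would change, to "[3]".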

There are a number of strategies to fix the problem. The most radical is
to change __repr__ so it accepts a new parameter (flag) meaning "called
from str, so call str on items, not repr". The drawback of the proposal
is that every __repr__ implementation must be changed. Introspection
could help a bit (inspect __repr__ before calling it to see whether it
accepts 2 or 3 parameters), but introspection doesn't work on classes
written in C, like all built-in containers.
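A rough sketch of the flag idea for a container written in Python follows. The parameter name from_str, the class FlagAwareList, and the helpers accepts_flag and to_str are all hypothetical; the PEP does not specify any of them.

```python
import inspect


class FlagAwareList(list):
    """Hypothetical container whose __repr__ accepts a 'called from str' flag."""

    def __repr__(self, from_str=False):
        convert = str if from_str else repr
        return "[" + ", ".join(convert(item) for item in self) + "]"

    def __str__(self):
        return self.__repr__(from_str=True)


def accepts_flag(cls):
    """Return True if cls.__repr__ declares the hypothetical from_str parameter."""
    try:
        params = inspect.signature(cls.__repr__).parameters
    except (TypeError, ValueError):
        # Introspection failed, so assume the flag is not supported.
        return False
    return "from_str" in params


def to_str(obj):
    """Pass the flag only to __repr__ implementations that accept it."""
    cls = type(obj)
    if accepts_flag(cls):
        return cls.__repr__(obj, from_str=True)
    return repr(obj)
```

With these definitions, to_str(FlagAwareList(['a', 1])) yields "[a, 1]", while a plain list falls back to repr and keeps the apostrophes.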

A less radical proposal is to implement __str__ methods for built-in
container types. The obvious drawback is duplication of effort: all
those __str__ and __repr__ implementations differ in only one small
detail, namely whether they call str or repr on items.
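The effect of this approach can be approximated with subclasses. The sketch below uses the hypothetical names StrList and StrDict; it does not change the built-in types themselves, which is what the proposal would actually require.

```python
from decimal import Decimal


class StrList(list):
    """list whose str() calls str on items; repr() is left unchanged."""

    def __str__(self):
        return "[" + ", ".join(str(item) for item in self) + "]"


class StrDict(dict):
    """dict whose str() calls str on keys and values; repr() is left unchanged."""

    def __str__(self):
        pairs = ("{}: {}".format(str(key), str(value)) for key, value in self.items())
        return "{" + ", ".join(pairs) + "}"


numbers = StrList([Decimal('3')])
prices = StrDict({'apple': Decimal('1.5')})
nested = StrList([[Decimal('3')]])  # the inner list is a plain built-in list
```

The two __str__ methods repeat the traversal that __repr__ already performs, which is the duplication described above. The nested case also shows that a plain built-in list inside a StrList still renders its items with repr.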

The most conservative proposal is not to change str at all but to allow
developers to implement their own application- or library-specific
pretty-printers. The drawback is again a multiplication of effort and
proliferation of many small specific container-traversal algorithms.
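An application-specific pretty-printer of this kind might look like the following sketch. The function name str_container is hypothetical, only exact list, tuple and dict instances are traversed, and self-referential containers are not handled.

```python
from decimal import Decimal


def str_container(obj):
    """Render lists, tuples and dicts like repr does, but call str on leaf items."""
    if type(obj) is list:
        return "[" + ", ".join(str_container(item) for item in obj) + "]"
    if type(obj) is tuple:
        inner = ", ".join(str_container(item) for item in obj)
        if len(obj) == 1:
            inner += ","  # keep the one-element tuple notation
        return "(" + inner + ")"
    if type(obj) is dict:
        pairs = (
            "{}: {}".format(str_container(key), str_container(value))
            for key, value in obj.items()
        )
        return "{" + ", ".join(pairs) + "}"
    # Anything else, including container subclasses, falls back to str.
    return str(obj)


flat = str_container([Decimal('3'), 'a'])
nested = str_container({'k': [Decimal('1')]})
single = str_container((1,))
```

Each application that wants different delimiters or handling for more types would need its own variant of this traversal, which is the proliferation described above.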

Backward compatibility

In those cases where type information is more important than usual, it
will still be possible to get the current results by calling repr
explicitly.

References

[1] Guido van Rossum, PEP: str(container) should call str(item), not
repr(item)
https://mail.python.org/pipermail/python-3000/2008-May/013876.html

Copyright

This document has been placed in the public domain.