PEP: 455
Title: Adding a key-transforming dictionary to collections
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
BDFL-Delegate: Raymond Hettinger
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Sep-2013
Python-Version: 3.5
Post-History:

Abstract

This PEP proposes a new data structure for the collections module,
called "TransformDict" in this PEP. This structure is a mutable mapping
which transforms the key using a given function when doing a lookup, but
retains the original key when reading.

Rejection

See the rationale at
https://mail.python.org/pipermail/python-dev/2015-May/140003.html.
For an earlier partial review, see
https://mail.python.org/pipermail/python-dev/2013-October/129937.html.

Rationale

Numerous specialized versions of this pattern exist. The most common is
a case-insensitive case-preserving dict, i.e. a dict-like container
which matches keys in a case-insensitive fashion but retains the
original casing. It is a very common need in network programming: many
protocols carry lists of "key / value" properties in their messages,
where the keys are textual strings whose case must be ignored on
receipt but, by either specification or custom, is preserved or
non-trivially canonicalized when retransmitted.

Another common request is an identity dict, where keys are matched
according to their respective id()s instead of normal matching.

Both are instances of a more general pattern, where a given
transformation function is applied to keys when looking them up: that
function being str.lower or str.casefold in the former example and the
built-in id function in the latter.

(It could be said that the pattern projects keys from the user-visible
set onto the internal lookup set.)
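
As a rough sketch of that projection, assuming nothing beyond a plain
dict: the internal table is keyed by the transformed key and stores the
original key next to the value. The helper names store and lookup and
the headers table are hypothetical and not part of the proposal.

```python
# Hypothetical helpers: a plain dict keyed by the transformed key,
# storing (original key, value) pairs.
def store(table, transform, key, value):
    """Insert or update `key`, keeping the first original key seen."""
    tkey = transform(key)
    original = table[tkey][0] if tkey in table else key
    table[tkey] = (original, value)


def lookup(table, transform, key):
    """Return the value stored under any key with the same transformed form."""
    return table[transform(key)][1]


headers = {}
store(headers, str.casefold, 'Content-Type', 'text/html')
store(headers, str.casefold, 'CONTENT-TYPE', 'text/plain')

print(lookup(headers, str.casefold, 'content-type'))  # text/plain
print([original for original, _ in headers.values()])  # ['Content-Type']
```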

Semantics

TransformDict is a MutableMapping implementation: it faithfully
implements the well-known API of mutable mappings, like dict itself and
other dict-like classes in the standard library. Therefore, this PEP
won't rehash the semantics of most TransformDict methods.

The transformation function needn't be bijective; it can be many-to-one
(non-injective), as in the case-insensitive example. In other words,
different keys can look up the same value:

    >>> d = TransformDict(str.casefold)
    >>> d['SomeKey'] = 5
    >>> d['somekey']
    5
    >>> d['SOMEKEY']
    5

TransformDict retains the first key used when creating an entry:

    >>> d = TransformDict(str.casefold)
    >>> d['SomeKey'] = 1
    >>> d['somekey'] = 2
    >>> list(d.items())
    [('SomeKey', 2)]

The original keys needn't be hashable, as long as the transformation
function returns a hashable one:

    >>> d = TransformDict(id)
    >>> l = [None]
    >>> d[l] = 5
    >>> l in d
    True
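
The semantics shown in this section can be approximated in pure Python.
The following is a minimal sketch built on
collections.abc.MutableMapping; it is not the patch attached to the
tracker issue, and the private attribute names and the constructor
signature are assumptions made for illustration.

```python
from collections.abc import MutableMapping


class TransformDict(MutableMapping):
    """Illustrative sketch only, not the implementation from the patch."""

    def __init__(self, transform, init_dict=None, **kwargs):
        if not callable(transform):
            raise TypeError("expected a callable, got %r" % type(transform))
        self._transform = transform
        # Maps transformed key -> (original key, value).
        self._data = {}
        if init_dict is not None:
            self.update(init_dict)
        if kwargs:
            self.update(kwargs)

    @property
    def transform_func(self):
        """Read-only access to the transformation function."""
        return self._transform

    def __getitem__(self, key):
        return self._data[self._transform(key)][1]

    def __setitem__(self, key, value):
        tkey = self._transform(key)
        # Keep the first original key if the entry already exists.
        original = self._data[tkey][0] if tkey in self._data else key
        self._data[tkey] = (original, value)

    def __delitem__(self, key):
        del self._data[self._transform(key)]

    def __iter__(self):
        # Iteration yields the original keys, not the transformed ones.
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)


d = TransformDict(str.casefold)
d['SomeKey'] = 1
d['somekey'] = 2
print(list(d.items()))  # [('SomeKey', 2)]
```

Because each entry stores the original key object, an id()-based
instance of this sketch also keeps its keys alive, so their ids are not
reused while the entry exists.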

Constructor

As shown in the examples above, creating a TransformDict requires
passing the key transformation function as the first argument (much like
creating a defaultdict requires passing the factory function as first
argument).

The constructor also takes other optional arguments which can be used to
initialize the TransformDict with certain key-value pairs. Those
optional arguments are the same as in the dict and defaultdict
constructors:

    >>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
    >>> sorted(d.items())
    [('Bar', 2), ('Foo', 1)]
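
For comparison, this is how the existing defaultdict constructor that
the proposed signature mirrors behaves: the first argument is the
factory, and the remaining arguments are handled as in dict. The example
uses only the standard library; the sample keys and values are
illustrative.

```python
from collections import defaultdict

# First argument: the factory. Remaining arguments: same as for dict().
d = defaultdict(list, [('Foo', [1])], Bar=[2])
print(sorted(d.items()))  # [('Bar', [2]), ('Foo', [1])]
```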

Getting the original key

TransformDict also features a lookup method returning the stored key
together with the corresponding value:

    >>> d = TransformDict(str.casefold, {'Foo': 1})
    >>> d.getitem('FOO')
    ('Foo', 1)
    >>> d.getitem('bar')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'bar'

The method name getitem() follows the standard popitem() method on
mutable mappings.
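
One way such a lookup could work is sketched below as a standalone
function over an internal table that maps each transformed key to an
(original key, value) pair. The table layout, the function signature,
and the choice to report the key as the caller spelled it in the
KeyError are all assumptions, not details taken from the patch.

```python
def getitem(data, transform, key):
    """Return (stored_key, value) for `key`; raise KeyError if absent."""
    try:
        return data[transform(key)]
    except KeyError:
        # Assumption: report the key as the caller spelled it.
        raise KeyError(key) from None


# Internal table as it might look after d['Foo'] = 1 with str.casefold.
data = {'foo': ('Foo', 1)}
print(getitem(data, str.casefold, 'FOO'))  # ('Foo', 1)
```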

Getting the transformation function

TransformDict has a simple read-only property transform_func which gives
back the transformation function.

Alternative proposals and questions

Retaining the last original key

Most python-dev respondents found retaining the first user-supplied key
more intuitive than retaining the last. Also, it matches the dict
object's own behaviour when using different but equal keys:

    >>> d = {}
    >>> d[1] = 'hello'
    >>> d[1.0] = 'world'
    >>> d
    {1: 'world'}

Furthermore, explicitly retaining the last key in a first-key-retaining
scheme is still possible using the following approach:

    d.pop(key, None)
    d[key] = value

while the converse (retaining the first key in a last-key-retaining
scheme) doesn't look possible without rewriting part of the container's
code.
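
The pop-then-set idiom can be tried on a plain dict, which has the same
first-key-retaining behaviour for equal keys. The helper name
set_retaining_last_key is hypothetical.

```python
def set_retaining_last_key(mapping, key, value):
    """Store `value` under `key`, replacing any equivalent stored key."""
    mapping.pop(key, None)
    mapping[key] = value


d = {1: 'hello'}
d[1.0] = 'world'
print(d)  # {1: 'world'}  (first key retained)

set_retaining_last_key(d, 1.0, 'again')
print(d)  # {1.0: 'again'}  (last key retained)
```

Note that on an insertion-ordered mapping such as dict, removing and
re-adding the entry also moves it to the end of the iteration order.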

Using an encoder / decoder pair

Using a function pair isn't necessary, since the original key is
retained by the container. Moreover, an encoder / decoder pair would
require the transformation to be bijective, which prevents important use
cases like case-insensitive matching.

Providing a transformation function for values

Dictionary values are not used for lookup, so their semantics are
irrelevant to the container's operation. Therefore, there is no point in
having both an "original" and a "transformed" value: the transformed
value wouldn't be used for anything.

Providing a specialized container, not generic

It was asked why we would provide the generic TransformDict construct
rather than a specialized case-insensitive dict variant. The answer is
that it's nearly as cheap (code-wise and performance-wise) to provide
the generic construct, and it can fill more use cases.

Even case-insensitive dicts can call for different transformation
functions: str.lower, str.casefold, or in some cases bytes.lower when
working with text encoded in an ASCII-compatible encoding.
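
A short illustration of why the choice of function matters, using only
built-in methods; the sample strings are arbitrary:

```python
upper = 'STRASSE'
sharp = 'Straße'

# str.lower leaves the sharp s alone, so the two spellings do not match.
print(sharp.lower() == upper.lower())  # False

# str.casefold expands it to 'ss', so they do match.
print(sharp.casefold() == upper.casefold())  # True

# bytes.lower only affects ASCII letters.
print(b'Content-Type'.lower())  # b'content-type'
```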

Other constructor patterns

Two other constructor patterns were proposed by Serhiy Storchaka:

-   A type factory scheme:

        d = TransformDict(str.casefold)(Foo=1)

-   A subclassing scheme:

        class CaseInsensitiveDict(TransformDict):
            __transform__ = str.casefold

        d = CaseInsensitiveDict(Foo=1)

While both approaches can be defended, they don't follow established
practices in the standard library, and therefore were rejected.

Implementation

A patch for the collections module is tracked on the bug tracker at
http://bugs.python.org/issue18986.

Existing work

Case-insensitive dicts are a popular request:

-   http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
-   https://mail.python.org/pipermail/python-list/2013-May/647243.html
-   https://mail.python.org/pipermail/python-list/2005-April/296208.html
-   https://mail.python.org/pipermail/python-list/2004-June/241748.html
-   http://bugs.python.org/msg197376
-   http://stackoverflow.com/a/2082169
-   http://stackoverflow.com/a/3296782
-   http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
-   https://gist.github.com/babakness/3901174
-   http://www.wikier.org/blog/key-insensitive-dictionary-in-python
-   http://en.sharejs.com/python/14534
-   http://www.voidspace.org.uk/python/archive.shtml#caseless

Identity dicts have been requested too:

-   https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
-   http://www.gossamer-threads.com/lists/python/python/209527

Several modules in the standard library use identity lookups for object
memoization, for example pickle, json, copy, cProfile, doctest and
_threading_local.
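
The general shape of such an identity-keyed memo is sketched below. This
is a simplified illustration, not code taken from any of those modules,
and the function name count_distinct_objects is hypothetical.

```python
def count_distinct_objects(objects):
    """Count objects by identity, so equal but distinct objects differ."""
    seen = {}
    for obj in objects:
        # Store the object itself so it stays alive and its id stays unique.
        seen.setdefault(id(obj), obj)
    return len(seen)


a = [1, 2]
b = [1, 2]  # equal to a, but a different (and unhashable) object
print(count_distinct_objects([a, b, a]))  # 2
```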

Other languages

C# / .Net

.Net has a generic Dictionary class where you can specify a custom
IEqualityComparer: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

Using it is the recommended way to write case-insensitive dictionaries:
http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary

Java

For Java, Apache Commons Collections provides a specialized
CaseInsensitiveMap:
http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html

The Java standard library also has a separate IdentityHashMap:
http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html

C++

The C++ standard library features an unordered_map with customizable
hash and equality functions:
http://www.cplusplus.com/reference/unordered_map/unordered_map/

Copyright

This document has been placed in the public domain.


