PEP: 584
Title: Add Union Operators To dict
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>,
        Brandt Bucher <brandt@python.org>
BDFL-Delegate: Guido van Rossum <guido@python.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Mar-2019
Python-Version: 3.9
Post-History: 01-Mar-2019, 16-Oct-2019, 02-Dec-2019, 04-Feb-2020, 17-Feb-2020
Resolution: https://mail.python.org/archives/list/python-dev@python.org/thread/6KT2KIOTYXMDCD2CCAOLOI7LUGTN6MBS

Abstract

This PEP proposes adding merge (|) and update (|=) operators to the
built-in dict class.

Note

After this PEP was accepted, the decision was made to also implement the
new operators for several other standard library mappings.

Motivation

The current ways to merge two dicts have several disadvantages:

dict.update

d1.update(d2) modifies d1 in-place. e = d1.copy(); e.update(d2) is not
an expression and needs a temporary variable.

{**d1, **d2}

Dict unpacking looks ugly and is not easily discoverable. Few people
would be able to guess what it means the first time they see it, or
think of it as the "obvious way" to merge two dicts.

As Guido said:

  I'm sorry for PEP 448, but even if you know about **d in simpler
  contexts, if you were to ask a typical Python user how to combine two
  dicts into a new one, I doubt many people would think of {**d1, **d2}.
  I know I myself had forgotten about it when this thread started!

{**d1, **d2} ignores the types of the mappings and always returns a
dict. type(d1)({**d1, **d2}) fails for dict subclasses such as
defaultdict that have an incompatible __init__ method.
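
A short sketch of the failure described above, assuming a defaultdict(list) as the subclass in question (the variable names are illustrative only):

```python
from collections import defaultdict

d1 = defaultdict(list, {"spam": [1]})
d2 = {"eggs": [2]}

# Dict unpacking always builds a plain dict, so the defaultdict type is lost.
unpacked = {**d1, **d2}

# Rebuilding through type(d1) passes the merged dict where defaultdict
# expects its default_factory, which raises TypeError.
try:
    type(d1)(unpacked)
except TypeError as exc:
    rebuild_error = exc
else:
    rebuild_error = None
```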

collections.ChainMap

ChainMap is unfortunately poorly-known and doesn't qualify as "obvious".
It also resolves duplicate keys in the opposite order to that expected
("first seen wins" instead of "last seen wins"). Like dict unpacking, it
is tricky to get it to honor the desired subclass. For the same reason,
type(d1)(ChainMap(d2, d1)) fails for some subclasses of dict.

Further, ChainMaps wrap their underlying dicts, so writes to the
ChainMap will modify the original dict:

    >>> from collections import ChainMap
    >>> d1 = {'spam': 1}
    >>> d2 = {'eggs': 2}
    >>> merged = ChainMap(d2, d1)
    >>> merged['eggs'] = 999
    >>> d2
    {'eggs': 999}
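
The lookup order mentioned earlier can be checked in the same way. The following is a small sketch with illustrative values; it also shows that copying the ChainMap into a plain dict is one way to detach the result from its inputs:

```python
from collections import ChainMap

d1 = {"spam": 1, "eggs": 2}
d2 = {"eggs": 3}

# Lookups search the maps left to right, so the first map holding a key wins.
first_wins = ChainMap(d1, d2)["eggs"]

# To emulate "last seen wins", the later dict has to be listed first.
last_wins = ChainMap(d2, d1)["eggs"]

# A plain-dict copy no longer writes through to d1 or d2.
snapshot = dict(ChainMap(d2, d1))
snapshot["eggs"] = 999
```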

dict(d1, **d2)

This "neat trick" is not well-known, and only works when d2 is entirely
string-keyed:

    >>> d1 = {"spam": 1}
    >>> d2 = {3665: 2}
    >>> dict(d1, **d2)
    Traceback (most recent call last):
      ...
    TypeError: keywords must be strings

Rationale

The new operators will have the same relationship to the dict.update
method as the list concatenate (+) and extend (+=) operators have to
list.extend. Note that this is somewhat different from the relationship
that |/|= have with set.update; the authors have determined that
allowing the in-place operator to accept a wider range of types (as list
does) is a more useful design, and that restricting the types of the
binary operator's operands (again, as list does) will help avoid silent
errors caused by complicated implicit type casting on both sides.
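
The existing list and set behavior that this comparison refers to can be sketched as follows (the variable names are illustrative):

```python
items = [1, 2]
try:
    items + (3, 4)      # binary + insists on another list
except TypeError:
    list_plus_rejects_tuple = True
else:
    list_plus_rejects_tuple = False
items += (3, 4)         # += accepts any iterable, like list.extend

seen = {1, 2}
try:
    seen |= [3, 4]      # set |= insists on another set
except TypeError:
    set_ior_rejects_list = True
else:
    set_ior_rejects_list = False
seen.update([3, 4])     # only the method accepts any iterable
```

The proposed dict operators follow the list pattern: a strict binary operator and a permissive in-place operator.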

Key conflicts will be resolved by keeping the rightmost value. This
matches the existing behavior of similar dict operations, where the last
seen value always wins:

    {'a': 1, 'a': 2}
    {**d, **e}
    d.update(e)
    d[k] = v
    {k: v for x in (d, e) for (k, v) in x.items()}

All of the above follow the same rule. This PEP takes the position that
this behavior is simple, obvious, usually the behavior we want, and
should be the default behavior for dicts. This means that dict union is
not commutative; in general d | e != e | d.

Similarly, the iteration order of the key-value pairs in the dictionary
will follow the same semantics as the examples above, with each newly
added key (and its value) being appended to the current sequence.
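
As a quick check of the rule, the sketch below (illustrative values, Python 3.9 or later) builds the same merge in several of the ways listed above and compares the results:

```python
d = {"a": 1, "b": 2}
e = {"b": 20, "c": 30}

via_unpacking = {**d, **e}

via_update = d.copy()
via_update.update(e)

via_comprehension = {k: v for x in (d, e) for (k, v) in x.items()}

via_union = d | e

# Key order: keys of d first, then keys that are new in e.
key_order = list(via_union)

# Swapping the operands changes which value wins for "b".
commutes = (d | e) == (e | d)
```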

Specification

Dict union will return a new dict consisting of the left operand merged
with the right operand, each of which must be a dict (or an instance of
a dict subclass). If a key appears in both operands, the last-seen value
(i.e. that from the right-hand operand) wins:

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d | e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e | d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

The augmented assignment version operates in-place:

    >>> d |= e
    >>> d
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Augmented assignment behaves identically to the update method called
with a single positional argument, so it also accepts anything
implementing the Mapping protocol (more specifically, anything with the
keys and __getitem__ methods) or iterables of key-value pairs. This is
analogous to list += and list.extend, which accept any iterable, not
just lists. Continued from above:

    >>> d | [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict

    >>> d |= [('spam', 999)]
    >>> d
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

When new keys are added, their order matches their order within the
right-hand mapping, if any exists for its type.
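
To illustrate the difference in accepted operand types, here is a minimal sketch using a hypothetical Overrides class that offers only keys() and __getitem__ (Python 3.9 or later):

```python
class Overrides:
    """Hypothetical mapping-like object with only keys() and __getitem__."""

    def __init__(self, **items):
        self._items = items

    def keys(self):
        return self._items.keys()

    def __getitem__(self, key):
        return self._items[key]


settings = {"debug": False, "level": 1}

# In-place union accepts the object, exactly as dict.update would.
settings |= Overrides(level=2, colour="red")

# Binary union requires a dict (or dict subclass) on both sides.
try:
    {"debug": False} | Overrides(level=2)
except TypeError:
    binary_rejected = True
else:
    binary_rejected = False
```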

Reference Implementation

One of the authors has written a C implementation.

An approximate pure-Python implementation is:

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(self)
        new.update(other)
        return new

    def __ror__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(other)
        new.update(self)
        return new

    def __ior__(self, other):
        dict.update(self, other)
        return self
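
One way to experiment with the approximate implementation is to attach it to a dict subclass. UnionDict below is a hypothetical name used only for this sketch; it is not part of the proposal:

```python
class UnionDict(dict):
    """Hypothetical subclass carrying the approximate pure-Python operators."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(self)
        new.update(other)
        return new

    def __ror__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(other)
        new.update(self)
        return new

    def __ior__(self, other):
        dict.update(self, other)
        return self


left = UnionDict(spam=1, eggs=2)
merged = left | {"eggs": 3}
reflected = {"eggs": 3} | left   # dispatched to UnionDict.__ror__

alias = left
left |= [("ham", 4)]             # mutates in place; alias sees the change
```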

Major Objections

Dict Union Is Not Commutative

Union is commutative, but dict union will not be (d | e != e | d).

Response

There is precedent for non-commutative unions in Python:

    >>> {0} | {False}
    {0}
    >>> {False} | {0}
    {False}

While the results may be equal, they are distinctly different. In
general, a | b is not the same operation as b | a.

Dict Union Will Be Inefficient

Giving a pipe operator to mappings is an invitation to writing code that
doesn't scale well. Repeated dict union is inefficient:
d | e | f | g | h creates and destroys three temporary mappings.

Response

The same argument applies to sequence concatenation.

Sequence concatenation grows with the total number of items in the
sequences, leading to O(N**2) (quadratic) performance. Dict union is
likely to involve duplicate keys, so the temporary mappings will not
grow as fast.

Just as it is rare for people to concatenate large numbers of lists or
tuples, the authors of this PEP believe that it will be rare for people
to merge large numbers of dicts. collections.Counter is a dict subclass
that supports many operators, and there are no known examples of people
having performance issues due to combining large numbers of Counters.
Further, a survey of the standard library by the authors found no
examples of merging more than two dicts, so this is unlikely to be a
performance problem in practice... "Everything is fast for small enough
N".

If one expects to be merging a large number of dicts where performance
is an issue, it may be better to use an explicit loop and in-place
merging:

    new = {}
    for d in many_dicts:
        new |= d
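
A self-contained version of the same idea, with a chained merge for comparison (many_dicts holds illustrative data; Python 3.9 or later):

```python
from functools import reduce
import operator

many_dicts = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]

# Chained union: every step builds a fresh intermediate dict.
chained = reduce(operator.or_, many_dicts, {})

# In-place merging: a single accumulator and no intermediates.
accumulated = {}
for d in many_dicts:
    accumulated |= d
```

Both spellings give the same result and leave the input dicts untouched; they differ only in how many temporary dicts are created along the way.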

Dict Union Is Lossy

Dict union can lose data (values may disappear); no other form of union
is lossy.

Response

It isn't clear why the first part of this argument is a problem.
dict.update() may throw away values, but not keys; that is expected
behavior, and will remain expected behavior regardless of whether it is
spelled as update() or |.

Other types of union are also lossy, in the sense of not being
reversible; you cannot get back the two operands given only the union.
a | b == 365... what are a and b?

Only One Way To Do It

Dict union will violate the Only One Way koan from the Zen.

Response

There is no such koan. "Only One Way" is a calumny about Python
originating long ago from the Perl community.

More Than One Way To Do It

Okay, the Zen doesn't say that there should be Only One Way To Do It.
But it does have a prohibition against allowing "more than one way to do
it".

Response

There is no such prohibition. The "Zen of Python" merely expresses a
preference for "only one obvious way":

    There should be one-- and preferably only one --obvious way to do
    it.

The emphasis here is that there should be an obvious way to do "it". In
the case of dict update operations, there are at least two different
operations that we might wish to do:

-   Update a dict in place: The Obvious Way is to use the update()
    method. If this proposal is accepted, the |= augmented assignment
    operator will also work, but that is a side-effect of how augmented
    assignments are defined. Which you choose is a matter of taste.
-   Merge two existing dicts into a third, new dict: This PEP proposes
    that the Obvious Way is to use the | merge operator.

In practice, this preference for "only one way" is frequently violated
in Python. For example, every for loop could be re-written as a while
loop; every if block could be written as an if/else block. List, set
and dict comprehensions could all be replaced by generator expressions.
Lists offer no fewer than five ways to implement concatenation:

-   Concatenation operator: a + b
-   In-place concatenation operator: a += b
-   Slice assignment: a[len(a):] = b
-   Sequence unpacking: [*a, *b]
-   Extend method: a.extend(b)

We should not be too strict about rejecting useful functionality because
it violates "only one way".

Dict Union Makes Code Harder To Understand

Dict union makes it harder to tell what code means. To paraphrase the
objection rather than quote anyone in particular: "If I see spam | eggs, I
can't tell what it does unless I know what spam and eggs are".

Response

This is very true. But it is equally true today, where the use of the |
operator could mean any of:

-   int/bool bitwise-or
-   set/frozenset union
-   any other overloaded operation

Adding dict union to the set of possibilities doesn't seem to make it
harder to understand the code. No more work is required to determine
that spam and eggs are mappings than it would take to determine that
they are sets, or integers. And good naming conventions will help:

    flags |= WRITEABLE  # Probably numeric bitwise-or.
    DO_NOT_RUN = WEEKENDS | HOLIDAYS  # Probably set union.
    settings = DEFAULT_SETTINGS | user_settings | workspace_settings  # Probably dict union.

What About The Full set API?

dicts are "set like", and should support the full collection of set
operators: |, &, ^, and -.

Response

This PEP does not take a position on whether dicts should support the
full collection of set operators, and would prefer to leave that for a
later PEP (one of the authors is interested in drafting such a PEP). For
the benefit of any later PEP, a brief summary follows.

Set symmetric difference (^) is obvious and natural. For example, given
two dicts:

    d1 = {"spam": 1, "eggs": 2}
    d2 = {"ham": 3, "eggs": 4}

the symmetric difference d1 ^ d2 would be {"spam": 1, "ham": 3}.

Set difference (-) is also obvious and natural, and an earlier version
of this PEP included it in the proposal. Given the dicts above, we would
have d1 - d2 be {"spam": 1} and d2 - d1 be {"ham": 3}.

Set intersection (&) is a bit more problematic. While it is easy to
determine the intersection of keys in two dicts, it is not clear what to
do with the values. Given the two dicts above, it is obvious that the
only key of d1 & d2 must be "eggs", but not whether its value should be
2 (from d1) or 4 (from d2). "Last seen wins" would give 4, and has the
advantage of consistency with other dict operations (and the proposed
union operators).
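
None of these operators are proposed here, but the semantics summarized above can be approximated today with comprehensions. The following is only a sketch of one possible reading (Python 3.9 or later for the | call):

```python
d1 = {"spam": 1, "eggs": 2}
d2 = {"ham": 3, "eggs": 4}

# Difference: items of d1 whose keys are absent from d2.
difference = {k: v for k, v in d1.items() if k not in d2}

# Symmetric difference: items whose keys occur in exactly one dict.
only_one = d1.keys() ^ d2.keys()
symmetric = {k: v for k, v in (d1 | d2).items() if k in only_one}

# Intersection under "last seen wins": shared keys, values taken from d2.
intersection = {k: d2[k] for k in d1 if k in d2}
```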

What About Mapping And MutableMapping?

collections.abc.Mapping and collections.abc.MutableMapping should define
| and |=, so subclasses could just inherit the new operators instead of
having to define them.

Response

There are two primary reasons why adding the new operators to these
classes would be problematic:

-   Currently, neither defines a copy method, which would be necessary
    for | to create a new instance.
-   Adding |= to MutableMapping (or a copy method to Mapping) would
    create compatibility issues for virtual subclasses.
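
A class built on MutableMapping can still opt in by defining the operators itself. The sketch below uses a hypothetical LowerCaseDict, which supplies its own copy method and accepts any Mapping on the right-hand side; both are design choices made for this example only:

```python
from collections.abc import Mapping, MutableMapping


class LowerCaseDict(MutableMapping):
    """Hypothetical mapping that lower-cases its string keys."""

    def __init__(self, data=()):
        self._data = {}
        self.update(data)

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def copy(self):
        return type(self)(self._data)

    def __or__(self, other):
        if not isinstance(other, Mapping):
            return NotImplemented
        new = self.copy()       # the copy method the ABCs do not provide
        new.update(other)
        return new

    def __ior__(self, other):
        self.update(other)
        return self


a = LowerCaseDict({"Spam": 1})
b = a | {"EGGS": 2}     # new object; a is unchanged
a |= {"SPAM": 9}        # in-place update
```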

Rejected Ideas

Rejected Semantics

There were at least four other proposed solutions for handling
conflicting keys. These alternatives are left to subclasses of dict.

Raise

It isn't clear that this behavior has many use-cases or will often be
useful, but it will likely be annoying as any use of the dict union
operator would have to be guarded with a try/except clause.
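
For code that does want this behavior, a dict subclass can provide it. StrictDict below is a hypothetical sketch; it handles only the case where the subclass is the left operand:

```python
class StrictDict(dict):
    """Hypothetical dict subclass whose union rejects overlapping keys."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        clashes = self.keys() & other.keys()
        if clashes:
            raise KeyError(f"conflicting keys: {clashes!r}")
        new = StrictDict(self)
        new.update(other)
        return new


disjoint = StrictDict(a=1) | {"b": 2}

try:
    StrictDict(a=1) | {"a": 2}
except KeyError:
    conflict_raised = True
else:
    conflict_raised = False
```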

Add The Values (As Counter Does, with +)

Too specialised to be used as the default behavior.
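
For reference, collections.Counter already offers this behavior through +, and gives | its own multiset meaning (the larger count wins). A brief sketch with illustrative counts:

```python
from collections import Counter

c1 = Counter(spam=1, eggs=2)
c2 = Counter(eggs=3, ham=4)

added = c1 + c2     # counts for shared keys are summed
maxed = c1 | c2     # Counter's |: keeps the larger count per key
```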

Leftmost Value (First-Seen) Wins

It isn't clear that this behavior has many use-cases. In fact, one can
simply reverse the order of the arguments:

    d2 | d1  # d1 merged with d2, keeping existing values in d1
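
A small sketch of the reversed-operand approach with illustrative values (Python 3.9 or later). Note that the key order of the result then follows d2 rather than d1:

```python
d1 = {"spam": 1, "eggs": 2}
d2 = {"eggs": 99, "ham": 3}

# Reversing the operands keeps d1's values for shared keys...
first_seen = d2 | d1

# ...but the keys of d2 now come first in iteration order.
order = list(first_seen)
```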

Concatenate Values In A List

This is likely to be too specialised to be the default. It is not clear
what to do if the values are already lists:

    {'a': [1, 2]} | {'a': [3, 4]}

Should this give {'a': [1, 2, 3, 4]} or {'a': [[1, 2], [3, 4]]}?

Rejected Alternatives

Use The Addition Operator

This PEP originally started life as a proposal for dict addition, using
the + and += operators. That choice proved to be exceedingly
controversial, with many people having serious objections to the choice
of operator. For details, see previous versions of the PEP and the
mailing list discussions.

Use The Left Shift Operator

The << operator didn't seem to get much support on Python-Ideas, but no
major objections either. Perhaps the strongest objection was Chris
Angelico's comment:

  The "cuteness" value of abusing the operator to indicate information
  flow got old shortly after C++ did it.

Use A New Left Arrow Operator

Another suggestion was to create a new operator <-. Unfortunately this
would be ambiguous: d <- e could mean d merge e or d less-than minus e.

Use A Method

A dict.merged() method would avoid the need for an operator at all. One
subtlety is that it would likely need slightly different implementations
when called as an unbound method versus as a bound method.

As an unbound method, the behavior could be similar to:

    def merged(cls, *mappings, **kw):
        new = cls()  # Will this work for defaultdict?
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

As a bound method, the behavior could be similar to:

    def merged(self, *mappings, **kw):
        new = self.copy()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

Advantages

-   Arguably, methods are more discoverable than operators.
-   The method could accept any number of positional and keyword
    arguments, avoiding the inefficiency of creating temporary dicts.
-   Accepts sequences of (key, value) pairs like the update method.
-   Being a method, it is easy to override in a subclass if you need
    alternative behaviors such as "first wins", "unique keys", etc.

Disadvantages

-   Would likely require a new kind of method decorator which combined
    the behavior of regular instance methods and classmethod. It would
    need to be public (but not necessarily a builtin) for those needing
    to override the method. There is a proof of concept.
-   It isn't an operator. Guido discusses why operators are useful. For
    another viewpoint, see Alyssa Coghlan's blog post.
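
To make the first disadvantage concrete, here is one possible shape for such a decorator. The names hybridmethod and MergeableDict are hypothetical, and this sketch is not the proof of concept mentioned above. Unlike the bound-method sketch earlier, it copies through the constructor so that the subclass type is kept:

```python
import functools


class hybridmethod:
    """Hypothetical descriptor: passes the instance if there is one, else the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        target = owner if instance is None else instance
        return functools.partial(self.func, target)


class MergeableDict(dict):
    @hybridmethod
    def merged(self_or_cls, *mappings, **kw):
        if isinstance(self_or_cls, type):
            new = self_or_cls()                     # called on the class
        else:
            new = type(self_or_cls)(self_or_cls)    # called on an instance
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new


from_class = MergeableDict.merged({"a": 1}, b=2)
from_instance = MergeableDict(a=1).merged({"a": 2}, c=3)
```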

Use a Function

Instead of a method, use a new built-in function merged(). One possible
implementation could be something like this:

    def merged(*mappings, **kw):
        if mappings and isinstance(mappings[0], dict):
            # If the first argument is a dict, use its type.
            new = mappings[0].copy()
            mappings = mappings[1:]
        else:
            # No positional arguments, or the first argument is a
            # sequence of (key, value) pairs.
            new = dict()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

An alternative might be to forgo the arbitrary keywords, and take a
single keyword parameter that specifies the behavior on collisions:

    def merged(*mappings, on_collision=lambda k, v1, v2: v2):
        # implementation left as an exercise to the reader
        ...
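
One way the exercise might be completed is sketched below. It assumes the result is always a plain dict and that each argument is a mapping or an iterable of (key, value) pairs; neither assumption comes from the proposal itself:

```python
def merged(*mappings, on_collision=lambda k, v1, v2: v2):
    """Merge mappings left to right, resolving clashes via on_collision."""
    new = {}
    for m in mappings:
        # dict(m) accepts both mappings and iterables of pairs.
        for key, value in dict(m).items():
            if key in new:
                new[key] = on_collision(key, new[key], value)
            else:
                new[key] = value
    return new


last_wins = merged({"a": 1}, {"a": 2, "b": 3})
first_wins = merged({"a": 1}, {"a": 2, "b": 3},
                    on_collision=lambda k, v1, v2: v1)
summed = merged({"a": 1}, [("a", 2)], on_collision=lambda k, v1, v2: v1 + v2)
```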

Advantages

-   Most of the same advantages of the method solutions above.
-   Doesn't require a subclass to implement alternative behavior on
    collisions, just a function.

Disadvantages

-   May not be important enough to be a builtin.
-   Hard to override behavior if you need something like "first wins",
    without losing the ability to process arbitrary keyword arguments.

Examples

The authors of this PEP surveyed third-party libraries for dictionary
merging code that might be a candidate for dict union.

This is a cursory list based on a subset of whatever arbitrary
third-party packages happened to be installed on one of the authors'
computers, and may not reflect the current state of any package. Also
note that, while further (unrelated) refactoring may be possible, the
rewritten version only adds usage of the new operators for an
apples-to-apples comparison. It also reduces the result to an expression
when it is efficient to do so.

IPython/zmq/ipkernel.py

Before:

    aliases = dict(kernel_aliases)
    aliases.update(shell_aliases)

After:

    aliases = kernel_aliases | shell_aliases

IPython/zmq/kernelapp.py

Before:

    kernel_aliases = dict(base_aliases)
    kernel_aliases.update({
        'ip' : 'KernelApp.ip',
        'hb' : 'KernelApp.hb_port',
        'shell' : 'KernelApp.shell_port',
        'iopub' : 'KernelApp.iopub_port',
        'stdin' : 'KernelApp.stdin_port',
        'parent': 'KernelApp.parent',
    })
    if sys.platform.startswith('win'):
        kernel_aliases['interrupt'] = 'KernelApp.interrupt'

    kernel_flags = dict(base_flags)
    kernel_flags.update({
        'no-stdout' : (
                {'KernelApp' : {'no_stdout' : True}},
                "redirect stdout to the null device"),
        'no-stderr' : (
                {'KernelApp' : {'no_stderr' : True}},
                "redirect stderr to the null device"),
    })

After:

    kernel_aliases = base_aliases | {
        'ip' : 'KernelApp.ip',
        'hb' : 'KernelApp.hb_port',
        'shell' : 'KernelApp.shell_port',
        'iopub' : 'KernelApp.iopub_port',
        'stdin' : 'KernelApp.stdin_port',
        'parent': 'KernelApp.parent',
    }
    if sys.platform.startswith('win'):
        kernel_aliases['interrupt'] = 'KernelApp.interrupt'

    kernel_flags = base_flags | {
        'no-stdout' : (
                {'KernelApp' : {'no_stdout' : True}},
                "redirect stdout to the null device"),
        'no-stderr' : (
                {'KernelApp' : {'no_stderr' : True}},
                "redirect stderr to the null device"),
    }

matplotlib/backends/backend_svg.py

Before:

    attrib = attrib.copy()
    attrib.update(extra)
    attrib = attrib.items()

After:

    attrib = (attrib | extra).items()

matplotlib/delaunay/triangulate.py

Before:

    edges = {}
    edges.update(dict(zip(self.triangle_nodes[border[:,0]][:,1],
                 self.triangle_nodes[border[:,0]][:,2])))
    edges.update(dict(zip(self.triangle_nodes[border[:,1]][:,2],
                 self.triangle_nodes[border[:,1]][:,0])))
    edges.update(dict(zip(self.triangle_nodes[border[:,2]][:,0],
                 self.triangle_nodes[border[:,2]][:,1])))

After:

    edges = {}
    edges |= zip(self.triangle_nodes[border[:,0]][:,1],
                 self.triangle_nodes[border[:,0]][:,2])
    edges |= zip(self.triangle_nodes[border[:,1]][:,2],
                 self.triangle_nodes[border[:,1]][:,0])
    edges |= zip(self.triangle_nodes[border[:,2]][:,0],
                 self.triangle_nodes[border[:,2]][:,1])

matplotlib/legend.py

Before:

    hm = default_handler_map.copy()
    hm.update(self._handler_map)
    return hm

After:

    return default_handler_map | self._handler_map

numpy/ma/core.py

Before:

    _optinfo = {}
    _optinfo.update(getattr(obj, '_optinfo', {}))
    _optinfo.update(getattr(obj, '_basedict', {}))
    if not isinstance(obj, MaskedArray):
        _optinfo.update(getattr(obj, '__dict__', {}))

After:

    _optinfo = {}
    _optinfo |= getattr(obj, '_optinfo', {})
    _optinfo |= getattr(obj, '_basedict', {})
    if not isinstance(obj, MaskedArray):
        _optinfo |= getattr(obj, '__dict__', {})

praw/internal.py

Before:

    data = {'name': six.text_type(user), 'type': relationship}
    data.update(kwargs)

After:

    data = {'name': six.text_type(user), 'type': relationship} | kwargs

pygments/lexer.py

Before:

    kwargs.update(lexer.options)
    lx = lexer.__class__(**kwargs)

After:

    lx = lexer.__class__(**(kwargs | lexer.options))

requests/sessions.py

Before:

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting.update(to_key_val_list(request_setting))

After:

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting |= to_key_val_list(request_setting)

sphinx/domains/__init__.py

Before:

    self.attrs = self.known_attrs.copy()
    self.attrs.update(attrs)

After:

    self.attrs = self.known_attrs | attrs

sphinx/ext/doctest.py

Before:

    new_opt = code[0].options.copy()
    new_opt.update(example.options)
    example.options = new_opt

After:

    example.options = code[0].options | example.options

sphinx/ext/inheritance_diagram.py

Before:

    n_attrs = self.default_node_attrs.copy()
    e_attrs = self.default_edge_attrs.copy()
    g_attrs.update(graph_attrs)
    n_attrs.update(node_attrs)
    e_attrs.update(edge_attrs)

After:

    g_attrs |= graph_attrs
    n_attrs = self.default_node_attrs | node_attrs
    e_attrs = self.default_edge_attrs | edge_attrs

sphinx/highlighting.py

Before:

    kwargs.update(self.formatter_args)
    return self.formatter(**kwargs)

After:

    return self.formatter(**(kwargs | self.formatter_args))

sphinx/quickstart.py

Before:

    d2 = DEFAULT_VALUE.copy()
    d2.update(dict(("ext_"+ext, False) for ext in EXTENSIONS))
    d2.update(d)
    d = d2

After:

    d = DEFAULT_VALUE | dict(("ext_"+ext, False) for ext in EXTENSIONS) | d

sympy/abc.py

Before:

    clash = {}
    clash.update(clash1)
    clash.update(clash2)
    return clash1, clash2, clash

After:

    return clash1, clash2, clash1 | clash2

sympy/parsing/maxima.py

Before:

    dct = MaximaHelpers.__dict__.copy()
    dct.update(name_dict)
    obj = sympify(str, locals=dct)

After:

    obj = sympify(str, locals=MaximaHelpers.__dict__ | name_dict)

sympy/printing/ccode.py and sympy/printing/fcode.py

Before:

    self.known_functions = dict(known_functions)
    userfuncs = settings.get('user_functions', {})
    self.known_functions.update(userfuncs)

After:

    self.known_functions = known_functions | settings.get('user_functions', {})

sympy/utilities/runtests.py

Before:

    globs = globs.copy()
    if extraglobs is not None:
        globs.update(extraglobs)

After:

    globs = globs | (extraglobs if extraglobs is not None else {})

The above examples show that sometimes the | operator leads to a clear
increase in readability, reducing the number of lines of code and
improving clarity. However, other examples using the | operator lead to
long, complex single expressions, possibly well over the PEP 8 maximum
line length of 79 characters. As with any other language feature, the
programmer should use their own judgement about whether | improves their
code.

Related Discussions

Mailing list threads (this is by no means an exhaustive list):

-   Dict joining using + and +=
-   PEP: Dict addition and subtraction
-   PEP 584: Add + and += operators to the built-in dict class.
-   Moving PEP 584 forward (dict + and += operators)
-   PEP 584: Add Union Operators To dict
-   Accepting PEP 584: Add Union Operators To dict

Ticket on the bug tracker

Merging two dictionaries in an expression is a frequently requested
feature. For example:

https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression

https://stackoverflow.com/questions/1781571/how-to-concatenate-two-dictionaries-to-create-a-new-one-in-python

https://stackoverflow.com/questions/6005066/adding-dictionaries-together-python

Occasionally people request alternative behavior for the merge:

https://stackoverflow.com/questions/1031199/adding-dictionaries-in-python

https://stackoverflow.com/questions/877295/python-dict-add-by-valuedict-2

...including one proposal to treat dicts as sets of keys.

Ian Lee's proto-PEP, and discussion in 2015. Further discussion took
place on Python-Ideas.

(Observant readers will notice that one of the authors of this PEP was
more skeptical of the idea in 2015.)

Adding a full complement of operators to dicts.

Discussion on Y-Combinator.

https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/

https://code.tutsplus.com/tutorials/how-to-merge-two-python-dictionaries--cms-26230

In direct response to an earlier draft of this PEP, Serhiy Storchaka
raised a ticket in the bug tracker to replace the copy(); update() idiom
with dict unpacking.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.