PEP: 218 Title: Adding a Built-In Set Object Type Author: Greg Wilson
<gvwilson@ddj.com>, Raymond Hettinger <python@rcn.com> Status: Final
Type: Standards Track Content-Type: text/x-rst Created: 31-Jul-2000
Python-Version: 2.2 Post-History:

Introduction

This PEP proposes adding a Set module to the standard Python library,
and to then make sets a built-in Python type if that module is widely
used. After explaining why sets are desirable, and why the common idiom
of using dictionaries in their place is inadequate, we describe how we
intend built-in sets to work, and then how the preliminary Set module
will behave. The last section discusses the mutability (or otherwise) of
sets and set elements, and the solution which the Set module will
implement.

Rationale

Sets are a fundamental mathematical structure, and are very commonly
used in algorithm specifications. They are much less frequently used in
implementations, even when they are the "right" structure. Programmers
frequently use lists instead, even when the ordering information in
lists is irrelevant, and by-value lookups are frequent. (Most
medium-sized C programs contain a depressing number of start-to-end
searches through malloc'd vectors to determine whether particular items
are present or not...)

Programmers are often told that they can implement sets as dictionaries
with "don't care" values. Items can be added to these "sets" by
assigning the "don't care" value to them; membership can be tested using
dict.has_key; and items can be deleted using del. However, the other
main operations on sets (union, intersection, and difference) are not
directly supported by this representation, since their meaning is
ambiguous for dictionaries containing key/value pairs.

Proposal

The long-term goal of this PEP is to add a built-in set type to Python.
This type will be an unordered collection of unique values, just as a
dictionary is an unordered collection of key/value pairs.

Iteration and comprehension will be implemented in the obvious ways, so
that:

    for x in S:

will step through the elements of S in arbitrary order, while:

    set(x**2 for x in S)

will produce a set containing the squares of all elements in S,
Membership will be tested using in and not in, and basic set operations
will be implemented by a mixture of overloaded operators:

  ----------- -------------------------------
  |           union
  &           intersection
  ^           symmetric difference
  -           asymmetric difference
  == !=       equality and inequality tests
  < <= >= >   subset and superset tests
  ----------- -------------------------------

and methods:

  ---------------- ----------------------------------------------------------------------------------------------
  S.add(x)         Add "x" to the set.
  S.update(s)      Add all elements of sequence "s" to the set.
  S.remove(x)      Remove "x" from the set. If "x" is not present, this method raises a LookupError exception.
  S.discard(x)     Remove "x" from the set if it is present, or do nothing if it is not.
  S.pop()          Remove and return an arbitrary element, raising a LookupError if the element is not present.
  S.clear()        Remove all elements from this set.
  S.copy()         Make a new set.
  s.issuperset()   Check for a superset relationship.
  s.issubset()     Check for a subset relationship.
  ---------------- ----------------------------------------------------------------------------------------------

and two new built-in conversion functions:

  -------------- ------------------------------------------------------------------------
  set(x)         Create a set containing the elements of the collection "x".
  frozenset(x)   Create an immutable set containing the elements of the collection "x".
  -------------- ------------------------------------------------------------------------

Notes:

1.  We propose using the bitwise operators "|&" for intersection and
    union. While "+" for union would be intuitive, "*" for intersection
    is not (very few of the people asked guessed what it did correctly).
2.  We considered using "+" to add elements to a set, rather than "add".
    However, Guido van Rossum pointed out that "+" is symmetric for
    other built-in types (although "*" is not). Use of "add" will also
    avoid confusion between that operation and set union.

Set Notation

The PEP originally proposed {1,2,3} as the set notation and {-} for the
empty set. Experience with Python 2.3's sets.py showed that the notation
was not necessary. Also, there was some risk of making dictionaries less
instantly recognizable.

It was also contemplated that the braced notation would support set
comprehensions; however, Python 2.4 provided generator expressions which
fully met that need and did so it a more general way. (See PEP 289 for
details on generator expressions).

So, Guido ruled that there would not be a set syntax; however, the issue
could be revisited for Python 3000 (see PEP 3000).

History

To gain experience with sets, a pure python module was introduced in
Python 2.3. Based on that implementation, the set and frozenset types
were introduced in Python 2.4. The improvements are:

-   Better hash algorithm for frozensets
-   More compact pickle format (storing only an element list instead of
    a dictionary of key:value pairs where the value is always True).
-   Use a __reduce__ function so that deep copying is automatic.
-   The BaseSet concept was eliminated.
-   The union_update() method became just update().
-   Auto-conversion between mutable and immutable sets was dropped.
-   The _repr method was dropped (the need is met by the new sorted()
    built-in function).

Tim Peters believes that the class's constructor should take a single
sequence as an argument, and populate the set with that sequence's
elements. His argument is that in most cases, programmers will be
creating sets from pre-existing sequences, so that this case should be
the common one. However, this would require users to remember an extra
set of parentheses when initializing a set with known values:

    >>> Set((1, 2, 3, 4))       # case 1

On the other hand, feedback from a small number of novice Python users
(all of whom were very experienced with other languages) indicates that
people will find a "parenthesis-free" syntax more natural:

    >>> Set(1, 2, 3, 4)         # case 2

Ultimately, we adopted the first strategy in which the initializer takes
a single iterable argument.

Mutability

The most difficult question to resolve in this proposal was whether sets
ought to be able to contain mutable elements. A dictionary's keys must
be immutable in order to support fast, reliable lookup. While it would
be easy to require set elements to be immutable, this would preclude
sets of sets (which are widely used in graph algorithms and other
applications).

Earlier drafts of PEP 218 had only a single set type, but the sets.py
implementation in Python 2.3 has two, Set and ImmutableSet. For Python
2.4, the new built-in types were named set and frozenset which are
slightly less cumbersome.

There are two classes implemented in the "sets" module. Instances of the
Set class can be modified by the addition or removal of elements, and
the ImmutableSet class is "frozen", with an unchangeable collection of
elements. Therefore, an ImmutableSet may be used as a dictionary key or
as a set element, but cannot be updated. Both types of set require that
their elements are immutable, hashable objects. Parallel comments apply
to the "set" and "frozenset" built-in types.

Copyright

This document has been placed in the Public Domain.