PEP: 3128 Title: BList: A Faster List-like Type Version: $Revision$
Last-Modified: $Date$ Author: Daniel Stutzbach
<daniel@stutzbachenterprises.com> Discussions-To: python-3000@python.org
Status: Rejected Type: Standards Track Content-Type: text/x-rst Created:
30-Apr-2007 Python-Version: 2.6, 3.0 Post-History: 30-Apr-2007

Rejection Notice

Rejected based on Raymond Hettinger's sage advice[1]:

  After looking at the source, I think this has almost zero chance for
  replacing list(). There is too much value in a simple C API, low space
  overhead for small lists, good performance is common use cases, and
  having performance that is easily understood. The BList implementation
  lacks these virtues and it trades-off a little performance in common
  cases in for much better performance in uncommon cases. As a Py3.0
  PEP, I think it can be rejected.

  Depending on its success as a third-party module, it still has a
  chance for inclusion in the collections module. The essential criteria
  for that is whether it is a superior choice for some real-world use
  cases. I've scanned my own code and found no instances where BList
  would have been preferable to a regular list. However, that scan has a
  selection bias because it doesn't reflect what I would have written
  had BList been available. So, after a few months, I intend to poll
  comp.lang.python for BList success stories. If they exist, then I have
  no problem with inclusion in the collections module. After all, its
  learning curve is near zero -- the only cost is the clutter factor
  stemming from indecision about the most appropriate data structure for
  a given task.

Abstract

The common case for list operations is on small lists. The current
array-based list implementation excels at small lists due to the strong
locality of reference and infrequency of memory allocation operations.
However, an array takes O(n) time to insert and delete elements, which
can become problematic as the list gets large.

This PEP introduces a new data type, the BList, that has array-like and
tree-like aspects. It enjoys the same good performance on small lists as
the existing array-based implementation, but offers superior asymptotic
performance for most operations. This PEP proposes replacing the makes
two mutually exclusive proposals for including the BList type in Python:

1.  Add it to the collections module, or
2.  Replace the existing list type

Motivation

The BList grew out of the frustration of needing to rewrite intuitive
algorithms that worked fine for small inputs but took O(n**2) time for
large inputs due to the underlying O(n) behavior of array-based lists.
The deque type, introduced in Python 2.4, solved the most common problem
of needing a fast FIFO queue. However, the deque type doesn't help if we
need to repeatedly insert or delete elements from the middle of a long
list.

A wide variety of data structure provide good asymptotic performance for
insertions and deletions, but they either have O(n) performance for
other operations (e.g., linked lists) or have inferior performance for
small lists (e.g., binary trees and skip lists).

The BList type proposed in this PEP is based on the principles of
B+Trees, which have array-like and tree-like aspects. The BList offers
array-like performance on small lists, while offering O(log n)
asymptotic performance for all insert and delete operations.
Additionally, the BList implements copy-on-write under-the-hood, so even
operations like getslice take O(log n) time. The table below compares
the asymptotic performance of the current array-based list
implementation with the asymptotic performance of the BList.

  Operation   Array-based list   BList
  ----------- ------------------ ------------------
  Copy        O(n)               O(1)
  Append      O(1)               O(log n)
  Insert      O(n)               O(log n)
  Get Item    O(1)               O(log n)
  Set Item    O(1)               O(log n)
  Del Item    O(n)               O(log n)
  Iteration   O(n)               O(n)
  Get Slice   O(k)               O(log n)
  Del Slice   O(n)               O(log n)
  Set Slice   O(n+k)             O(log k + log n)
  Extend      O(k)               O(log k + log n)
  Sort        O(n log n)         O(n log n)
  Multiply    O(nk)              O(log k)

An extensive empirical comparison of Python's array-based list and the
BList are available at[2].

Use Case Trade-offs

The BList offers superior performance for many, but not all, operations.
Choosing the correct data type for a particular use case depends on
which operations are used. Choosing the correct data type as a built-in
depends on balancing the importance of different use cases and the
magnitude of the performance differences.

For the common uses cases of small lists, the array-based list and the
BList have similar performance characteristics.

For the slightly less common case of large lists, there are two common
uses cases where the existing array-based list outperforms the existing
BList reference implementation. These are:

1.  A large LIFO stack, where there are many .append() and .pop(-1)
    operations. Each operation is O(1) for an array-based list, but
    O(log n) for the BList.
2.  A large list that does not change size. The getitem and setitem
    calls are O(1) for an array-based list, but O(log n) for the BList.

In performance tests on a 10,000 element list, BLists exhibited a 50%
and 5% increase in execution time for these two uses cases,
respectively.

The performance for the LIFO use case could be improved to O(n) time, by
caching a pointer to the right-most leaf within the root node. For lists
that do not change size, the common case of sequential access could also
be improved to O(n) time via caching in the root node. However, the
performance of these approaches has not been empirically tested.

Many operations exhibit a tremendous speed-up (O(n) to O(log n)) when
switching from the array-based list to BLists. In performance tests on a
10,000 element list, operations such as getslice, setslice, and
FIFO-style insert and deletes on a BList take only 1% of the time needed
on array-based lists.

In light of the large performance speed-ups for many operations, the
small performance costs for some operations will be worthwhile for many
(but not all) applications.

Implementation

The BList is based on the B+Tree data structure. The BList is a wide,
bushy tree where each node contains an array of up to 128 pointers to
its children. If the node is a leaf, its children are the user-visible
objects that the user has placed in the list. If node is not a leaf, its
children are other BList nodes that are not user-visible. If the list
contains only a few elements, they will all be a children of single node
that is both the root and a leaf. Since a node is little more than array
of pointers, small lists operate in effectively the same way as an
array-based data type and share the same good performance
characteristics.

The BList maintains a few invariants to ensure good (O(log n))
asymptotic performance regardless of the sequence of insert and delete
operations. The principle invariants are as follows:

1.  Each node has at most 128 children.
2.  Each non-root node has at least 64 children.
3.  The root node has at least 2 children, unless the list contains
    fewer than 2 elements.
4.  The tree is of uniform depth.

If an insert would cause a node to exceed 128 children, the node spawns
a sibling and transfers half of its children to the sibling. The sibling
is inserted into the node's parent. If the node is the root node (and
thus has no parent), a new parent is created and the depth of the tree
increases by one.

If a deletion would cause a node to have fewer than 64 children, the
node moves elements from one of its siblings if possible. If both of its
siblings also only have 64 children, then two of the nodes merge and the
empty one is removed from its parent. If the root node is reduced to
only one child, its single child becomes the new root (i.e., the depth
of the tree is reduced by one).

In addition to tree-like asymptotic performance and array-like
performance on small-lists, BLists support transparent copy-on-write. If
a non-root node needs to be copied (as part of a getslice, copy,
setslice, etc.), the node is shared between multiple parents instead of
being copied. If it needs to be modified later, it will be copied at
that time. This is completely behind-the-scenes; from the user's point
of view, the BList works just like a regular Python list.

Memory Usage

In the worst case, the leaf nodes of a BList have only 64 children each,
rather than a full 128, meaning that memory usage is around twice that
of a best-case array implementation. Non-leaf nodes use up a negligible
amount of additional memory, since there are at least 63 times as many
leaf nodes as non-leaf nodes.

The existing array-based list implementation must grow and shrink as
items are added and removed. To be efficient, it grows and shrinks only
when the list has grow or shrunk exponentially. In the worst case, it,
too, uses twice as much memory as the best case.

In summary, the BList's memory footprint is not significantly different
from the existing array-based implementation.

Backwards Compatibility

If the BList is added to the collections module, backwards compatibility
is not an issue. This section focuses on the option of replacing the
existing array-based list with the BList. For users of the Python
interpreter, a BList has an identical interface to the current
list-implementation. For virtually all operations, the behavior is
identical, aside from execution speed.

For the C API, BList has a different interface than the existing
list-implementation. Due to its more complex structure, the BList does
not lend itself well to poking and prodding by external sources.
Thankfully, the existing list-implementation defines an API of functions
and macros for accessing data from list objects. Google Code Search
suggests that the majority of third-party modules uses the well-defined
API rather than relying on the list's structure directly. The table
below summarizes the search queries and results:

+--------------------------+-------------------+
| Search String            | Number of Results |
+==========================+===================+
| PyList_GetItem           | 2,000             |
+--------------------------+-------------------+
| PySequence_GetItem       |   800             |
+--------------------------+-------------------+
| PySequence_Fast_GET_ITEM |   100             |
+--------------------------+-------------------+
| PyList_GET_ITEM          |   400             |
+--------------------------+-------------------+
| [^a-zA-Z_]ob_item        |   100             |
+--------------------------+-------------------+

This can be achieved in one of two ways:

1.  Redefine the various accessor functions and macros in listobject.h
    to access a BList instead. The interface would be unchanged. The
    functions can easily be redefined. The macros need a bit more care
    and would have to resort to function calls for large lists.

    The macros would need to evaluate their arguments more than once,
    which could be a problem if the arguments have side effects. A
    Google Code Search for "PyList_GET_ITEM([^)]+(" found only a handful
    of cases where this occurs, so the impact appears to be low.

    The few extension modules that use list's undocumented structure
    directly, instead of using the API, would break. The core code
    itself uses the accessor macros fairly consistently and should be
    easy to port.

2.  Deprecate the existing list type, but continue to include it.
    Extension modules wishing to use the new BList type must do so
    explicitly. The BList C interface can be changed to match the
    existing PyList interface so that a simple search-replace will be
    sufficient for 99% of module writers.

    Existing modules would continue to compile and work without change,
    but they would need to make a deliberate (but small) effort to
    migrate to the BList.

    The downside of this approach is that mixing modules that use BLists
    and array-based lists might lead to slow down if conversions are
    frequently necessary.

Reference Implementation

A reference implementations of the BList is available for CPython at[3].

The source package also includes a pure Python implementation,
originally developed as a prototype for the CPython version. Naturally,
the pure Python version is rather slow and the asymptotic improvements
don't win out until the list is quite large.

When compiled with Py_DEBUG, the C implementation checks the BList
invariants when entering and exiting most functions.

An extensive set of test cases is also included in the source package.
The test cases include the existing Python sequence and list test cases
as a subset. When the interpreter is built with Py_DEBUG, the test cases
also check for reference leaks.

Porting to Other Python Variants

If the BList is added to the collections module, other Python variants
can support it in one of three ways:

1.  Make blist an alias for list. The asymptotic performance won't be as
    good, but it'll work.
2.  Use the pure Python reference implementation. The performance for
    small lists won't be as good, but it'll work.
3.  Port the reference implementation.

Discussion

This proposal has been discussed briefly on the Python-3000 mailing
list[4]. Although a number of people favored the proposal, there were
also some objections. Below summarizes the pros and cons as observed by
posters to the thread.

General comments:

-   Pro: Will outperform the array-based list in most cases
-   Pro: "I've implemented variants of this ... a few different times"
-   Con: Desirability and performance in actual applications is unproven

Comments on adding BList to the collections module:

-   Pro: Matching the list-API reduces the learning curve to near-zero
-   Pro: Useful for intermediate-level users; won't get in the way of
    beginners
-   Con: Proliferation of data types makes the choices for developers
    harder.

Comments on replacing the array-based list with the BList:

-   Con: Impact on extension modules (addressed in Backwards
    Compatibility)
-   Con: The use cases where BLists are slower are important (see Use
    Case Trade-Offs for how these might be addressed).
-   Con: The array-based list code is simple and easy to maintain

To assess the desirability and performance in actual applications,
Raymond Hettinger suggested releasing the BList as an extension module
(now available at[5]). If it proves useful, he felt it would be a strong
candidate for inclusion in 2.6 as part of the collections module. If
widely popular, then it could be considered for replacing the
array-based list, but not otherwise.

Guido van Rossum commented that he opposed the proliferation of data
types, but favored replacing the array-based list if backwards
compatibility could be addressed and the BList's performance was
uniformly better.

On-going Tasks

-   Reduce the memory footprint of small lists
-   Implement TimSort for BLists, so that best-case sorting is O(n)
    instead of O(log n).
-   Implement __reversed__
-   Cache a pointer in the root to the rightmost leaf, to make LIFO
    operation O(n) time.

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] Raymond Hettinger's feedback on python-3000:
https://mail.python.org/pipermail/python-3000/2007-May/007491.html

[2] Empirical performance comparison between Python's array-based list
and the blist: http://stutzbachenterprises.com/blist/

[3] Reference Implementations for C and Python:
http://www.python.org/pypi/blist/

[4] Discussion on python-3000 starting at post:
https://mail.python.org/pipermail/python-3000/2007-April/006757.html

[5] Reference Implementations for C and Python:
http://www.python.org/pypi/blist/