PEP: 307
Title: Extensions to the pickle protocol
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum, Tim Peters
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 31-Jan-2003
Python-Version: 2.3
Post-History: 07-Feb-2003

Introduction

Pickling new-style objects in Python 2.2 is done somewhat clumsily and
causes pickle size to bloat compared to classic class instances. This
PEP documents a new pickle protocol in Python 2.3 that takes care of
this and many other pickle issues.

There are two sides to specifying a new pickle protocol: the byte stream
constituting pickled data must be specified, and the interface between
objects and the pickling and unpickling engines must be specified. This
PEP focuses on API issues, although it may occasionally touch on byte
stream format details to motivate a choice. The pickle byte stream
format is documented formally by the standard library module
pickletools.py (already checked into CVS for Python 2.3).

This PEP attempts to fully document the interface between pickled
objects and the pickling process, highlighting additions by specifying
"new in this PEP". (The interface to invoke pickling or unpickling is
not covered fully, except for the changes to the API for specifying the
pickling protocol to picklers.)

Motivation

Pickling new-style objects causes serious pickle bloat. For example:

    import pickle

    class C(object): # Omit "(object)" for classic class
        pass
    x = C()
    x.foo = 42
    print len(pickle.dumps(x, 1))

The binary pickle for the classic object consumed 33 bytes, and for the
new-style object 86 bytes.

The reasons for the bloat are complex, but are mostly caused by the fact
that new-style objects use __reduce__ in order to be picklable at all.
After ample consideration we've concluded that the only way to reduce
pickle sizes for new-style objects is to add new opcodes to the pickle
protocol. The net result is that with the new protocol, the pickle size
in the above example is 35 (two extra bytes are used at the start to
indicate the protocol version, although this isn't strictly necessary).

Protocol versions

Previously, pickling (but not unpickling) distinguished between text
mode and binary mode. By design, binary mode is a superset of text mode,
and unpicklers don't need to know in advance whether an incoming pickle
uses text mode or binary mode. The virtual machine used for unpickling
is the same regardless of the mode; certain opcodes simply aren't used
in text mode.

Retroactively, text mode is now called protocol 0, and binary mode
protocol 1. The new protocol is called protocol 2. In the tradition of
pickling protocols, protocol 2 is a superset of protocol 1. But just so
that future pickling protocols aren't required to be supersets of the
oldest protocols, a new opcode is inserted at the start of a protocol 2
pickle indicating that it is using protocol 2. To date, each release of
Python has been able to read pickles written by all previous releases.
Of course pickles written under protocol N can't be read by versions of
Python earlier than the one that introduced protocol N.

Several functions, methods and constructors used for pickling used to
take a positional argument named 'bin' which was a flag, defaulting to
0, indicating binary mode. This argument is renamed to 'protocol' and
now gives the protocol number, still defaulting to 0.

It so happens that passing 2 for the 'bin' argument in previous Python
versions had the same effect as passing 1. Nevertheless, a special case
is added here: passing a negative number selects the highest protocol
version supported by a particular implementation. This works in previous
Python versions, too, and so can be used to select the highest protocol
available in a way that's both backward and forward compatible. In
addition, a new module constant HIGHEST_PROTOCOL is supplied by both
pickle and cPickle, equal to the highest protocol number the module can
read. This is cleaner than passing -1, but cannot be used before Python
2.3.
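
The two ways of asking for the newest protocol can be compared directly. The following is a minimal sketch, assuming a modern Python 3 interpreter (where both spellings are still accepted); the sample data is made up for illustration.

```python
import pickle

data = {"spam": [1, 2, 3]}  # arbitrary sample data

# A negative protocol number selects the highest protocol the module supports.
newest_by_flag = pickle.dumps(data, -1)

# HIGHEST_PROTOCOL names the same choice explicitly.
newest_by_name = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# The unpickler detects the protocol itself, so loads() takes no protocol argument.
restored = pickle.loads(newest_by_flag)
```

Both calls should produce the same byte string, since they select the same protocol.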

The pickle.py module has supported passing the 'bin' value as a keyword
argument rather than a positional argument. (This is not recommended,
since cPickle only accepts positional arguments, but it works...)
Passing 'bin' as a keyword argument is deprecated, and a
PendingDeprecationWarning is issued in this case. You have to invoke the
Python interpreter with -Wa or a variation on that to see
PendingDeprecationWarning messages. In Python 2.4, the warning class may
be upgraded to DeprecationWarning.

Security issues

In previous versions of Python, unpickling would do a "safety check" on
certain operations, refusing to call functions or constructors that
weren't marked as "safe for unpickling" by either having an attribute
__safe_for_unpickling__ set to 1, or by being registered in a global
registry, copy_reg.safe_constructors.

This feature gives a false sense of security: nobody has ever done the
necessary, extensive, code audit to prove that unpickling untrusted
pickles cannot invoke unwanted code, and in fact bugs in the Python 2.2
pickle.py module make it easy to circumvent these security measures.

We firmly believe that, on the Internet, it is better to know that you
are using an insecure protocol than to trust a protocol to be secure
whose implementation hasn't been thoroughly checked. Even high quality
implementations of widely used protocols are routinely found flawed;
Python's pickle implementation simply cannot make such guarantees
without a much larger time investment. Therefore, as of Python 2.3, all
safety checks on unpickling are officially removed, and replaced with
this warning:

Warning

Do not unpickle data received from an untrusted or unauthenticated
source.

The same warning applies to previous Python versions, despite the
presence of safety checks there.
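
To see why the warning matters, note that a pickle can name any importable callable and ask the unpickler to call it. The sketch below is a deliberately harmless illustration, assuming Python 3; the class name `Innocent` is hypothetical, and `len` stands in for whatever function an attacker might choose.

```python
import pickle

class Innocent(object):
    def __reduce__(self):
        # Tells the unpickler to call len("abc"). Any importable callable
        # could be named here instead of len.
        return (len, ("abc",))

payload = pickle.dumps(Innocent(), 2)

# Unpickling does not rebuild an Innocent instance: it calls len("abc")
# and hands back whatever that call returns.
result = pickle.loads(payload)
```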

Extended __reduce__ API

There are several APIs that a class can use to control pickling. Perhaps
the most popular of these are __getstate__ and __setstate__; but the
most powerful one is __reduce__. (There's also __getinitargs__, and
we're adding __getnewargs__ below.)

There are several ways to provide __reduce__ functionality: a class can
implement a __reduce__ method or a __reduce_ex__ method (see next
section), or a reduce function can be declared in copy_reg
(copy_reg.dispatch_table maps classes to functions). The return values
are interpreted exactly the same, though, and we'll refer to these
collectively as __reduce__.

Important: pickling of classic class instances does not look for a
__reduce__ or __reduce_ex__ method or a reduce function in the copy_reg
dispatch table, so that a classic class cannot provide __reduce__
functionality in the sense intended here. A classic class must use
__getinitargs__ and/or __getstate__ to customize pickling. These are
described below.

__reduce__ must return either a string or a tuple. If it returns a
string, the object's state is not pickled; instead, the pickle stores a
reference to an equivalent object, which is looked up by name. Surprisingly,
the string returned by __reduce__ should be the object's local name
(relative to its module); the pickle module searches the module
namespace to determine the object's module.

The rest of this section is concerned with the tuple returned by
__reduce__. It is a variable size tuple, of length 2 through 5. The
first two items (function and arguments) are required. The remaining
items are optional and may be left off from the end; giving None for the
value of an optional item acts the same as leaving it off. The last two
items are new in this PEP. The items are, in order:

+-----------+-----------------------------------------------------------+
| function  | Required.                                                 |
|           |                                                           |
|           | A callable object (not necessarily a function) called to  |
|           | create the initial version of the object; state may be    |
|           | added to the object later to fully reconstruct the        |
|           | pickled state. This function must itself be picklable.    |
|           | See the section about __newobj__ for a special case (new  |
|           | in this PEP) here.                                        |
+-----------+-----------------------------------------------------------+
| arguments | Required.                                                 |
|           |                                                           |
|           | A tuple giving the argument list for the function. As a   |
|           | special case, designed for Zope 2's ExtensionClass, this  |
|           | may be None; in that case, function should be a class or  |
|           | type, and function.__basicnew__() is called to create the |
|           | initial version of the object. This exception is          |
|           | deprecated.                                               |
+-----------+-----------------------------------------------------------+

Unpickling invokes function(*arguments) to create an initial object,
called obj below. If the remaining items are left off, that's the end of
unpickling for this object and obj is the result. Else obj is modified
at unpickling time by each item specified, as follows.

+-----------+-----------------------------------------------------------+
| state     | Optional.                                                 |
|           |                                                           |
|           | Additional state. If this is not None, the state is       |
|           | pickled, and obj.__setstate__(state) will be called when  |
|           | unpickling. If no __setstate__ method is defined, a       |
|           | default implementation is provided, which assumes that    |
|           | state is a dictionary mapping instance variable names to  |
|           | their values. The default implementation calls:           |
|           |                                                           |
|           |     obj.__dict__.update(state)                            |
|           |                                                           |
|           | or, if the update() call fails:                           |
|           |                                                           |
|           |     for k, v in state.items():                            |
|           |         setattr(obj, k, v)                                |
+-----------+-----------------------------------------------------------+
| listitems | Optional, and new in this PEP.                            |
|           |                                                           |
|           | If this is not None, it should be an iterator (not a      |
|           | sequence!) yielding successive list items. These list     |
|           | items will be pickled, and appended to the object using   |
|           | either obj.append(item) or obj.extend(list_of_items).     |
|           | This is primarily used for list subclasses, but may be    |
|           | used by other classes as long as they have append() and   |
|           | extend() methods with the appropriate signature. (Whether |
|           | append() or extend() is used depends on which pickle      |
|           | protocol version is used as well as the number of items   |
|           | to append, so both must be supported.)                    |
+-----------+-----------------------------------------------------------+
| dictitems | Optional, and new in this PEP.                            |
|           |                                                           |
|           | If this is not None, it should be an iterator (not a      |
|           | sequence!) yielding successive dictionary items, which    |
|           | should be tuples of the form (key, value). These items    |
|           | will be pickled, and stored to the object using           |
|           | obj[key] = value. This is primarily used for dict         |
|           | subclasses, but may be used by other classes as long as   |
|           | they implement __setitem__.                               |
+-----------+-----------------------------------------------------------+
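
The listitems slot can be illustrated with a class that is not a list subclass but offers append() and extend(). This is a minimal sketch, assuming Python 3; the class `Bag` and its attributes are hypothetical names chosen for the example.

```python
import pickle

class Bag(object):
    """Hypothetical container with list-like append() and extend()."""

    def __init__(self, label):
        self.label = label
        self._items = []

    def append(self, item):
        self._items.append(item)

    def extend(self, items):
        self._items.extend(items)

    def __reduce__(self):
        # (function, arguments, state, listitems): state is None here,
        # and listitems must be an iterator, not a sequence.
        return (Bag, (self.label,), None, iter(self._items))

bag = Bag("fruit")
bag.extend(["apple", "pear"])

# Unpickling calls Bag("fruit"), then feeds the items back through
# append() or extend().
clone = pickle.loads(pickle.dumps(bag, 2))
```

A dictitems iterator works the same way as a fifth tuple item, provided the class implements __setitem__.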

Note: in Python 2.2 and before, when using cPickle, state would be
pickled if present even if it is None; the only safe way to avoid the
__setstate__ call was to return a two-tuple from __reduce__. (But
pickle.py would not pickle state if it was None.) In Python 2.3,
__setstate__ will never be called at unpickling time when __reduce__
returns a state with value None at pickling time.

A __reduce__ implementation that needs to work both under Python 2.2 and
under Python 2.3 could check the variable pickle.format_version to
determine whether to use the listitems and dictitems features. If this
value is >= "2.0" then they are supported. If not, any list or dict
items should be incorporated somehow in the 'state' return value, and
the __setstate__ method should be prepared to accept list or dict items
as part of the state (how this is done is up to the application).

The __reduce_ex__ API

It is sometimes useful to know the protocol version when implementing
__reduce__. This can be done by implementing a method named
__reduce_ex__ instead of __reduce__. __reduce_ex__, when it exists, is
called in preference over __reduce__ (you may still provide __reduce__
for backwards compatibility). The __reduce_ex__ method will be called
with a single integer argument, the protocol version.

The 'object' class implements both __reduce__ and __reduce_ex__;
however, if a subclass overrides __reduce__ but not __reduce_ex__, the
__reduce_ex__ implementation detects this and calls __reduce__.
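
As a rough illustration, a class can record the protocol number it was handed. The sketch assumes Python 3; `Versioned` and its `pickled_with` attribute are hypothetical names.

```python
import pickle

class Versioned(object):
    def __init__(self, value, pickled_with=None):
        self.value = value
        self.pickled_with = pickled_with

    def __reduce_ex__(self, protocol):
        # The pickler passes the protocol number in use; here it is simply
        # stored as a constructor argument so it can be inspected later.
        return (Versioned, (self.value, protocol))

original = Versioned("spam")
via_protocol_1 = pickle.loads(pickle.dumps(original, 1))
via_protocol_2 = pickle.loads(pickle.dumps(original, 2))
```

A real implementation would more likely use the argument to choose between a compact and a backward-compatible representation.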

Customizing pickling absent a __reduce__ implementation

If no __reduce__ implementation is available for a particular class,
there are three cases that need to be considered separately, because
they are handled differently:

1.  classic class instances, all protocols
2.  new-style class instances, protocols 0 and 1
3.  new-style class instances, protocol 2

Types implemented in C are considered new-style classes. However, except
for the common built-in types, these need to provide a __reduce__
implementation in order to be picklable with protocols 0 or 1. Protocol
2 supports built-in types providing __getnewargs__, __getstate__ and
__setstate__ as well.

Case 1: pickling classic class instances

This case is the same for all protocols, and is unchanged from Python
2.1.

For classic classes, __reduce__ is not used. Instead, classic classes
can customize their pickling by providing methods named __getstate__,
__setstate__ and __getinitargs__. Absent these, a default pickling
strategy for classic class instances is implemented that works as long
as all instance variables are picklable. This default strategy is
documented in terms of default implementations of __getstate__ and
__setstate__.

The primary way to customize pickling of classic class instances is by
specifying __getstate__ and/or __setstate__ methods. It is fine if a
class implements one of these but not the other, as long as it is
compatible with the default version.

The __getstate__ method

The __getstate__ method should return a picklable value representing the
object's state without referencing the object itself. If no __getstate__
method exists, a default implementation is used that returns
self.__dict__.

The __setstate__ method

The __setstate__ method should take one argument; it will be called with
the value returned by __getstate__ (or its default implementation).

If no __setstate__ method exists, a default implementation is provided
that assumes the state is a dictionary mapping instance variable names
to values. The default implementation tries two things:

-   First, it tries to call self.__dict__.update(state).
-   If the update() call fails with a RuntimeError exception, it calls
    setattr(self, key, value) for each (key, value) pair in the state
    dictionary. This only happens when unpickling in restricted
    execution mode (see the rexec standard library module).

The __getinitargs__ method

The __setstate__ method (or its default implementation) requires that a
new object already exists so that its __setstate__ method can be called.
The point is to create a new object that isn't fully initialized; in
particular, the class's __init__ method should not be called if
possible.

These are the possibilities:

-   Normally, the following trick is used: create an instance of a
    trivial classic class (one without any methods or instance
    variables) and then use __class__ assignment to change its class to
    the desired class. This creates an instance of the desired class
    with an empty __dict__ whose __init__ has not been called.
-   However, if the class has a method named __getinitargs__, the above
    trick is not used, and a class instance is created by using the
    tuple returned by __getinitargs__ as an argument list to the class
    constructor. This is done even if __getinitargs__ returns an empty
    tuple --- a __getinitargs__ method that returns () is not equivalent
    to not having __getinitargs__ at all. __getinitargs__ must return a
    tuple.
-   In restricted execution mode, the trick from the first bullet
    doesn't work; in this case, the class constructor is called with an
    empty argument list if no __getinitargs__ method exists. This means
    that in order for a classic class to be unpicklable in restricted
    execution mode, it must either implement __getinitargs__ or its
    constructor (i.e., its __init__ method) must be callable without
    arguments.

Case 2: pickling new-style class instances using protocols 0 or 1

This case is unchanged from Python 2.2. For better pickling of new-style
class instances when backwards compatibility is not an issue, protocol 2
should be used; see case 3 below.

New-style classes, whether implemented in C or in Python, inherit a
default __reduce__ implementation from the universal base class
'object'.

This default __reduce__ implementation is not used for those built-in
types for which the pickle module has built-in support. Here's a full
list of those types:

-   Concrete built-in types: NoneType, bool, int, float, complex, str,
    unicode, tuple, list, dict. (Complex is supported by virtue of a
    __reduce__ implementation registered in copy_reg.) In Jython,
    PyStringMap is also included in this list.
-   Classic instances.
-   Classic class objects, Python function objects, built-in function
    and method objects, and new-style type objects (== new-style class
    objects). These are pickled by name, not by value: at unpickling
    time, a reference to an object with the same name (the fully
    qualified module name plus the variable name in that module) is
    substituted.

The default __reduce__ implementation will fail at pickling time for
built-in types not mentioned above, and for new-style classes
implemented in C: if they want to be picklable, they must supply a
custom __reduce__ implementation under protocols 0 and 1.

For new-style classes implemented in Python, the default __reduce__
implementation (copy_reg._reduce) works as follows:

Let D be the class of the object to be pickled. First, find the nearest
base class that is implemented in C (either as a built-in type or as a
type defined by an extension class). Call this base class B. Unless B is
the class 'object', instances of class B must be picklable, either by
having built-in support (as defined in the above three bullet points),
or by having a non-default __reduce__ implementation. B must not be the
same class as D (if it were, it would mean that D is not implemented in
Python).

The callable produced by the default __reduce__ is
copy_reg._reconstructor, and its arguments tuple is (D, B, basestate),
where basestate is None if B is the builtin object class, and basestate
is:

    basestate = B(obj)

if B is not the builtin object class. This is geared toward pickling
subclasses of builtin types, where, for example,
list(some_list_subclass_instance) produces "the list part" of the list
subclass instance.

The object is recreated at unpickling time by copy_reg._reconstructor,
like so:

    obj = B.__new__(D, basestate)
    B.__init__(obj, basestate)
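
The tuple produced by the default __reduce__ can be inspected directly. The sketch below assumes Python 3, where copy_reg is spelled copyreg and the same helper still serves protocols 0 and 1; `_reconstructor` is a private name, so treat this as an illustration of the mechanism rather than a supported API. `TaggedList` is a hypothetical class.

```python
import copyreg  # spelled copy_reg in Python 2
import pickle

class TaggedList(list):
    pass

obj = TaggedList([1, 2])
obj.tag = "x"

# Here D is TaggedList, B is list, and basestate is list(obj).
function, arguments, state = obj.__reduce_ex__(1)

# Round trip under protocol 1: the list part travels as basestate,
# the instance __dict__ as the state.
clone = pickle.loads(pickle.dumps(obj, 1))
```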

Objects using the default __reduce__ implementation can customize it by
defining __getstate__ and/or __setstate__ methods. These work almost the
same as described for classic classes above, except that if __getstate__
returns an object (of any type) whose value is considered false (e.g.
None, or a number that is zero, or an empty sequence or mapping), this
state is not pickled and __setstate__ will not be called at all. If
__getstate__ exists and returns a true value, that value becomes the
third element of the tuple returned by the default __reduce__, and at
unpickling time the value is passed to __setstate__. If __getstate__
does not exist, but obj.__dict__ exists, then obj.__dict__ becomes the
third element of the tuple returned by __reduce__, and again at
unpickling time the value is passed to obj.__setstate__. The default
__setstate__ is the same as that for classic classes, described above.

Note that this strategy ignores slots. Instances of new-style classes
that have slots but no __getstate__ method cannot be pickled by
protocols 0 and 1; the code explicitly checks for this condition.

Note that pickling new-style class instances ignores __getinitargs__, if
it exists, under all protocols. __getinitargs__ is useful only for
classic classes.

Case 3: pickling new-style class instances using protocol 2

Under protocol 2, the default __reduce__ implementation inherited from
the 'object' base class is ignored. Instead, a different default
implementation is used, which allows more efficient pickling of
new-style class instances than possible with protocols 0 or 1, at the
cost of backward incompatibility with Python 2.2 (meaning no more than
that a protocol 2 pickle cannot be unpickled before Python 2.3).

The customization uses three special methods: __getstate__, __setstate__
and __getnewargs__ (note that __getinitargs__ is again ignored). It is
fine if a class implements one or more but not all of these, as long as
it is compatible with the default implementations.

The __getstate__ method

The __getstate__ method should return a picklable value representing the
object's state without referencing the object itself. If no __getstate__
method exists, a default implementation is used which is described
below.

There's a subtle difference between classic and new-style classes here:
if a classic class's __getstate__ returns None, self.__setstate__(None)
will be called as part of unpickling. But if a new-style class's
__getstate__ returns None, its __setstate__ won't be called at all as
part of unpickling.

If no __getstate__ method exists, a default state is computed. There are
several cases:

-   For a new-style class that has no instance __dict__ and no
    __slots__, the default state is None.
-   For a new-style class that has an instance __dict__ and no
    __slots__, the default state is self.__dict__.
-   For a new-style class that has an instance __dict__ and __slots__,
    the default state is a tuple consisting of two dictionaries:
    self.__dict__, and a dictionary mapping slot names to slot values.
    Only slots that have a value are included in the latter.
-   For a new-style class that has __slots__ and no instance __dict__,
    the default state is a tuple whose first item is None and whose
    second item is a dictionary mapping slot names to slot values, as
    described in the previous bullet.
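
The slot-related cases can be observed by looking at the third item of the tuple returned by the default protocol 2 reduction. This is a sketch assuming Python 3, which keeps the same default state rules; the classes `Slotted` and `Mixed` are hypothetical.

```python
import pickle

class Slotted(object):
    __slots__ = ("x", "y")      # no instance __dict__

class Mixed(Slotted):
    pass                        # no __slots__ here, so instances gain a __dict__

s = Slotted()
s.x = 1                         # y is deliberately left unset

m = Mixed()
m.x = 1
m.note = "hi"

# Item 2 of the reduce tuple is the default state.
slotted_state = s.__reduce_ex__(2)[2]
mixed_state = m.__reduce_ex__(2)[2]

# The default __setstate__ behaviour understands the tuple form as well.
clone = pickle.loads(pickle.dumps(m, 2))
```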

The __setstate__ method

The __setstate__ method should take one argument; it will be called with
the value returned by __getstate__ or with the default state described
above if no __getstate__ method is defined.

If no __setstate__ method exists, a default implementation is provided
that can handle the state returned by the default __getstate__,
described above.
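
A common use of the two methods together is to leave derived data out of the pickle and rebuild it on load. The following is a minimal sketch under Python 3; `Document` and its `cache` attribute are hypothetical.

```python
import pickle

class Document(object):
    def __init__(self, text):
        self.text = text
        self.cache = self._build_cache()

    def _build_cache(self):
        # Derived data that is cheap to recompute.
        return {"words": len(self.text.split())}

    def __getstate__(self):
        # Return a picklable value that does not reference the object itself.
        state = self.__dict__.copy()
        del state["cache"]
        return state

    def __setstate__(self, state):
        # Called with the value that __getstate__ returned.
        self.__dict__.update(state)
        self.cache = self._build_cache()

doc = Document("a b c")
clone = pickle.loads(pickle.dumps(doc, 2))
```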

The __getnewargs__ method

As with classic classes, the __setstate__ method (or its default
implementation) requires that a new object already exists so that its
__setstate__ method can be called.

In protocol 2, a new pickling opcode is used that causes a new object to
be created as follows:

    obj = C.__new__(C, *args)

where C is the class of the pickled object, and args is either the empty
tuple, or the tuple returned by the __getnewargs__ method, if defined.
__getnewargs__ must return a tuple. The absence of a __getnewargs__
method is equivalent to the existence of one that returns ().
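
For a class whose __new__ requires arguments, __getnewargs__ supplies them at unpickling time. The sketch below assumes Python 3; `Point` is a hypothetical class.

```python
import pickle

class Point(object):
    def __new__(cls, x):
        self = object.__new__(cls)
        self.x = x
        return self

    def __getnewargs__(self):
        # Must return a tuple; it becomes args in cls.__new__(cls, *args).
        return (self.x,)

p = Point(3)

# Unpickling calls Point.__new__(Point, 3) and then restores the state.
clone = pickle.loads(pickle.dumps(p, 2))
```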

The __newobj__ unpickling function

When the unpickling function returned by __reduce__ (the first item of
the returned tuple) has the name __newobj__, something special happens
for pickle protocol 2. An unpickling function named __newobj__ is
assumed to have the following semantics:

    def __newobj__(cls, *args):
        return cls.__new__(cls, *args)

Pickle protocol 2 special-cases an unpickling function with this name,
and emits a pickling opcode that, given 'cls' and 'args', will return
cls.__new__(cls, *args) without also pickling a reference to __newobj__
(this is the same pickling opcode used by protocol 2 for a new-style
class instance when no __reduce__ implementation exists). This is the
main reason why protocol 2 pickles are much smaller than classic
pickles. Of course, the pickling code cannot verify that a function
named __newobj__ actually has the expected semantics. If you use an
unpickling function named __newobj__ that returns something different,
you deserve what you get.

It is safe to use this feature under Python 2.2; there's nothing in the
recommended implementation of __newobj__ that depends on Python 2.3.
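
The special case can be observed by disassembling a pickle. This sketch assumes Python 3, where the copyreg module (copy_reg in Python 2) ships a `__newobj__` function with the semantics shown above; `Temperature` is a hypothetical class.

```python
import copyreg  # spelled copy_reg in Python 2
import pickle
import pickletools

class Temperature(object):
    def __init__(self, degrees):
        self.degrees = degrees

    def __reduce__(self):
        # The function is named __newobj__, so protocol 2 can emit its
        # dedicated opcode instead of pickling a reference to the function.
        return (copyreg.__newobj__, (type(self),), {"degrees": self.degrees})

data = pickle.dumps(Temperature(21), 2)

# List the opcode names that appear in the pickle.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
clone = pickle.loads(data)
```

The opcode list should contain NEWOBJ and no REDUCE, and no reference to `__newobj__` itself is stored.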

The extension registry

Protocol 2 supports a new mechanism to reduce the size of pickles.

When class instances (classic or new-style) are pickled, the full name
of the class (module name including package name, and class name) is
included in the pickle. Especially for applications that generate many
small pickles, this is a lot of overhead that has to be repeated in each
pickle. For large pickles, when using protocol 1, repeated references to
the same class name are compressed using the "memo" feature; but each
class name must be spelled in full at least once per pickle, and this
causes a lot of overhead for small pickles.

The extension registry allows one to represent the most frequently used
names by small integers, which are pickled very efficiently: an
extension code in the range 1--255 requires only two bytes including the
opcode, one in the range 256--65535 requires only three bytes including
the opcode.

One of the design goals of the pickle protocol is to make pickles
"context-free": as long as you have installed the modules containing the
classes referenced by a pickle, you can unpickle it, without needing to
import any of those classes ahead of time.

Unbridled use of extension codes could jeopardize this desirable
property of pickles. Therefore, the main use of extension codes is
reserved for a set of codes to be standardized by some standard-setting
body. This being Python, the standard-setting body is the PSF. From time
to time, the PSF will decide on a table mapping extension codes to class
names (or occasionally names of other global objects; functions are also
eligible). This table will be incorporated in the next Python
release(s).

However, for some applications, like Zope, context-free pickles are not
a requirement, and waiting for the PSF to standardize some codes may not
be practical. Two solutions are offered for such applications.

First, a few ranges of extension codes are reserved for private use. Any
application can register codes in these ranges. Two applications
exchanging pickles using codes in these ranges need to have some
out-of-band mechanism to agree on the mapping between extension codes
and names.

Second, some large Python projects (e.g. Zope) can be assigned a range
of extension codes outside the "private use" range that they can assign
as they see fit.

The extension registry is defined as a mapping between extension codes
and names. When an extension code is unpickled, it ends up producing an
object, but this object is obtained by interpreting the name as a module
name followed by a class (or function) name. The mapping from names to
objects is cached. It is quite possible that certain names cannot be
imported; that should not be a problem as long as no pickle containing a
reference to such names has to be unpickled. (The same issue already
exists for direct references to such names in pickles that use protocols
0 or 1.)

Here is the proposed initial assignment of extension code ranges:

+-------+-------+-------+---------------------------------------------------+
| First | Last  | Count | Purpose                                           |
+=======+=======+=======+===================================================+
|   0   |   0   |   1   | Reserved --- will never be used                   |
+-------+-------+-------+---------------------------------------------------+
|   1   |   127 |   127 | Reserved for Python standard library              |
+-------+-------+-------+---------------------------------------------------+
| 128   |   191 |   64  | Reserved for Zope                                 |
+-------+-------+-------+---------------------------------------------------+
| 192   |   239 |   48  | Reserved for 3rd parties                          |
+-------+-------+-------+---------------------------------------------------+
| 240   |   255 |   16  | Reserved for private use (will never be assigned) |
+-------+-------+-------+---------------------------------------------------+
| 256   | MAX   | MAX   | Reserved for future assignment                    |
+-------+-------+-------+---------------------------------------------------+

MAX stands for 2147483647, or 2**31-1. This is a hard limitation of the
protocol as currently defined.

At the moment, no specific extension codes have been assigned yet.

Extension registry API

The extension registry is maintained as private global variables in the
copy_reg module. The following three functions are defined in this
module to manipulate the registry:

add_extension(module, name, code)

    Register an extension code. The module and name arguments must be
    strings; code must be an int in the inclusive range 1 through MAX.
    This must either register a new (module, name) pair to a new code,
    or be a redundant repeat of a previous call that was not canceled by
    a remove_extension() call; a (module, name) pair may not be mapped
    to more than one code, nor may a code be mapped to more than one
    (module, name) pair.

remove_extension(module, name, code)

    Arguments are as for add_extension(). Remove a previously registered
    mapping between (module, name) and code.

clear_extension_cache()

    The implementation of extension codes may use a cache to speed up
    loading objects that are named frequently. This cache can be emptied
    (removing references to cached objects) by calling this function.

Note that the API does not enforce the standard range assignments. It is
up to applications to respect these.
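
A short sketch of the three functions in use, assuming Python 3 (where copy_reg is spelled copyreg). The code 240 is an arbitrary pick from the private-use range, and registering the standard library class `fractions.Fraction` is only for illustration.

```python
import copyreg  # spelled copy_reg in Python 2
import pickle
from fractions import Fraction

CODE = 240  # hypothetical choice from the private-use range 240-255

# Without a registration, the class reference is spelled out by name.
plain = pickle.dumps(Fraction, 2)

copyreg.add_extension("fractions", "Fraction", CODE)
try:
    # With a registration, protocol 2 stores the small integer code instead.
    compact = pickle.dumps(Fraction, 2)
    restored = pickle.loads(compact)
finally:
    # Undo the registration so the example leaves the registry untouched.
    copyreg.remove_extension("fractions", "Fraction", CODE)
    copyreg.clear_extension_cache()
```

The pickle in `compact` can only be loaded by a process that has registered the same mapping, which is the out-of-band agreement described above.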

The copy module

Traditionally, the copy module has supported an extended subset of the
pickling APIs for customizing the copy() and deepcopy() operations.

In particular, besides checking for a __copy__ or __deepcopy__ method,
copy() and deepcopy() have always looked for __reduce__, and for classic
classes, have looked for __getinitargs__, __getstate__ and __setstate__.

In Python 2.2, the default __reduce__ inherited from 'object' made
copying simple new-style classes possible, but slots and various other
special cases were not covered.

In Python 2.3, several changes are made to the copy module:

-   __reduce_ex__ is supported (and always called with 2 as the protocol
    version argument).
-   The four- and five-argument return values of __reduce__ are
    supported.
-   Before looking for a __reduce__ method, the copy_reg.dispatch_table
    is consulted, just like for pickling.
-   When the __reduce__ method is inherited from object, it is
    (unconditionally) replaced by a better one that uses the same APIs
    as pickle protocol 2: __getnewargs__, __getstate__, and
    __setstate__, handling list and dict subclasses, and handling slots.

As a consequence of the latter change, certain new-style classes that
were copyable under Python 2.2 are not copyable under Python 2.3. (These
classes are also not picklable using pickle protocol 2.) A minimal
example of such a class:

    class C(object):
        def __new__(cls, a):
            return object.__new__(cls)

The problem only occurs when __new__ is overridden and has at least one
mandatory argument in addition to the class argument.

To fix this, a __getnewargs__ method should be added that returns the
appropriate argument tuple (excluding the class).
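
The failure and the fix can be sketched as follows, assuming Python 3, where the copy module behaves the same way for this case. `Broken` and `Fixed` are hypothetical class names.

```python
import copy

class Broken(object):
    def __new__(cls, a):
        return object.__new__(cls)

class Fixed(object):
    def __new__(cls, a):
        self = object.__new__(cls)
        self.a = a
        return self

    def __getnewargs__(self):
        # Argument tuple for __new__, excluding the class itself.
        return (self.a,)

# Copying Broken fails because __new__ is called without its argument.
try:
    copy.copy(Broken(1))
    broken_is_copyable = True
except TypeError:
    broken_is_copyable = False

duplicate = copy.copy(Fixed(1))
deep_duplicate = copy.deepcopy(Fixed(1))
```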

Pickling Python longs

Pickling and unpickling Python longs takes time quadratic in the number
of digits, in protocols 0 and 1. Under protocol 2, new opcodes support
linear-time pickling and unpickling of longs.
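
The opcode change can be observed with pickletools. This sketch assumes Python 3, where the separate long type has been merged into int but large integers are still pickled with the same opcodes.

```python
import pickle
import pickletools

big = 2 ** 100  # too large for the fixed-size integer opcodes

def opcode_names(value, protocol):
    """Return the names of the opcodes used to pickle value."""
    data = pickle.dumps(value, protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]

text_style = opcode_names(big, 1)    # decimal text representation
binary_style = opcode_names(big, 2)  # length-prefixed binary representation

round_trip = pickle.loads(pickle.dumps(big, 2))
```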

Pickling bools

Protocol 2 introduces new opcodes for pickling True and False directly.
Under protocols 0 and 1, bools are pickled as integers, using a trick in
the representation of the integer in the pickle so that an unpickler can
recognize that a bool was intended. That trick consumed 4 bytes per bool
pickled. The new bool opcodes consume 1 byte per bool.
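
The byte counts are easy to check. A minimal sketch, assuming Python 3, which still writes the same encodings for these two protocols:

```python
import pickle

# Protocol 1: an integer-style encoding that unpicklers recognise as a bool,
# followed by the one-byte STOP opcode.
old_true = pickle.dumps(True, 1)

# Protocol 2: the two-byte protocol header, a one-byte bool opcode, then STOP.
new_true = pickle.dumps(True, 2)

restored = pickle.loads(old_true)
```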

Pickling small tuples

Protocol 2 introduces new opcodes for more-compact pickling of tuples of
lengths 1, 2 and 3. Protocol 1 previously introduced an opcode for
more-compact pickling of empty tuples.
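
A quick way to see which tuple opcode is chosen, assuming Python 3 and the opcode names used by pickletools; the helper `tuple_opcodes` is hypothetical.

```python
import pickle
import pickletools

def tuple_opcodes(value, protocol):
    """Return the tuple-building opcode names used to pickle value."""
    data = pickle.dumps(value, protocol)
    names = [op.name for op, arg, pos in pickletools.genops(data)]
    return [name for name in names if "TUPLE" in name]

empty = tuple_opcodes((), 1)             # protocol 1 already had a compact form
pair_old = tuple_opcodes((1, 2), 1)      # generic, mark-delimited tuple
one = tuple_opcodes((1,), 2)
pair_new = tuple_opcodes((1, 2), 2)
three = tuple_opcodes((1, 2, 3), 2)
```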

Protocol identification

Protocol 2 introduces a new opcode, with which all protocol 2 pickles
begin, identifying that the pickle is protocol 2. Attempting to unpickle
a protocol 2 pickle under older versions of Python will therefore raise
an "unknown opcode" exception immediately.
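
The leading opcode can be inspected as follows. This is a sketch assuming Python 3 and the pickletools module mentioned in the introduction.

```python
import pickle
import pickletools

data = pickle.dumps([1, 2], 2)

# The first opcode of a protocol 2 pickle announces the protocol number.
first_opcode, first_argument, position = next(pickletools.genops(data))

# A protocol 1 pickle of the same object has no such header.
older = pickle.dumps([1, 2], 1)
```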

Pickling of large lists and dicts

Protocol 1 pickles large lists and dicts "in one piece", which minimizes
pickle size, but requires that unpickling create a temp object as large
as the object being unpickled. As part of the protocol 2 changes, the
pickler now breaks large lists and dicts into pieces of no more than
1000 elements each, so that unpickling needn't create a temp object
larger than needed to hold 1000 elements. This isn't part of protocol 2,
however: the opcodes
produced are still part of protocol 1. __reduce__ implementations that
return the optional new listitems or dictitems iterators also benefit
from this unpickling temp-space optimization.
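
The batching can be observed by counting the opcodes that append a batch of items. This sketch assumes Python 3 with the CPython pickler, which keeps the 1000-element batch size; protocol 1 is used to show that no protocol 2 opcodes are involved.

```python
import pickle
import pickletools

items = list(range(2500))

data = pickle.dumps(items, 1)
names = [op.name for op, arg, pos in pickletools.genops(data)]

# Each APPENDS opcode adds one batch of items to the list being rebuilt.
batches = names.count("APPENDS")

restored = pickle.loads(data)
```

With 2500 items, the pickler should emit three batches: two of 1000 elements and one of 500.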

Copyright

This document has been placed in the public domain.


