PEP: 205 Title: Weak References Version: $Revision$ Last-Modified:
$Date$ Author: Fred L. Drake, Jr. <fred@fdrake.net> Status: Final Type:
Standards Track Content-Type: text/x-rst Created: 14-Jul-2000
Python-Version: 2.1 Post-History: 11-Jan-2001

Motivation

There are two basic applications for weak references which have been
noted by Python programmers: object caches and reduction of pain from
circular references.

Caches (weak dictionaries)

There is a need to allow objects to be maintained that represent
external state, mapping a single instance to the external reality, where
allowing multiple instances to be mapped to the same external resource
would create unnecessary difficulty maintaining synchronization among
instances. In these cases, a common idiom is to support a cache of
instances; a factory function is used to return either a new or existing
instance.

The difficulty in this approach is that one of two things must be
tolerated: either the cache grows without bound, or there needs to be
explicit management of the cache elsewhere in the application. The later
can be very tedious and leads to more code than is really necessary to
solve the problem at hand, and the former can be unacceptable for
long-running processes or even relatively short processes with
substantial memory requirements.

-   External objects that need to be represented by a single instance,
    no matter how many internal users there are. This can be useful for
    representing files that need to be written back to disk in whole
    rather than locked & modified for every use.
-   Objects that are expensive to create, but may be needed by multiple
    internal consumers. Similar to the first case, but not necessarily
    bound to external resources, and possibly not an issue for shared
    state. Weak references are only useful in this case if there is some
    flavor of "soft" references or if there is a high likelihood that
    users of individual objects will overlap in lifespan.

Circular references

-   DOMs require a huge amount of circular (to parent & document nodes)
    references, but these could be eliminated using a weak dictionary
    mapping from each node to its parent. This might be especially
    useful in the context of something like xml.dom.pulldom, allowing
    the .unlink() operation to become a no-op.

This proposal is divided into the following sections:

-   Proposed Solution
-   Implementation Strategy
-   Possible Applications
-   Previous Weak Reference Work in Python
-   Weak References in Java

The full text of one early proposal is included as an appendix since it
does not appear to be available on the net.

Aspects of the Solution Space

There are two distinct aspects to the weak references problem:

-   Invalidation of weak references
-   Presentation of weak references to Python code

Invalidation

Past approaches to weak reference invalidation have often hinged on
storing a strong reference and being able to examine all the instances
of weak reference objects, and invalidating them when the reference
count of their referent goes to one (indicating that the reference
stored by the weak reference is the last remaining reference). This has
the advantage that the memory management machinery in Python need not
change, and that any type can be weakly referenced.

The disadvantage of this approach to invalidation is that it assumes
that the management of the weak references is called sufficiently
frequently that weakly-referenced objects are noticed within a
reasonably short time frame; since this means a scan over some data
structure to invalidate references, an operation which is O(N) on the
number of weakly referenced objects, this is not effectively amortized
for any single object which is weakly referenced. This also assumes that
the application is calling into code which handles weakly-referenced
objects with some frequency, which makes weak-references less attractive
for library code.

An alternate approach to invalidation is that the de-allocation code to
be aware of the possibility of weak references and make a specific call
into the weak-reference management code to all invalidation whenever an
object is deallocated. This requires a change in the tp_dealloc handler
for weakly-referencable objects; an additional call is needed at the
"top" of the handler for objects which support weak-referencing, and an
efficient way to map from an object to a chain of weak references for
that object is needed as well.

Presentation

Two ways that weak references are presented to the Python layer have
been as explicit reference objects upon which some operation is required
in order to retrieve a usable reference to the underlying object, and
proxy objects which masquerade as the original objects as much as
possible.

Reference objects are easy to work with when some additional layer of
object management is being added in Python; references can be checked
for liveness explicitly, without having to invoke operations on the
referents and catching some special exception raised when an invalid
weak reference is used.

However, a number of users favor the proxy approach simply because the
weak reference looks so much like the original object.

Proposed Solution

Weak references should be able to point to any Python object that may
have substantial memory size (directly or indirectly), or hold
references to external resources (database connections, open files,
etc.).

A new module, weakref, will contain new functions used to create weak
references. weakref.ref() will create a "weak reference object" and
optionally attach a callback which will be called when the object is
about to be finalized. weakref.mapping() will create a "weak
dictionary". A third function, weakref.proxy(), will create a proxy
object that behaves somewhat like the original object.

A weak reference object will allow access to the referenced object if it
hasn't been collected and to determine if the object still exists in
memory. Retrieving the referent is done by calling the reference object.
If the referent is no longer alive, this will return None instead.

A weak dictionary maps arbitrary keys to values, but does not own a
reference to the values. When the values are finalized, the (key, value)
pairs for which it is a value are removed from all the mappings
containing such pairs. Like dictionaries, weak dictionaries are not
hashable.

Proxy objects are weak references that attempt to behave like the object
they proxy, as much as they can. Regardless of the underlying type,
proxies are not hashable since their ability to act as a weak reference
relies on a fundamental mutability that will cause failures when used as
dictionary keys -- even if the proper hash value is computed before the
referent dies, the resulting proxy cannot be used as a dictionary key
since it cannot be compared once the referent has expired, and
comparability is necessary for dictionary keys. Operations on proxy
objects after the referent dies cause weakref.ReferenceError to be
raised in most cases. "is" comparisons, type(), and id() will continue
to work, but always refer to the proxy and not the referent.

The callbacks registered with weak references must accept a single
parameter, which will be the weak reference or proxy object itself. The
object cannot be accessed or resurrected in the callback.

Implementation Strategy

The implementation of weak references will include a list of reference
containers that must be cleared for each weakly-referencable object. If
the reference is from a weak dictionary, the dictionary entry is cleared
first. Then, any associated callback is called with the object passed as
a parameter. Once all callbacks have been called, the object is
finalized and deallocated.

Many built-in types will participate in the weak-reference management,
and any extension type can elect to do so. The type structure will
contain an additional field which provides an offset into the instance
structure which contains a list of weak reference structures. If the
value of the field is <= 0, the object does not participate. In this
case, weakref.ref(), <weakdict>.__setitem__() and .setdefault(), and
item assignment will raise TypeError. If the value of the field is > 0,
a new weak reference can be generated and added to the list.

This approach is taken to allow arbitrary extension types to
participate, without taking a memory hit for numbers or other small
types.

Standard types which support weak references include instances,
functions, and bound & unbound methods. With the addition of class types
("new-style classes") in Python 2.2, types grew support for weak
references. Instances of class types are weakly referencable if they
have a base type which is weakly referencable, the class not specify
__slots__, or a slot is named __weakref__. Generators also support weak
references.

Possible Applications

PyGTK+ bindings?

Tkinter -- could avoid circular references by using weak references from
widgets to their parents. Objects won't be discarded any sooner in the
typical case, but there won't be so much dependence on the programmer
calling .destroy() before releasing a reference. This would mostly
benefit long-running applications.

DOM trees.

Previous Weak Reference Work in Python

Dianne Hackborn has proposed something called "virtual references".
'vref' objects are very similar to java.lang.ref.WeakReference objects,
except there is no equivalent to the invalidation queues. Implementing a
"weak dictionary" would be just as difficult as using only weak
references (without the invalidation queue) in Java. Information on this
has disappeared from the Web, but is included below as an Appendix.

Marc-André Lemburg's mx.Proxy package:

  http://www.lemburg.com/files/python/mxProxy.html

The weakdict module by Dieter Maurer is implemented in C and Python. It
appears that the Web pages have not been updated since Python 1.5.2a, so
I'm not yet sure if the implementation is compatible with Python 2.0.

  http://www.handshake.de/~dieter/weakdict.html

PyWeakReference by Alex Shindich:

  http://sourceforge.net/projects/pyweakreference/

Eric Tiedemann has a weak dictionary implementation:

  http://www.hyperreal.org/~est/python/weak/

Weak References in Java

http://java.sun.com/j2se/1.3/docs/api/java/lang/ref/package-summary.html

Java provides three forms of weak references, and one interesting helper
class. The three forms are called "weak", "soft", and "phantom"
references. The relevant classes are defined in the java.lang.ref
package.

For each of the reference types, there is an option to add the reference
to a queue when it is invalidated by the memory allocator. The primary
purpose of this facility seems to be that it allows larger structures to
be composed to incorporate weak-reference semantics without having to
impose substantial additional locking requirements. For instance, it
would not be difficult to use this facility to create a "weak" hash
table which removes keys and referents when a reference is no longer
used elsewhere. Using weak references for the objects without some sort
of notification queue for invalidations leads to much more tedious
implementation of the various operations required on hash tables. This
can be a performance bottleneck if deallocations of the stored objects
are infrequent.

Java's "weak" references are most like Dianne Hackborn's old vref
proposal: a reference object refers to a single Python object, but does
not own a reference to that object. When that object is deallocated, the
reference object is invalidated. Users of the reference object can
easily determine that the reference has been invalidated, or a
NullObjectDereferenceError can be raised when an attempt is made to use
the referred-to object.

The "soft" references are similar, but are not invalidated as soon as
all other references to the referred-to object have been released. The
"soft" reference does own a reference, but allows the memory allocator
to free the referent if the memory is needed elsewhere. It is not clear
whether this means soft references are released before the malloc()
implementation calls sbrk() or its equivalent, or if soft references are
only cleared when malloc() returns NULL.

"Phantom" references are a little different; unlike weak and soft
references, the referent is not cleared when the reference is added to
its queue. When all phantom references for an object are dequeued, the
object is cleared. This can be used to keep an object alive until some
additional cleanup is performed which needs to happen before the objects
.finalize() method is called.

Unlike the other two reference types, "phantom" references must be
associated with an invalidation queue.

Appendix -- Dianne Hackborn's vref proposal (1995)

[This has been indented and paragraphs reflowed, but there have be no
content changes. --Fred]

Proposal: Virtual References

In an attempt to partly address the recurring discussion concerning
reference counting vs. garbage collection, I would like to propose an
extension to Python which should help in the creation of "well
structured" cyclic graphs. In particular, it should allow at least trees
with parent back-pointers and doubly-linked lists to be created without
worry about cycles.

The basic mechanism I'd like to propose is that of a "virtual
reference," or a "vref" from here on out. A vref is essentially a handle
on an object that does not increment the object's reference count. This
means that holding a vref on an object will not keep the object from
being destroyed. This would allow the Python programmer, for example, to
create the aforementioned tree structure, which is automatically
destroyed when it is no longer in use -- by making all of the parent
back-references into vrefs, they no longer create reference cycles which
keep the tree from being destroyed.

In order to implement this mechanism, the Python core must ensure that
no -real- pointers are ever left referencing objects that no longer
exist. The implementation I would like to propose involves two basic
additions to the current Python system:

1.  A new "vref" type, through which the Python programmer creates and
    manipulates virtual references. Internally, it is basically a
    C-level Python object with a pointer to the Python object it is a
    reference to. Unlike all other Python code, however, it does not
    change the reference count of this object. In addition, it includes
    two pointers to implement a doubly-linked list, which is used below.
2.  The addition of a new field to the basic Python object
    [PyObject_Head in object.h], which is either NULL, or points to the
    head of a list of all vref objects that reference it. When a vref
    object attaches itself to another object, it adds itself to this
    linked list. Then, if an object with any vrefs on it is deallocated,
    it may walk this list and ensure that all of the vrefs on it point
    to some safe value, e.g. Nothing.

This implementation should hopefully have a minimal impact on the
current Python core -- when no vrefs exist, it should only add one
pointer to all objects, and a check for a NULL pointer every time an
object is deallocated.

Back at the Python language level, I have considered two possible
semantics for the vref object --

Pointer semantics

In this model, a vref behaves essentially like a Python-level pointer;
the Python program must explicitly dereference the vref to manipulate
the actual object it references.

An example vref module using this model could include the function
"new"; When used as 'MyVref = vref.new(MyObject)', it returns a new vref
object such that MyVref.object == MyObject. MyVref.object would then
change to Nothing if MyObject is ever deallocated.

For a concrete example, we may introduce some new C-style syntax:

-   & -- unary operator, creates a vref on an object, same as
    vref.new().
-   * -- unary operator, dereference a vref, same as VrefObject.object.

We can then define:

    1.     type(&MyObject) == vref.VrefType
    2.        *(&MyObject) == MyObject
    3. (*(&MyObject)).attr == MyObject.attr
    4.          &&MyObject == Nothing
    5.           *MyObject -> exception

Rule #4 is subtle, but comes about because we have made a vref to (a
vref with no real references). Thus the outer vref is cleared to Nothing
when the inner one inevitably disappears.

Proxy semantics

In this model, the Python programmer manipulates vref objects just as if
she were manipulating the object it is a reference of. This is
accomplished by implementing the vref so that all operations on it are
redirected to its referenced object. With this model, the dereference
operator (*) no longer makes sense; instead, we have only the reference
operator (&), and define:

    1.  type(&MyObject) == type(MyObject)
    2.        &MyObject == MyObject
    3. (&MyObject).attr == MyObject.attr
    4.       &&MyObject == MyObject

Again, rule #4 is important -- here, the outer vref is in fact a
reference to the original object, and -not- the inner vref. This is
because all operations applied to a vref actually apply to its object,
so that creating a vref of a vref actually results in creating a vref of
the latter's object.

The first, pointer semantics, has the advantage that it would be very
easy to implement; the vref type is extremely simple, requiring at
minimum a single attribute, object, and a function to create a
reference.

However, I really like the proxy semantics. Not only does it put less of
a burden on the Python programmer, but it allows you to do nice things
like use a vref anywhere you would use the actual object. Unfortunately,
it would probably an extreme pain, if not practically impossible, to
implement in the current Python implementation. I do have some thoughts,
though, on how to do this, if it seems interesting; one possibility is
to introduce new type-checking functions which handle the vref. This
would hopefully older C modules which don't expect vrefs to simply
return a type error, until they can be fixed.

Finally, there are some other additional capabilities that this system
could provide. One that seems particularly interesting to me involves
allowing the Python programmer to add "destructor" function to a vref --
this Python function would be called immediately prior to the referenced
object being deallocated, allowing a Python program to invisibly attach
itself to another object and watch for it to disappear. This seems neat,
though I haven't actually come up with any practical uses for it, yet...
:)

-- Dianne

Copyright

This document has been placed in the public domain.