PEP: 227 Title: Statically Nested Scopes Author: Jeremy Hylton
<jeremy@alum.mit.edu> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 01-Nov-2000 Python-Version: 2.1 Post-History:

Abstract

This PEP describes the addition of statically nested scoping (lexical
scoping) for Python 2.2, and as a source level option for python 2.1. In
addition, Python 2.1 will issue warnings about constructs whose meaning
may change when this feature is enabled.

The old language definition (2.0 and before) defines exactly three
namespaces that are used to resolve names -- the local, global, and
built-in namespaces. The addition of nested scopes allows resolution of
unbound local names in enclosing functions' namespaces.

The most visible consequence of this change is that lambdas (and other
nested functions) can reference variables defined in the surrounding
namespace. Currently, lambdas must often use default arguments to
explicitly creating bindings in the lambda's namespace.

Introduction

This proposal changes the rules for resolving free variables in Python
functions. The new name resolution semantics will take effect with
Python 2.2. These semantics will also be available in Python 2.1 by
adding "from __future__ import nested_scopes" to the top of a module.
(See PEP 236.)

The Python 2.0 definition specifies exactly three namespaces to check
for each name -- the local namespace, the global namespace, and the
builtin namespace. According to this definition, if a function A is
defined within a function B, the names bound in B are not visible in A.
The proposal changes the rules so that names bound in B are visible in A
(unless A contains a name binding that hides the binding in B).

This specification introduces rules for lexical scoping that are common
in Algol-like languages. The combination of lexical scoping and existing
support for first-class functions is reminiscent of Scheme.

The changed scoping rules address two problems -- the limited utility of
lambda expressions (and nested functions in general), and the frequent
confusion of new users familiar with other languages that support nested
lexical scopes, e.g. the inability to define recursive functions except
at the module level.

The lambda expression yields an unnamed function that evaluates a single
expression. It is often used for callback functions. In the example
below (written using the Python 2.0 rules), any name used in the body of
the lambda must be explicitly passed as a default argument to the
lambda.

    from Tkinter import *
    root = Tk()
    Button(root, text="Click here",
           command=lambda root=root: root.test.configure(text="..."))

This approach is cumbersome, particularly when there are several names
used in the body of the lambda. The long list of default arguments
obscures the purpose of the code. The proposed solution, in crude terms,
implements the default argument approach automatically. The "root=root"
argument can be omitted.

The new name resolution semantics will cause some programs to behave
differently than they did under Python 2.0. In some cases, programs will
fail to compile. In other cases, names that were previously resolved
using the global namespace will be resolved using the local namespace of
an enclosing function. In Python 2.1, warnings will be issued for all
statements that will behave differently.

Specification

Python is a statically scoped language with block structure, in the
traditional of Algol. A code block or region, such as a module, class
definition, or function body, is the basic unit of a program.

Names refer to objects. Names are introduced by name binding operations.
Each occurrence of a name in the program text refers to the binding of
that name established in the innermost function block containing the
use.

The name binding operations are argument declaration, assignment, class
and function definition, import statements, for statements, and except
clauses. Each name binding occurs within a block defined by a class or
function definition or at the module level (the top-level code block).

If a name is bound anywhere within a code block, all uses of the name
within the block are treated as references to the current block. (Note:
This can lead to errors when a name is used within a block before it is
bound.)

If the global statement occurs within a block, all uses of the name
specified in the statement refer to the binding of that name in the
top-level namespace. Names are resolved in the top-level namespace by
searching the global namespace, i.e. the namespace of the module
containing the code block, and in the builtin namespace, i.e. the
namespace of the __builtin__ module. The global namespace is searched
first. If the name is not found there, the builtin namespace is
searched. The global statement must precede all uses of the name.

If a name is used within a code block, but it is not bound there and is
not declared global, the use is treated as a reference to the nearest
enclosing function region. (Note: If a region is contained within a
class definition, the name bindings that occur in the class block are
not visible to enclosed functions.)

A class definition is an executable statement that may contain uses and
definitions of names. These references follow the normal rules for name
resolution. The namespace of the class definition becomes the attribute
dictionary of the class.

The following operations are name binding operations. If they occur
within a block, they introduce new local names in the current block
unless there is also a global declaration.

    Function definition: def name ...
    Argument declaration: def f(...name...), lambda ...name...
    Class definition: class name ...
    Assignment statement: name = ...
    Import statement: import name, import module as name,
        from module import name
    Implicit assignment: names are bound by for statements and except
        clauses

There are several cases where Python statements are illegal when used in
conjunction with nested scopes that contain free variables.

If a variable is referenced in an enclosed scope, it is an error to
delete the name. The compiler will raise a SyntaxError for 'del name'.

If the wild card form of import (import *) is used in a function and the
function contains a nested block with free variables, the compiler will
raise a SyntaxError.

If exec is used in a function and the function contains a nested block
with free variables, the compiler will raise a SyntaxError unless the
exec explicitly specifies the local namespace for the exec. (In other
words, "exec obj" would be illegal, but "exec obj in ns" would be
legal.)

If a name bound in a function scope is also the name of a module global
name or a standard builtin name, and the function contains a nested
function scope that references the name, the compiler will issue a
warning. The name resolution rules will result in different bindings
under Python 2.0 than under Python 2.2. The warning indicates that the
program may not run correctly with all versions of Python.

Discussion

The specified rules allow names defined in a function to be referenced
in any nested function defined with that function. The name resolution
rules are typical for statically scoped languages, with three primary
exceptions:

-   Names in class scope are not accessible.
-   The global statement short-circuits the normal rules.
-   Variables are not declared.

Names in class scope are not accessible. Names are resolved in the
innermost enclosing function scope. If a class definition occurs in a
chain of nested scopes, the resolution process skips class definitions.
This rule prevents odd interactions between class attributes and local
variable access. If a name binding operation occurs in a class
definition, it creates an attribute on the resulting class object. To
access this variable in a method, or in a function nested within a
method, an attribute reference must be used, either via self or via the
class name.

An alternative would have been to allow name binding in class scope to
behave exactly like name binding in function scope. This rule would
allow class attributes to be referenced either via attribute reference
or simple name. This option was ruled out because it would have been
inconsistent with all other forms of class and instance attribute
access, which always use attribute references. Code that used simple
names would have been obscure.

The global statement short-circuits the normal rules. Under the
proposal, the global statement has exactly the same effect that it does
for Python 2.0. It is also noteworthy because it allows name binding
operations performed in one block to change bindings in another block
(the module).

Variables are not declared. If a name binding operation occurs anywhere
in a function, then that name is treated as local to the function and
all references refer to the local binding. If a reference occurs before
the name is bound, a NameError is raised. The only kind of declaration
is the global statement, which allows programs to be written using
mutable global variables. As a consequence, it is not possible to rebind
a name defined in an enclosing scope. An assignment operation can only
bind a name in the current scope or in the global scope. The lack of
declarations and the inability to rebind names in enclosing scopes are
unusual for lexically scoped languages; there is typically a mechanism
to create name bindings (e.g. lambda and let in Scheme) and a mechanism
to change the bindings (set! in Scheme).

Examples

A few examples are included to illustrate the way the rules work.

    >>> def make_adder(base):
    ...     def adder(x):
    ...         return base + x
    ...     return adder
    >>> add5 = make_adder(5)
    >>> add5(6)
    11

    >>> def make_fact():
    ...     def fact(n):
    ...         if n == 1:
    ...             return 1L
    ...         else:
    ...             return n * fact(n - 1)
    ...     return fact
    >>> fact = make_fact()
    >>> fact(7)
    5040L

    >>> def make_wrapper(obj):
    ...     class Wrapper:
    ...         def __getattr__(self, attr):
    ...             if attr[0] != '_':
    ...                 return getattr(obj, attr)
    ...             else:
    ...                 raise AttributeError, attr
    ...     return Wrapper()
    >>> class Test:
    ...     public = 2
    ...     _private = 3
    >>> w = make_wrapper(Test())
    >>> w.public
    2
    >>> w._private
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: _private

An example from Tim Peters demonstrates the potential pitfalls of nested
scopes in the absence of declarations:

    i = 6
    def f(x):
        def g():
            print i
        # ...
        # skip to the next page
        # ...
        for i in x:  # ah, i *is* local to f, so this is what g sees
            pass
        g()

The call to g() will refer to the variable i bound in f() by the for
loop. If g() is called before the loop is executed, a NameError will be
raised.

Backwards compatibility

There are two kinds of compatibility problems caused by nested scopes.
In one case, code that behaved one way in earlier versions behaves
differently because of nested scopes. In the other cases, certain
constructs interact badly with nested scopes and will trigger
SyntaxErrors at compile time.

The following example from Skip Montanaro illustrates the first kind of
problem:

    x = 1
    def f1():
        x = 2
        def inner():
            print x
        inner()

Under the Python 2.0 rules, the print statement inside inner() refers to
the global variable x and will print 1 if f1() is called. Under the new
rules, it refers to the f1()'s namespace, the nearest enclosing scope
with a binding.

The problem occurs only when a global variable and a local variable
share the same name and a nested function uses that name to refer to the
global variable. This is poor programming practice, because readers will
easily confuse the two different variables. One example of this problem
was found in the Python standard library during the implementation of
nested scopes.

To address this problem, which is unlikely to occur often, the Python
2.1 compiler (when nested scopes are not enabled) issues a warning.

The other compatibility problem is caused by the use of import * and
'exec' in a function body, when that function contains a nested scope
and the contained scope has free variables. For example:

    y = 1
    def f():
        exec "y = 'gotcha'" # or from module import *
        def g():
            return y
        ...

At compile-time, the compiler cannot tell whether an exec that operates
on the local namespace or an import * will introduce name bindings that
shadow the global y. Thus, it is not possible to tell whether the
reference to y in g() should refer to the global or to a local name in
f().

In discussion of the python-list, people argued for both possible
interpretations. On the one hand, some thought that the reference in g()
should be bound to a local y if one exists. One problem with this
interpretation is that it is impossible for a human reader of the code
to determine the binding of y by local inspection. It seems likely to
introduce subtle bugs. The other interpretation is to treat exec and
import * as dynamic features that do not effect static scoping. Under
this interpretation, the exec and import * would introduce local names,
but those names would never be visible to nested scopes. In the specific
example above, the code would behave exactly as it did in earlier
versions of Python.

Since each interpretation is problematic and the exact meaning
ambiguous, the compiler raises an exception. The Python 2.1 compiler
issues a warning when nested scopes are not enabled.

A brief review of three Python projects (the standard library, Zope, and
a beta version of PyXPCOM) found four backwards compatibility issues in
approximately 200,000 lines of code. There was one example of case #1
(subtle behavior change) and two examples of import * problems in the
standard library.

(The interpretation of the import * and exec restriction that was
implemented in Python 2.1a2 was much more restrictive, based on language
that in the reference manual that had never been enforced. These
restrictions were relaxed following the release.)

Compatibility of C API

The implementation causes several Python C API functions to change,
including PyCode_New(). As a result, C extensions may need to be updated
to work correctly with Python 2.1.

locals() / vars()

These functions return a dictionary containing the current scope's local
variables. Modifications to the dictionary do not affect the values of
variables. Under the current rules, the use of locals() and globals()
allows the program to gain access to all the namespaces in which names
are resolved.

An analogous function will not be provided for nested scopes. Under this
proposal, it will not be possible to gain dictionary-style access to all
visible scopes.

Warnings and Errors

The compiler will issue warnings in Python 2.1 to help identify programs
that may not compile or run correctly under future versions of Python.
Under Python 2.2 or Python 2.1 if the nested_scopes future statement is
used, which are collectively referred to as "future semantics" in this
section, the compiler will issue SyntaxErrors in some cases.

The warnings typically apply when a function that contains a nested
function that has free variables. For example, if function F contains a
function G and G uses the builtin len(), then F is a function that
contains a nested function (G) with a free variable (len). The label
"free-in-nested" will be used to describe these functions.

import * used in function scope

The language reference specifies that import * may only occur in a
module scope. (Sec. 6.11) The implementation of C Python has supported
import * at the function scope.

If import * is used in the body of a free-in-nested function, the
compiler will issue a warning. Under future semantics, the compiler will
raise a SyntaxError.

bare exec in function scope

The exec statement allows two optional expressions following the keyword
"in" that specify the namespaces used for locals and globals. An exec
statement that omits both of these namespaces is a bare exec.

If a bare exec is used in the body of a free-in-nested function, the
compiler will issue a warning. Under future semantics, the compiler will
raise a SyntaxError.

local shadows global

If a free-in-nested function has a binding for a local variable that (1)
is used in a nested function and (2) is the same as a global variable,
the compiler will issue a warning.

Rebinding names in enclosing scopes

There are technical issues that make it difficult to support rebinding
of names in enclosing scopes, but the primary reason that it is not
allowed in the current proposal is that Guido is opposed to it. His
motivation: it is difficult to support, because it would require a new
mechanism that would allow the programmer to specify that an assignment
in a block is supposed to rebind the name in an enclosing block;
presumably a keyword or special syntax (x := 3) would make this
possible. Given that this would encourage the use of local variables to
hold state that is better stored in a class instance, it's not worth
adding new syntax to make this possible (in Guido's opinion).

The proposed rules allow programmers to achieve the effect of rebinding,
albeit awkwardly. The name that will be effectively rebound by enclosed
functions is bound to a container object. In place of assignment, the
program uses modification of the container to achieve the desired
effect:

    def bank_account(initial_balance):
        balance = [initial_balance]
        def deposit(amount):
            balance[0] = balance[0] + amount
            return balance
        def withdraw(amount):
            balance[0] = balance[0] - amount
            return balance
        return deposit, withdraw

Support for rebinding in nested scopes would make this code clearer. A
class that defines deposit() and withdraw() methods and the balance as
an instance variable would be clearer still. Since classes seem to
achieve the same effect in a more straightforward manner, they are
preferred.

Implementation

The implementation for C Python uses flat closures[1]. Each def or
lambda expression that is executed will create a closure if the body of
the function or any contained function has free variables. Using flat
closures, the creation of closures is somewhat expensive but lookup is
cheap.

The implementation adds several new opcodes and two new kinds of names
in code objects. A variable can be either a cell variable or a free
variable for a particular code object. A cell variable is referenced by
containing scopes; as a result, the function where it is defined must
allocate separate storage for it on each invocation. A free variable is
referenced via a function's closure.

The choice of free closures was made based on three factors. First,
nested functions are presumed to be used infrequently, deeply nested
(several levels of nesting) still less frequently. Second, lookup of
names in a nested scope should be fast. Third, the use of nested scopes,
particularly where a function that access an enclosing scope is
returned, should not prevent unreferenced objects from being reclaimed
by the garbage collector.

References

Copyright

[1] Luca Cardelli. Compiling a functional language. In Proc. of the 1984
ACM Conference on Lisp and Functional Programming, pp. 208-217, Aug.
1984 https://dl.acm.org/doi/10.1145/800055.802037