PEP: 671 Title: Syntax for late-bound function argument defaults Author:
Chris Angelico <rosuav@gmail.com> Discussions-To:
https://mail.python.org/archives/list/python-ideas@python.org/thread/UVOQEK7IRFSCBOH734T5GFJOEJXFCR6A/
Status: Draft Type: Standards Track Content-Type: text/x-rst Created:
24-Oct-2021 Python-Version: 3.12 Post-History: 24-Oct-2021, 01-Dec-2021

Abstract

Function parameters can have default values which are calculated during
function definition and saved. This proposal introduces a new form of
argument default, defined by an expression to be evaluated at function
call time.

Motivation

Optional function arguments, if omitted, often have some sort of logical
default value. When this value depends on other arguments, or needs to
be reevaluated each function call, there is currently no clean way to
state this in the function header.

Currently-legal idioms for this include:

    # Very common: Use None and replace it in the function
    def bisect_right(a, x, lo=0, hi=None, *, key=None):
        if hi is None:
            hi = len(a)

    # Also well known: Use a unique custom sentinel object
    _USE_GLOBAL_DEFAULT = object()
    def connect(timeout=_USE_GLOBAL_DEFAULT):
        if timeout is _USE_GLOBAL_DEFAULT:
            timeout = default_timeout

    # Unusual: Accept star-args and then validate
    def add_item(item, *optional_target):
        if not optional_target:
            target = []
        else:
            target = optional_target[0]

In each form, help(function) fails to show the true default value. Each
one has additional problems, too; using None is only valid if None is
not itself a plausible function parameter, the custom sentinel requires
a global constant; and use of star-args implies that more than one
argument could be given.

Specification

Function default arguments can be defined using the new => notation:

    def bisect_right(a, x, lo=0, hi=>len(a), *, key=None):
    def connect(timeout=>default_timeout):
    def add_item(item, target=>[]):
    def format_time(fmt, time_t=>time.time()):

The expression is saved in its source code form for the purpose of
inspection, and bytecode to evaluate it is prepended to the function's
body.

Notably, the expression is evaluated in the function's run-time scope,
NOT the scope in which the function was defined (as are early-bound
defaults). This allows the expression to refer to other arguments.

Multiple late-bound arguments are evaluated from left to right, and can
refer to previously-defined values. Order is defined by the function,
regardless of the order in which keyword arguments may be passed.

  def prevref(word="foo", a=>len(word), b=>a//2): # Valid def
  selfref(spam=>spam): # UnboundLocalError def spaminate(sausage=>eggs +
  1, eggs=>sausage - 1): # Confusing, don't do this def
  frob(n=>len(items), items=[]): # See below

Evaluation order is left-to-right; however, implementations MAY choose
to do so in two separate passes, first for all passed arguments and
early-bound defaults, and then a second pass for late-bound defaults.
Otherwise, all arguments will be assigned strictly left-to-right.

Rejected choices of spelling

While this document specifies a single syntax name=>expression,
alternate spellings are similarly plausible. The following spellings
were considered:

    def bisect(a, hi=>len(a)):
    def bisect(a, hi:=len(a)):
    def bisect(a, hi?=len(a)):
    def bisect(a, @hi=len(a)):

Since default arguments behave largely the same whether they're early or
late bound, the chosen syntax hi=>len(a) is deliberately similar to the
existing early-bind syntax.

One reason for rejection of the := syntax is its behaviour with
annotations. Annotations go before the default, so in all syntax
options, it must be unambiguous (both to the human and the parser)
whether this is an annotation, a default, or both. The alternate syntax
target:=expr runs the risk of being misinterpreted as target:int=expr
with the annotation omitted in error, and may thus mask bugs. The chosen
syntax target=>expr does not have this problem.

How to Teach This

Early-bound default arguments should always be taught first, as they are
the simpler and more efficient way to evaluate arguments. Building on
them, late bound arguments are broadly equivalent to code at the top of
the function:

    def add_item(item, target=>[]):

    # Equivalent pseudocode:
    def add_item(item, target=<OPTIONAL>):
        if target was omitted: target = []

A simple rule of thumb is: "target=expression" is evaluated when the
function is defined, and "target=>expression" is evaluated when the
function is called. Either way, if the argument is provided at call
time, the default is ignored. While this does not completely explain all
the subtleties, it is sufficient to cover the important distinction here
(and the fact that they are similar).

Interaction with other proposals

PEP 661 attempts to solve one of the same problems as this does. It
seeks to improve the documentation of sentinel values in default
arguments, where this proposal seeks to remove the need for sentinels in
many common cases. PEP 661 is able to improve documentation in
arbitrarily complicated functions (it cites traceback.print_exception as
its primary motivation, which has two arguments which must
both-or-neither be specified); on the other hand, many of the common
cases would no longer need sentinels if the true default could be
defined by the function. Additionally, dedicated sentinel objects can be
used as dictionary lookup keys, where PEP 671 does not apply.

A generic system for deferred evaluation has been proposed at times (not
to be confused with PEP 563 and PEP 649 which are specific to
annotations). While it may seem, on the surface, that late-bound
argument defaults are of a similar nature, they are in fact unrelated
and orthogonal ideas, and both could be of value to the language. The
acceptance or rejection of this proposal would not affect the viability
of a deferred evaluation proposal, and vice versa. (A key difference
between generalized deferred evaluation and argument defaults is that
argument defaults will always and only be evaluated as the function
begins executing, whereas deferred expressions would only be realized
upon reference.)

Implementation details

The following relates to the reference implementation, and is not
necessarily part of the specification.

Argument defaults (positional or keyword) have both their values, as
already retained, and an extra piece of information. For positional
arguments, the extras are stored in a tuple in __defaults_extra__, and
for keyword-only, a dict in __kwdefaults_extra__. If this attribute is
None, it is equivalent to having None for every argument default.

For each parameter with a late-bound default, the special value Ellipsis
is stored as the value placeholder, and the corresponding extra
information needs to be queried. If it is None, then the default is
indeed the value Ellipsis; otherwise, it is a descriptive string and the
true value is calculated as the function begins.

When a parameter with a late-bound default is omitted, the function will
begin with the parameter unbound. The function begins by testing for
each parameter with a late-bound default using a new opcode
QUERY_FAST/QUERY_DEREF, and if unbound, evaluates the original
expression. This opcode (available only for fast locals and closure
variables) pushes True onto the stack if the given local has a value,
and False if not - meaning that it pushes False if LOAD_FAST or
LOAD_DEREF would raise UnboundLocalError, and True if it would succeed.

Out-of-order variable references are permitted as long as the referent
has a value from an argument or early-bound default.

Costs

When no late-bound argument defaults are used, the following costs
should be all that are incurred:

-   Function objects require two additional pointers, which will be NULL
-   Compiling code and constructing functions have additional flag
    checks
-   Using Ellipsis as a default value will require run-time verification
    to see if late-bound defaults exist.

These costs are expected to be minimal (on 64-bit Linux, this increases
all function objects from 152 bytes to 168), with virtually no run-time
cost when late-bound defaults are not used.

Backward incompatibility

Where late-bound defaults are not used, behaviour should be identical.
Care should be taken if Ellipsis is found, as it may not represent
itself, but beyond that, tools should see existing code unchanged.

References

https://github.com/rosuav/cpython/tree/pep-671

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.