PEP: 712
Title: Adding a "converter" parameter to dataclasses.field
Author: Joshua Cannon <joshdcannon@gmail.com>
Sponsor: Eric V. Smith <eric at trueblade.com>
Discussions-To: https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Jan-2023
Python-Version: 3.13
Post-History: 27-Dec-2022, 19-Jan-2023, 23-Apr-2023
Resolution: https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126/98

Rejection Notice

The reasons for the 2024 Steering Council rejection include:

-   We did not find evidence of a strong consensus that this feature was
    needed in the standard library, despite some proponents arguing in
    favor in order to reduce their dependence on third party packages.
    For those who need such functionality, we think those existing third
    party libraries such as attrs and Pydantic (which the PEP
    references) are acceptable alternatives.
-   This feature seems to us like an accumulation of what could be
    considered more cruft in the standard library, leading us ever
    farther away from the “simple” use cases that dataclasses are ideal
    for.
-   Reading the “How to Teach This” section of the PEP gives us pause
    that the pitfalls and gotchas are significant, with a heightened
    confusion and complexity outweighing any potential benefits.
-   The PEP seems more focused toward helping type checkers than people
    using the library.

Abstract

PEP 557 added dataclasses to the Python stdlib. PEP 681 added
typing.dataclass_transform to help type checkers understand
several common dataclass-like libraries, such as attrs, Pydantic, and
object relational mapper (ORM) packages such as SQLAlchemy and Django.

A common feature other libraries provide over the standard library
implementation is the ability for the library to convert arguments given
at initialization time into the types expected for each field using a
user-provided conversion function.

Therefore, this PEP adds a converter parameter to dataclasses.field
(along with the requisite changes to dataclasses.Field and
typing.dataclass_transform) to specify the function to use to
convert the input value for each field to the representation to be
stored in the dataclass.

Motivation

There is no existing, standard way for dataclasses or third-party
dataclass-like libraries to support argument conversion in a
type-checkable way. To work around this limitation, library
authors/users are forced to choose to:

-   Opt in to a custom Mypy plugin. These plugins help Mypy understand
    the conversion semantics, but not other tools.
-   Shift conversion responsibility onto the caller of the dataclass
    constructor. This can make constructing certain dataclasses
    unnecessarily verbose and repetitive.
-   Provide a custom __init__ which declares "wider" parameter types and
    converts them when setting the appropriate attribute. This not only
    duplicates the typing annotations between the converter and
    __init__, but also opts the user out of many of the features
    dataclasses provides.
-   Provide a custom __init__ but without meaningful type annotations
    for the parameter types requiring conversion.

None of these choices are ideal.

Rationale

Adding argument conversion semantics is useful and beneficial enough
that most dataclass-like libraries provide support. Adding this feature
to the standard library means more users are able to opt in to these
benefits without requiring third-party libraries. Additionally,
third-party libraries are able to clue type-checkers into their own
conversion semantics through added support in
typing.dataclass_transform, meaning users of those libraries
benefit as well.

Specification

New converter parameter

This specification introduces a new parameter named converter to the
dataclasses.field function. If provided, it represents a single-argument
callable used to convert all values when assigning to the associated
attribute.

For frozen dataclasses, the converter is only used inside a
dataclass-synthesized __init__ when setting the attribute. For
non-frozen dataclasses, the converter is used for all attribute
assignment (e.g. obj.attr = value), which includes assignment of default
values.

The converter is not used when reading attributes, as the attributes
should already have been converted.

Adding this parameter also implies the following changes:

-   A converter attribute will be added to dataclasses.Field.
-   converter will be added to typing.dataclass_transform's list
    of supported field specifier parameters.
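Because the converter parameter does not exist in dataclasses, the non-frozen semantics described above can only be approximated in current Python. The following is a minimal sketch, assuming a hypothetical converting decorator and an ad-hoc "converter" key in field metadata; neither is part of dataclasses or of this PEP's reference implementation.

```python
import dataclasses
import pathlib


def converting(cls):
    """Hypothetical decorator: apply per-field converters on every assignment.

    Converters are read from ``field(metadata={"converter": ...})``, an
    ad-hoc convention used only in this sketch.
    """
    converters = {
        f.name: f.metadata["converter"]
        for f in dataclasses.fields(cls)
        if "converter" in f.metadata
    }
    base_setattr = cls.__setattr__

    def __setattr__(self, name, value):
        converter = converters.get(name)
        if converter is not None:
            value = converter(value)
        base_setattr(self, name, value)

    cls.__setattr__ = __setattr__
    return cls


@converting
@dataclasses.dataclass
class InventoryItem:
    id: int = dataclasses.field(metadata={"converter": int})
    stock_image_path: pathlib.PurePosixPath = dataclasses.field(
        default="assets/unknown.png",
        metadata={"converter": pathlib.PurePosixPath},
    )


item = InventoryItem("1")  # the synthesized __init__ assigns via __setattr__
item.id = "42"             # later assignments are converted as well
```

The default value is converted here because the synthesized __init__ of a non-frozen dataclass assigns each field through ordinary attribute assignment. A frozen dataclass bypasses the class's __setattr__ in its synthesized __init__, so this sketch covers only the non-frozen case.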

Example

    import dataclasses
    import pathlib
    from typing import Any

    def str_or_none(x: Any) -> str | None:
        return str(x) if x is not None else None

    @dataclasses.dataclass
    class InventoryItem:
        # `converter` as a type (including a GenericAlias).
        id: int = dataclasses.field(converter=int)
        skus: tuple[int, ...] = dataclasses.field(converter=tuple[int, ...])
        # `converter` as a callable.
        vendor: str | None = dataclasses.field(converter=str_or_none)
        names: tuple[str, ...] = dataclasses.field(
          converter=lambda names: tuple(map(str.lower, names))
        )  # Note that lambdas are supported, but discouraged as they are untyped.

        # The default value is also converted; therefore the following is not a
        # type error.
        stock_image_path: pathlib.PurePosixPath = dataclasses.field(
          converter=pathlib.PurePosixPath, default="assets/unknown.png"
        )

        # Default value conversion extends to `default_factory`;
        # therefore the following is also not a type error.
        shelves: tuple = dataclasses.field(
          converter=tuple, default_factory=list
        )

    item1 = InventoryItem(
      "1",
      [234, 765],
      None,
      ["PYTHON PLUSHIE", "FLUFFY SNAKE"]
    )
    # item1's repr would be (with added newlines for readability):
    #   InventoryItem(
    #     id=1,
    #     skus=(234, 765),
    #     vendor=None,
    #     names=('python plushie', 'fluffy snake'),
    #     stock_image_path=PurePosixPath('assets/unknown.png'),
    #     shelves=()
    #   )

    # Attribute assignment also participates in conversion.
    item1.skus = [555]
    # item1's skus attribute is now (555,).

Impact on typing

A converter must be a callable that accepts a single positional
argument, and the parameter type corresponding to this positional
argument provides the type of the synthesized __init__ parameter
associated with the field.

In other words, the argument provided for the converter parameter must
be compatible with Callable[[T], X] where T is the input type for the
converter and X is the output type of the converter.

Type-checking default and default_factory

Because default values are unconditionally converted using converter, if
an argument for converter is provided alongside either default or
default_factory, the type of the default (the default argument if
provided, otherwise the return value of default_factory) should be
checked using the type of the single argument to the converter callable.

Converter return type

The return type of the callable must be a type that's compatible with
the field's declared type. This includes the field's type exactly, but
can also be a type that's more specialized (such as a converter
returning a list[int] for a field annotated as list, or a converter
returning an int for a field annotated as int | str).

Indirection of allowable argument types

One downside introduced by this PEP is that knowing what argument types
are allowed in the dataclass' __init__ and during attribute assignment
is not immediately obvious from reading the dataclass. The allowable
types are defined by the converter.

This is true when reading code from source. However, typing-related aids
such as typing.reveal_type and "IntelliSense" in an IDE should make it
easy to know exactly what types are allowed without having to read any
source code.

Backward Compatibility

These changes don't introduce any compatibility problems since they only
introduce opt-in new features.

Security Implications

There are no direct security concerns with these changes.

How to Teach This

Documentation and examples explaining the new parameter and behavior
will be added to the relevant sections of the docs site (primarily on
dataclasses) and linked from the What's New document.

The added documentation/examples will also cover the "common pitfalls"
that users of converters are likely to encounter. Such pitfalls include:

-   Needing to handle None/sentinel values.
-   Needing to handle values that are already of the correct type.
-   Avoiding lambdas for converters, as the synthesized __init__
    parameter's type will become Any.
-   Forgetting to convert values in the bodies of user-defined __init__
    in frozen dataclasses.
-   Forgetting to convert values in the bodies of user-defined
    __setattr__ in non-frozen dataclasses.

Additionally, potentially confusing pattern matching semantics should be
covered:

    @dataclass
    class Point:
        x: int = field(converter=int)
        y: int

    match Point(x="0", y=0):
        case Point(x="0", y=0):  # Won't be matched
            ...
        case Point():  # Will be matched
            ...
        case _:
            ...

However, it's worth noting that this behavior is true of any type that does
conversion in its initializer, and type-checkers should be able to catch
this pitfall:

    match int("0"):
        case int("0"):  # Won't be matched
            ...
        case _:  # Will be matched
            ...

Reference Implementation

The attrs library already includes a converter parameter exhibiting the
same converter semantics (converting in the initializer and on attribute
setting) when using the @define class decorator.

CPython support is implemented on a branch in the author's fork.

Rejected Ideas

Just adding "converter" to typing.dataclass_transform's field_specifiers

The idea of isolating this addition to
typing.dataclass_transform was briefly discussed on Typing-SIG
where it was suggested to broaden this to dataclasses more generally.

Additionally, adding this to dataclasses ensures anyone can reap the
benefits without requiring additional libraries.

Not converting default values

There are pros and cons with both converting and not converting default
values. Leaving default values as-is allows type-checkers and dataclass
authors to expect that the type of the default matches the type of the
field. However, converting default values has three large advantages:

1.  Consistency. Unconditionally converting all values that are assigned
    to the attribute involves fewer "special rules" that users must
    remember.
2.  Simpler defaults. Allowing the default value to have the same type
    as user-provided values means dataclass authors get the same
    conveniences as their callers.
3.  Compatibility with attrs. Attrs unconditionally uses the converter
    to convert default values.

Automatic conversion using the field's type

One idea could be to allow the type of the field specified (e.g. str or
int) to be used as a converter for each argument provided. Pydantic's
data conversion has semantics which appear to be similar to this
approach.

This works well for fairly simple types, but leads to ambiguity in
expected behavior for complex types such as generics. For example, for
tuple[int, ...] it is ambiguous whether the converter is supposed to simply
convert an iterable to a tuple, or whether it is additionally supposed to
convert each element to int. Other annotations, such as int | None, are
not callable at all.
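Both problems can be observed directly in current Python. The short sketch below calls the two annotations as if they were converters; the variable names are illustrative.

```python
skus_type = tuple[int, ...]

# Calling the generic alias builds a plain tuple; the elements are
# left untouched, so the strings are NOT converted to int.
skus = skus_type(["234", "765"])
print(skus)  # ('234', '765')

# A union such as ``int | None`` cannot be called at all.
union_error = None
try:
    (int | None)("1")
except TypeError as exc:
    union_error = exc
```

An automatic scheme would therefore have to invent its own rules for generics and unions rather than simply calling the annotation.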

Deducing the attribute type from the return type of the converter

Another idea would be to allow the user to omit the attribute's type
annotation if providing a field with a converter argument. Although this
would reduce the common repetition this PEP introduces (e.g.
x: str = field(converter=str)), it isn't clear how to best support this
while maintaining the current dataclass semantics (namely, that the
attribute order is preserved for things like the synthesized __init__,
or dataclasses.fields). This is because there isn't an easy way in
Python (today) to get the annotation-only attributes interspersed with
un-annotated attributes in the order they were defined.
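The ordering problem can be seen with a small class that mixes both kinds of attribute. This is a minimal sketch; the class name Example and its attributes are illustrative.

```python
class Example:
    a: int   # annotation only
    b = 1    # assignment only
    c: str   # annotation only


# Annotated names are recorded in one mapping...
annotated = list(Example.__annotations__)
# ...and assigned names in another (the class namespace).
assigned = [name for name in vars(Example) if not name.startswith("__")]

print(annotated)  # ['a', 'c']
print(assigned)   # ['b']
```

Each mapping preserves its own insertion order, but nothing records that b was defined between a and c.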

A sentinel annotation could be applied (e.g. x: FromConverter = ...),
however this breaks a fundamental assumption of type annotations.

Lastly, this would be feasible if all fields (including those without a
converter) were assigned to dataclasses.field, so that the class's
own namespace holds the order. However, this trades repetition of
type+converter for repetition of field assignment. The end result is no
net reduction in repetition, along with added complexity in dataclasses
semantics.

This PEP doesn't suggest that it can't or shouldn't be done, just that it
isn't included in this PEP.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.