PEP 597 – Add optional EncodingWarning
- Inada Naoki <songofacandy at gmail.com>
- Standards Track
Table of Contents
- Backward Compatibility
- Forward Compatibility
- How to Teach This
- Reference Implementation
Add a new warning category
EncodingWarning. It is emitted when the
encoding argument to
open() is omitted and the default
locale-specific encoding is used.
The warning is disabled by default. A new -X warn_default_encoding command-line option and a new PYTHONWARNDEFAULTENCODING environment variable can be used to enable it.
A "locale" argument value for encoding is added too. It explicitly specifies that the locale encoding should be used, silencing the warning.
Using the default encoding is a common mistake
Developers using macOS or Linux may forget that the default encoding is not always UTF-8.
For example, using
long_description = open("README.md").read() in
setup.py is a common mistake. Many Windows users cannot install
such packages if there is at least one non-ASCII character
(e.g. emoji, author names, copyright symbols, and the like)
in their UTF-8-encoded README.md file.
Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII characters in their README, and 82 fail to install from source on non-UTF-8 locales due to not specifying an encoding for a non-ASCII file. 
Another example is logging.basicConfig(filename="log.txt").
Some users might expect it to use UTF-8 by default, but the locale
encoding is actually what is used. 
Even Python experts may assume that the default encoding is UTF-8. This creates bugs that only happen on Windows; see the issues listed in the references at the end of this document for examples.
Emitting a warning when the
encoding argument is omitted will help
find such mistakes.
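The failure mode described above can be reproduced on any platform with a minimal sketch like the following. It simulates a non-UTF-8 locale by passing a legacy encoding explicitly; the file name, the sample text, and the choice of "cp1252" and "ascii" as stand-ins for a locale encoding are illustrative assumptions, not taken from any real package.

```python
import os
import tempfile

TEXT = "Copyright © 2021 Example Author\n"  # one non-ASCII character


def write_utf8_readme(directory):
    """Write a small UTF-8 README and return its path."""
    path = os.path.join(directory, "README.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(TEXT)
    return path


def read_with(path, encoding):
    """Read the file as if *encoding* were the locale encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = write_utf8_readme(tmp)
    utf8_result = read_with(readme, "utf-8")      # round-trips correctly
    cp1252_result = read_with(readme, "cp1252")   # decodes, but to mojibake
    try:
        read_with(readme, "ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True                       # decoding fails outright
```

Depending on the encoding in effect, the same call either raises UnicodeDecodeError or silently returns corrupted text, which is why the mistake is easy to miss on a UTF-8 machine.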
Explicit way to use locale-specific encoding
open(filename) isn’t explicit about which encoding is expected:
- If ASCII is assumed, this isn’t a bug, but may result in decreased performance on Windows, particularly with non-Latin-1 locale encodings
- If UTF-8 is assumed, this may be a bug or a platform-specific script
- If the locale encoding is assumed, the behavior is as expected (but could change if future versions of Python modify the default)
From this point of view,
open(filename) is not readable code.
encoding=locale.getpreferredencoding(False) can be used to
specify the locale encoding explicitly, but it is too long and easy
to misuse (e.g. one can forget to pass
False as its argument).
This PEP provides an explicit way to specify the locale encoding.
Prepare to change the default encoding to UTF-8
Since UTF-8 has become the de-facto standard text encoding, we might default to it for opening files in the future.
However, such a change will affect many applications and libraries.
If we start emitting DeprecationWarning everywhere the encoding argument is omitted, it will be too noisy and painful.
Although this PEP doesn’t propose changing the default encoding, it will help enable that change by:
- Reducing the number of omitted encoding arguments in libraries before we start emitting a DeprecationWarning by default.
- Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, as well as retaining consistent behavior if later Python versions change the default, ensuring support for any Python version >=3.10.
Add a new
EncodingWarning warning class as a subclass of
Warning. It is emitted when the
encoding argument is omitted and
the default locale-specific encoding is used.
Options to enable the warning
The -X warn_default_encoding option and the PYTHONWARNDEFAULTENCODING environment variable are added. They are used to enable EncodingWarning.
sys.flags.warn_default_encoding is also added. The flag is true when EncodingWarning is enabled.
When the flag is set, io.TextIOWrapper(), open() and other modules using them will emit EncodingWarning when the encoding argument is omitted.
Since EncodingWarning is a subclass of Warning, these warnings are shown by default (if the warn_default_encoding flag is set), unlike DeprecationWarning.
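A minimal sketch of the opt-in behavior, assuming Python 3.10 or later. It runs the same one-line program in two child interpreters, once without and once with -X warn_default_encoding, and captures what each writes to stderr. The helper name child_stderr and the one-line program are illustrative.

```python
import os
import subprocess
import sys

# Child program: opens a file without passing an encoding argument.
CHILD_CODE = "import os; open(os.devnull).close()"


def child_stderr(*interpreter_args):
    """Run CHILD_CODE in a fresh interpreter and return its stderr text."""
    env = dict(os.environ)
    # Start from a known state: no inherited warning configuration.
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    env.pop("PYTHONWARNINGS", None)
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", CHILD_CODE],
        capture_output=True,
        env=env,
    )
    return result.stderr.decode("utf-8", errors="replace")


default_output = child_stderr()                              # flag not set
flagged_output = child_stderr("-X", "warn_default_encoding") # flag set
```

Only the second run is expected to mention EncodingWarning, and it does so without any additional -W filter because the category is not ignored by default.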
io.TextIOWrapper will accept "locale" as a valid argument to encoding. It has the same meaning as the current encoding=None, except that io.TextIOWrapper doesn’t emit EncodingWarning when encoding="locale" is specified.
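The difference between an omitted encoding and an explicit one can be checked with a sketch like this, again assuming Python 3.10 or later. The child interpreter is started with PYTHONWARNDEFAULTENCODING=1 so the warning is enabled; the helper name emits_encoding_warning is hypothetical.

```python
import os
import subprocess
import sys


def emits_encoding_warning(open_kwargs_source):
    """Return True if open(os.devnull, <kwargs>) warns in a child interpreter."""
    code = "import os; open(os.devnull{}).close()".format(open_kwargs_source)
    # Enable the warning through the environment variable.
    env = dict(os.environ, PYTHONWARNDEFAULTENCODING="1")
    env.pop("PYTHONWARNINGS", None)
    # -S skips the site module so that only the open() call above can warn.
    result = subprocess.run(
        [sys.executable, "-S", "-c", code],
        capture_output=True,
        env=env,
    )
    return b"EncodingWarning" in result.stderr


omitted = emits_encoding_warning("")
explicit_locale = emits_encoding_warning(', encoding="locale"')
explicit_utf8 = emits_encoding_warning(', encoding="utf-8"')
```

Only the call that omits the argument should warn; stating either "locale" or "utf-8" documents the intent and keeps the output quiet.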
io.text_encoding() is a helper for functions with an encoding=None parameter that pass it to io.TextIOWrapper() or open().
A pure Python implementation will look like this:
    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
pathlib.Path.read_text() can use it like this:
    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
By using io.text_encoding(), EncodingWarning is emitted for the caller of read_text() instead of read_text() itself.
Affected standard library modules
Many standard library modules will be affected by this change.
Most APIs accepting encoding=None will use io.text_encoding() as written in the previous section.
Where using the locale encoding as the default encoding is reasonable, encoding="locale" will be used instead. For example, the subprocess module will use the locale encoding as the default for pipes.
Many tests use open() without encoding specified to read ASCII text files. They should be rewritten with encoding="ascii".
Although DeprecationWarning is suppressed by default, always emitting DeprecationWarning when the encoding argument is omitted would be too noisy.
Noisy warnings may lead developers to dismiss the DeprecationWarning.
“locale” is not a codec alias
We don’t add “locale” as a codec alias because the locale can be changed at runtime.
Additionally, TextIOWrapper checks os.device_encoding() when encoding=None. This behavior cannot be implemented in a codec.
The new warning is not emitted by default, so this PEP is 100% backwards-compatible.
Passing "locale" as the argument to encoding is not forward-compatible. Code using it will not work on Python older than 3.10, and will instead raise LookupError: unknown encoding: locale.
Until developers can drop Python 3.9 support, EncodingWarning can only be used for finding missing encoding="utf-8" arguments.
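Code that must still run on older interpreters and really does want the locale encoding could select the argument at runtime. The helper below is a hypothetical sketch, not something this PEP or the standard library provides, and the fallback only approximates the behavior of encoding=None on older versions.

```python
import locale
import os
import sys


def locale_encoding_argument():
    """Return a value for open()'s encoding parameter that selects the locale encoding.

    Hypothetical helper; the name is an assumption made for illustration.
    """
    if sys.version_info >= (3, 10):
        return "locale"
    # Older versions reject "locale" with LookupError, so name the
    # encoding explicitly instead.
    return locale.getpreferredencoding(False)


# os.devnull is used only so the example needs no real file.
with open(os.devnull, encoding=locale_encoding_argument()) as f:
    devnull_content = f.read()
```

Once Python 3.9 support is dropped, every call site can be reduced to a plain encoding="locale".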
How to Teach This
For new users
Since EncodingWarning is used to write cross-platform code,
there is no need to teach it to new users.
We can just recommend using UTF-8 for text files and using
encoding="utf-8" when opening them.
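The recommendation amounts to passing the same encoding on both the write and the read side, as in this minimal sketch. The function names, file name, and sample text are illustrative.

```python
import os
import tempfile


def save_note(path, text):
    """Write text as UTF-8 regardless of the platform's locale encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_note(path):
    """Read text back using the same explicit encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    note_path = os.path.join(tmp, "note.txt")
    save_note(note_path, "naïve café ☕\n")
    round_trip = load_note(note_path)
```

Because neither call depends on the locale, the file round-trips identically on Windows, macOS, and Linux.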
For experienced users
Using open(filename) to read text files encoded in UTF-8 is a common mistake. It may not work on Windows because UTF-8 is not the default encoding.
You can use
-X warn_default_encoding or
PYTHONWARNDEFAULTENCODING=1 to find this type of mistake.
Omitting the encoding argument is not a bug when opening text files
encoded in the locale encoding, but
encoding="locale" is recommended
in Python 3.10 and later because it is more explicit.
The latest discussion thread is: https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/
- Why not implement this in linters?
- encoding="locale" and io.text_encoding() must be implemented in Python.
- It is difficult to find all callers of functions wrapping open() or TextIOWrapper().
- Many developers will not use the option.
- Some will, and report the warnings to libraries they use, so the option is worth it even if many developers don’t enable it.
- For example, I found two of the issues listed in the references by running pip install -U pip, and another by running tox with the reference implementation. This demonstrates how this option can be used to find potential issues.
- “Packages can’t be installed when encoding is not UTF-8” (https://github.com/methane/pep597-pypi-ascii)
- “Logging - Inconsistent behaviour when handling unicode” (https://bugs.python.org/issue37111)
- Packaging tutorial in packaging.python.org didn’t specify encoding to read a README.md
- json.tool had used locale encoding to read JSON files. (https://bugs.python.org/issue33684)
- site: Potential UnicodeDecodeError when handling pth file (https://bugs.python.org/issue33684)
- pypa/pip: “Installing packages fails if Python 3 installed into path with non-ASCII characters” (https://github.com/pypa/pip/issues/9054)
- “site: Potential UnicodeDecodeError when handling pth file” (https://bugs.python.org/issue43214)
- “[pypa/pip] Use encoding option or binary mode for open()” (https://github.com/pypa/pip/pull/9608)
- “Possible UnicodeError caused by missing encoding=”utf-8”” (https://github.com/tox-dev/tox/issues/1908)
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Last modified: 2021-09-17 00:59:22 GMT