PEP: 504 Title: Using the System RNG by default Version: $Revision$
Last-Modified: $Date$ Author: Alyssa Coghlan <ncoghlan@gmail.com>
Status: Withdrawn Type: Standards Track Content-Type: text/x-rst
Created: 15-Sep-2015 Python-Version: 3.6 Post-History: 15-Sep-2015

Abstract

Python currently defaults to using the deterministic Mersenne Twister
random number generator for the module level APIs in the random module,
requiring users to know that when they're performing "security
sensitive" work, they should instead switch to using the
cryptographically secure os.urandom or random.SystemRandom interfaces or
a third party library like cryptography.

Unfortunately, this approach has resulted in a situation where
developers that aren't aware that they're doing security sensitive work
use the default module level APIs, and thus expose their users to
unnecessary risks.

This isn't an acute problem, but it is a chronic one, and the often long
delays between the introduction of security flaws and their exploitation
means that it is difficult for developers to naturally learn from
experience.

In order to provide an eventually pervasive solution to the problem,
this PEP proposes that Python switch to using the system random number
generator by default in Python 3.6, and require developers to opt-in to
using the deterministic random number generator process wide either by
using a new random.ensure_repeatable() API, or by explicitly creating
their own random.Random() instance.

To minimise the impact on existing code, module level APIs that require
determinism will implicitly switch to the deterministic PRNG.

PEP Withdrawal

During discussion of this PEP, Steven D'Aprano proposed the simpler
alternative of offering a standardised secrets module that provides "one
obvious way" to handle security sensitive tasks like generating default
passwords and other tokens.

Steven's proposal has the desired effect of aligning the easy way to
generate such tokens and the right way to generate them, without
introducing any compatibility risks for the existing random module API,
so this PEP has been withdrawn in favour of further work on refining
Steven's proposal as PEP 506.

Proposal

Currently, it is never correct to use the module level functions in the
random module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not
correct to use the module level functions in the random module for
security sensitive applications if random.ensure_repeatable() is ever
called (directly or indirectly) in that process.

To achieve this, rather than being bound methods of a random.Random
instance as they are today, the module level callables in random would
change to be functions that delegate to the corresponding method of the
existing random._inst module attribute.

By default, this attribute will be bound to a random.SystemRandom
instance.

A new random.ensure_repeatable() API will then rebind the random._inst
attribute to a system.Random instance, restoring the same module level
API behaviour as existed in previous Python versions (aside from the
additional level of indirection):

    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs

        This switches the default RNG instance from the cryptographically
        secure random.SystemRandom() to the deterministic random.Random(),
        enabling the seed(), getstate() and setstate() operations. This means
        a particular random scenario can be replayed later by providing the
        same seed value or restoring a previously saved state.

        NOTE: Libraries implementing security sensitive operations should
        always explicitly use random.SystemRandom() or os.urandom in order to
        correctly handle applications that call this function.
        """
        if not isinstance(_inst, Random):
            _inst = random.Random()

To minimise the impact on existing code, calling any of the following
module level functions will implicitly call random.ensure_repeatable():

-   random.seed
-   random.getstate
-   random.setstate

There are no changes proposed to the random.Random or
random.SystemRandom class APIs - applications that explicitly
instantiate their own random number generators will be entirely
unaffected by this proposal.

Warning on implicit opt-in

In Python 3.6, implicitly opting in to the use of the deterministic PRNG
will emit a deprecation warning using the following check:

    if not isinstance(_inst, Random):
        warnings.warn(DeprecationWarning,
                      "Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details")
        ensure_repeatable()

The specific wording of the warning should have a suitable answer added
to Stack Overflow as was done for the custom error message that was
added for missing parentheses in a call to print[1].

In the first Python 3 release after Python 2.7 switches to security fix
only mode, the deprecation warning will be upgraded to a RuntimeWarning
so it is visible by default.

This PEP does not propose ever removing the ability to ensure the
default RNG used process wide is a deterministic PRNG that will produce
the same series of outputs given a specific seed. That capability is
widely used in modelling and simulation scenarios, and requiring that
ensure_repeatable() be called either directly or indirectly is a
sufficient enhancement to address the cases where the module level
random API is used for security sensitive tasks in web applications
without due consideration for the potential security implications of
using a deterministic PRNG.

Performance impact

Due to the large performance difference between random.Random and
random.SystemRandom, applications ported to Python 3.6 will encounter a
significant performance regression in cases where:

-   the application is using the module level random API
-   cryptographic quality randomness isn't needed
-   the application doesn't already implicitly opt back in to the
    deterministic PRNG by calling random.seed, random.getstate, or
    random.setstate
-   the application isn't updated to explicitly call
    random.ensure_repeatable

This would be noted in the Porting section of the Python 3.6 What's New
guide, with the recommendation to include the following code in the
__main__ module of affected applications:

    if hasattr(random, "ensure_repeatable"):
        random.ensure_repeatable()

Applications that do need cryptographic quality randomness should be
using the system random number generator regardless of speed
considerations, so in those cases the change proposed in this PEP will
fix a previously latent security defect.

Documentation changes

The random module documentation would be updated to move the
documentation of the seed, getstate and setstate interfaces later in the
module, along with the documentation of the new ensure_repeatable
function and the associated security warning.

That section of the module documentation would also gain a discussion of
the respective use cases for the deterministic PRNG enabled by
ensure_repeatable (games, modelling & simulation, software testing) and
the system RNG that is used by default (cryptography, security token
generation). This discussion will also recommend the use of third party
security libraries for the latter task.

Rationale

Writing secure software under deadline and budget pressures is a hard
problem. This is reflected in regular notifications of data breaches
involving personally identifiable information[2], as well as with
failures to take security considerations into account when new systems,
like motor vehicles [3], are connected to the internet. It's also the
case that a lot of the programming advice readily available on the
internet [#search] simply doesn't take the mathematical arcana of
computer security into account. Compounding these issues is the fact
that defenders have to cover all of their potential vulnerabilities, as
a single mistake can make it possible to subvert other defences[4].

One of the factors that contributes to making this last aspect
particularly difficult is APIs where using them inappropriately creates
a silent security failure - one where the only way to find out that what
you're doing is incorrect is for someone reviewing your code to say
"that's a potential security problem", or for a system you're
responsible for to be compromised through such an oversight (and you're
not only still responsible for that system when it is compromised, but
your intrusion detection and auditing mechanisms are good enough for you
to be able to figure out after the event how the compromise took place).

This kind of situation is a significant contributor to "security
fatigue", where developers (often rightly[5]) feel that security
engineers spend all their time saying "don't do that the easy way, it
creates a security vulnerability".

As the designers of one of the world's most popular languages[6], we can
help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and
security engineers can spend more time worrying about mitigating
actually interesting threats, and less time fighting with default
language behaviours.

Discussion

Why "ensure_repeatable" over "ensure_deterministic"?

This is a case where the meaning of a word as specialist jargon
conflicts with the typical meaning of the word, even though it's
technically the same.

From a technical perspective, a "deterministic RNG" means that given
knowledge of the algorithm and the current state, you can reliably
compute arbitrary future states.

The problem is that "deterministic" on its own doesn't convey those
qualifiers, so it's likely to instead be interpreted as "predictable" or
"not random" by folks that are familiar with the conventional meaning,
but aren't familiar with the additional qualifiers on the technical
meaning.

A second problem with "deterministic" as a description for the
traditional RNG is that it doesn't really tell you what you can do with
the traditional RNG that you can't do with the system one.

"ensure_repeatable" aims to address both of those problems, as its
common meaning accurately describes the main reason for preferring the
deterministic PRNG over the system RNG: ensuring you can repeat the same
series of outputs by providing the same seed value, or by restoring a
previously saved PRNG state.

Only changing the default for Python 3.6+

Some other recent security changes, such as upgrading the capabilities
of the ssl module and switching to properly verifying HTTPS certificates
by default, have been considered critical enough to justify backporting
the change to all currently supported versions of Python.

The difference in this case is one of degree - the additional benefits
from rolling out this particular change a couple of years earlier than
will otherwise be the case aren't sufficient to justify either the
additional effort or the stability risks involved in making such an
intrusive change in a maintenance release.

Keeping the module level functions

In additional to general backwards compatibility considerations, Python
is widely used for educational purposes, and we specifically don't want
to invalidate the wide array of educational material that assumes the
availability of the current random module API. Accordingly, this
proposal ensures that most of the public API can continue to be used not
only without modification, but without generating any new warnings.

Warning when implicitly opting in to the deterministic RNG

It's necessary to implicitly opt in to the deterministic PRNG as Python
is widely used for modelling and simulation purposes where this is the
right thing to do, and in many cases, these software models won't have a
dedicated maintenance team tasked with ensuring they keep working on the
latest versions of Python.

Unfortunately, explicitly calling random.seed with data from os.urandom
is also a mistake that appears in a number of the flawed "how to
generate a security token in Python" guides readily available online.

Using first DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG aims to
nudge future users that need a cryptographically secure RNG away from
calling random.seed() and those that genuinely need a deterministic
generator towards explicitly calling random.ensure_repeatable().

Avoiding the introduction of a userspace CSPRNG

The original discussion of this proposal on python-ideas[7] suggested
introducing a cryptographically secure pseudo-random number generator
and using that by default, rather than defaulting to the relatively slow
system random number generator.

The problem[8] with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of
applications where the random number generation may not even be on a
critical performance path.

Applications that do need cryptographic quality randomness should be
using the system random number generator regardless of speed
considerations, so in those cases.

Isn't the deterministic PRNG "secure enough"?

In a word, "No" - that's why there's a warning in the module
documentation that says not to use it for security sensitive purposes.
While we're not currently aware of any studies of Python's random number
generator specifically, studies of PHP's random number generator[9] have
demonstrated the ability to use weaknesses in that subsystem to
facilitate a practical attack on password recovery tokens in popular PHP
web applications.

However, one of the rules of secure software development is that
"attacks only get better, never worse", so it may be that by the time
Python 3.6 is released we will actually see a practical attack on
Python's deterministic PRNG publicly documented.

Security fatigue in the Python ecosystem

Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure
we all depend on to a "secure by default" stance. As one of the most
widely used programming languages for network service development
(including the OpenStack Infrastructure-as-a-Service platform) and for
systems administration on Linux systems in general, a fair share of that
burden has fallen on the Python ecosystem, which is understandably
frustrating for Pythonistas using Python in other contexts where these
issues aren't of as great a concern.

This consideration is one of the primary factors driving the substantial
backwards compatibility improvements in this proposal relative to the
initial draft concept posted to python-ideas[10].

Acknowledgements

-   Theo de Raadt, for making the suggestion to Guido van Rossum that we
    seriously consider defaulting to a cryptographically secure random
    number generator
-   Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
    python-ideas threads that suggested the approach of transparently
    switching to the random.Random implementation when any of the
    functions that only make sense for a deterministic RNG are called
-   Nathaniel Smith for providing the reference on practical attacks
    against PHP's random number generator when used to generate password
    reset tokens
-   Donald Stufft for pursuing additional discussions with network
    security experts that suggested the introduction of a userspace
    CSPRNG would mean additional complexity for insufficient gain
    relative to just using the system RNG directly
-   Paul Moore for eloquently making the case for the current level of
    security fatigue in the Python ecosystem

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] Stack Overflow answer for missing parentheses in call to print
(http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)

[2] Visualization of data breaches involving more than 30k records
(each)
(http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

[3] Remote UConnect hack for Jeep Cherokee
(http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)

[4] Bypassing bcrypt through an insecure data cache
(http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)

[5] OWASP Top Ten Web Security Issues for 2013
(https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)

[6] IEEE Spectrum 2015 Top Ten Programming Languages
(http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)

[7] python-ideas thread discussing using a userspace CSPRNG
(https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)

[8] Safely generating random numbers
(http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)

[9] PRNG based attack against password reset tokens in PHP applications
(https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)

[10] Initial draft concept that eventually became this PEP
(https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)