PEP: 506 Title: Adding A Secrets Module To The Standard Library Version:
$Revision$ Last-Modified: $Date$ Author: Steven D'Aprano
<steve@pearwood.info> Status: Final Type: Standards Track Content-Type:
text/x-rst Created: 19-Sep-2015 Python-Version: 3.6 Post-History:

Abstract

This PEP proposes the addition of a module for common security-related
functions such as generating tokens to the Python standard library.

Definitions

Some common abbreviations used in this proposal:

-   PRNG:

    Pseudo Random Number Generator. A deterministic algorithm used to
    produce random-looking numbers with certain desirable statistical
    properties.

-   CSPRNG:

    Cryptographically Strong Pseudo Random Number Generator. An
    algorithm used to produce random-looking numbers which are resistant
    to prediction.

-   MT:

    Mersenne Twister. An extensively studied PRNG which is currently
    used by the random module as the default.

Rationale

This proposal is motivated by concerns that Python's standard library
makes it too easy for developers to inadvertently make serious security
errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van
Rossum and expressed some concern[1] about the use of MT for generating
sensitive information such as passwords, secure tokens, session keys and
similar.

Although the documentation for the random module explicitly states that
the default is not suitable for security purposes[2], it is strongly
believed that this warning may be missed, ignored or misunderstood by
many Python developers. In particular:

-   developers may not have read the documentation and consequently not
    seen the warning;
-   they may not realise that their specific use of the module has
    security implications; or
-   not realising that there could be a problem, they have copied code
    (or learned techniques) from websites which don't offer best
    practises.

The first[3] hit when searching for "python how to generate passwords"
on Google is a tutorial that uses the default functions from the random
module[4]. Although it is not intended for use in web applications, it
is likely that similar techniques find themselves used in that
situation. The second hit is to a StackOverflow question about
generating passwords[5]. Most of the answers given, including the
accepted one, use the default functions. When one user warned that the
default could be easily compromised, they were told "I think you worry
too much."[6]

This strongly suggests that the existing random module is an attractive
nuisance when it comes to generating (for example) passwords or secure
tokens.

Additional motivation (of a more philosophical bent) can be found in the
post which first proposed this idea[7].

Proposal

Alternative proposals have focused on the default PRNG in the random
module, with the aim of providing "secure by default" cryptographically
strong primitives that developers can build upon without thinking about
security. (See Alternatives below.) This proposes a different approach:

-   The standard library already provides cryptographically strong
    primitives, but many users don't know they exist or when to use
    them.
-   Instead of requiring crypto-naive users to write secure code, the
    standard library should include a set of ready-to-use "batteries"
    for the most common needs, such as generating secure tokens. This
    code will both directly satisfy a need ("How do I generate a
    password reset token?"), and act as an example of acceptable
    practises which developers can learn from[8].

To do this, this PEP proposes that we add a new module to the standard
library, with the suggested name secrets. This module will contain a set
of ready-to-use functions for common activities with security
implications, together with some lower-level primitives.

The suggestion is that secrets becomes the go-to module for dealing with
anything which should remain secret (passwords, tokens, etc.) while the
random module remains backward-compatible.

API and Implementation

This PEP proposes the following functions for the secrets module:

-   Functions for generating tokens suitable for use in (e.g.) password
    recovery, as session keys, etc., in the following formats:
    -   as bytes, secrets.token_bytes;
    -   as text, using hexadecimal digits, secrets.token_hex;
    -   as text, using URL-safe base-64 encoding, secrets.token_urlsafe.
-   A limited interface to the system CSPRNG, using either os.urandom
    directly or random.SystemRandom. Unlike the random module, this does
    not need to provide methods for seeding, getting or setting the
    state, or any non-uniform distributions. It should provide the
    following:
    -   A function for choosing items from a sequence, secrets.choice.
    -   A function for generating a given number of random bits and/or
        bytes as an integer, secrets.randbits.
    -   A function for returning a random integer in the half-open range
        0 to the given upper limit, secrets.randbelow[9].
-   A function for comparing text or bytes digests for equality while
    being resistant to timing attacks, secrets.compare_digest.

The consensus appears to be that there is no need to add a new CSPRNG to
the random module to support these uses, SystemRandom will be
sufficient.

Some illustrative implementations have been given by Alyssa (Nick)
Coghlan[10] and a minimalist API by Tim Peters[11]. This idea has also
been discussed on the issue tracker for the "cryptography" module[12].
The following pseudo-code should be taken as the starting point for the
real implementation:

    from random import SystemRandom
    from hmac import compare_digest

    _sysrand = SystemRandom()

    randbits = _sysrand.getrandbits
    choice = _sysrand.choice

    def randbelow(exclusive_upper_bound):
        return _sysrand._randbelow(exclusive_upper_bound)

    DEFAULT_ENTROPY = 32  # bytes

    def token_bytes(nbytes=None):
        if nbytes is None:
            nbytes = DEFAULT_ENTROPY
        return os.urandom(nbytes)

    def token_hex(nbytes=None):
        return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

    def token_urlsafe(nbytes=None):
        tok = token_bytes(nbytes)
        return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')

The secrets module itself will be pure Python, and other Python
implementations can easily make use of it unchanged, or adapt it as
necessary. An implementation can be found on BitBucket[13].

Default arguments

One difficult question is "How many bytes should my token be?". We can
help with this question by providing a default amount of entropy for the
"token*" functions. If the nbytes argument is None or not given, the
default entropy will be used. This default value should be large enough
to be expected to be secure for medium-security uses, but is expected to
change in the future, possibly even in a maintenance release[14].

Naming conventions

One question is the naming conventions used in the module[15], whether
to use C-like naming conventions such as "randrange" or more Pythonic
names such as "random_range".

Functions which are simply bound methods of the private SystemRandom
instance (e.g. randrange), or a thin wrapper around such, should keep
the familiar names. Those which are something new (such as the various
token_* functions) will use more Pythonic names.

Alternatives

One alternative is to change the default PRNG provided by the random
module[16]. This received considerable scepticism and outright
opposition:

-   There is fear that a CSPRNG may be slower than the current PRNG
    (which in the case of MT is already quite slow).
-   Some applications (such as scientific simulations, and replaying
    gameplay) require the ability to seed the PRNG into a known state,
    which a CSPRNG lacks by design.
-   Another major use of the random module is for simple "guess a
    number" games written by beginners, and many people are loath to
    make any change to the random module which may make that harder.
-   Although there is no proposal to remove MT from the random module,
    there was considerable hostility to the idea of having to opt-in to
    a non-CSPRNG or any backwards-incompatible changes.
-   Demonstrated attacks against MT are typically against PHP
    applications. It is believed that PHP's version of MT is a
    significantly softer target than Python's version, due to a poor
    seeding technique[17]. Consequently, without a proven attack against
    Python applications, many people object to a backwards-incompatible
    change.

Alyssa Coghlan made an earlier suggestion <504> for a globally
configurable PRNG which uses the system CSPRNG by default, but has since
withdrawn it in favour of this proposal.

Comparison To Other Languages

-   PHP

    PHP includes a function uniqid[18] which by default returns a
    thirteen character string based on the current time in microseconds.
    Translated into Python syntax, it has the following signature:

        def uniqid(prefix='', more_entropy=False)->str

    The PHP documentation warns that this function is not suitable for
    security purposes. Nevertheless, various mature, well-known PHP
    applications use it for that purpose (citation needed).

    PHP 5.3 and better also includes a function
    openssl_random_pseudo_bytes [19]. Translated into Python syntax, it
    has roughly the following signature:

        def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]

    This function returns a pseudo-random string of bytes of the given
    length, and a boolean flag giving whether the string is considered
    cryptographically strong. The PHP manual suggests that returning
    anything but True should be rare except for old or broken platforms.

-   JavaScript

    Based on a rather cursory search[20], there do not appear to be any
    well-known standard functions for producing strong random values in
    JavaScript. Math.random is often used, despite serious weaknesses
    making it unsuitable for cryptographic purposes[21]. In recent years
    the majority of browsers have gained support for
    window.crypto.getRandomValues[22].

    Node.js offers a rich cryptographic module, crypto[23], most of
    which is beyond the scope of this PEP. It does include a single
    function for generating random bytes, crypto.randomBytes.

-   Ruby

    The Ruby standard library includes a module SecureRandom[24] which
    includes the following methods:

    -   base64 - returns a Base64 encoded random string.
    -   hex - returns a random hexadecimal string.
    -   random_bytes - returns a random byte string.
    -   random_number - depending on the argument, returns either a
        random integer in the range(0, n), or a random float between 0.0
        and 1.0.
    -   urlsafe_base64 - returns a random URL-safe Base64 encoded
        string.
    -   uuid - return a version 4 random Universally Unique IDentifier.

What Should Be The Name Of The Module?

There was a proposal to add a "random.safe" submodule, quoting the Zen
of Python "Namespaces are one honking great idea" koan. However, the
author of the Zen, Tim Peters, has come out against this idea[25], and
recommends a top-level module.

In discussion on the python-ideas mailing list so far, the name
"secrets" has received some approval, and no strong opposition.

There is already an existing third-party module with the same name[26],
but it appears to be unused and abandoned.

Frequently Asked Questions

-   Q: Is this a real problem? Surely MT is random enough that nobody
    can predict its output.

    A: The consensus among security professionals is that MT is not safe
    in security contexts. It is not difficult to reconstruct the
    internal state of MT[27][28] and so predict all past and future
    values. There are a number of known, practical attacks on systems
    using MT for randomness[29].

-   Q: Attacks on PHP are one thing, but are there any known attacks on
    Python software?

    A: Yes. There have been vulnerabilities in Zope and Plone at the
    very least. Hanno Schlichting commented[30]:

        "In the context of Plone and Zope a practical attack was
        demonstrated, but I can't find any good non-broken links about
        this anymore.  IIRC Plone generated a random number and exposed
        this on each error page along the lines of 'Sorry, you encountered
        an error, your problem has been filed as <random number>, please
        include this when you contact us'.  This allowed anyone to do large
        numbers of requests to this page and get enough random values to
        reconstruct the MT state.  A couple of security related modules used
        random instead of system random (cookie session ids, password reset
        links, auth token), so the attacker could break all of those."

    Christian Heimes reported this issue to the Zope security team in
    2012[31], there are at least two related CVE vulnerabilities[32],
    and at least one work-around for this issue in Django[33].

-   Q: Is this an alternative to specialist cryptographic software such
    as SSL?

    A: No. This is a "batteries included" solution, not a full-featured
    "nuclear reactor". It is intended to mitigate against some basic
    security errors, not be a solution to all security-related issues.
    To quote Alyssa Coghlan referring to her earlier proposal[34]:

        "...folks really are better off learning to use things like
        cryptography.io for security sensitive software, so this change
        is just about harm mitigation given that it's inevitable that a
        non-trivial proportion of the millions of current and future
        Python developers won't do that."

-   Q: What about a password generator?

    A: The consensus is that the requirements for password generators
    are too variable for it to be a good match for the standard
    library[35]. No password generator will be included in the initial
    release of the module, instead it will be given in the documentation
    as a recipe (à la the recipes in the itertools module)[36].

-   Q: Will secrets use /dev/random (which blocks) or /dev/urandom
    (which doesn't block) on Linux? What about other platforms?

    A: secrets will be based on os.urandom and random.SystemRandom,
    which are interfaces to your operating system's best source of
    cryptographic randomness. On Linux, that may be /dev/urandom[37], on
    Windows it may be CryptGenRandom(), but see the documentation and/or
    source code for the detailed implementation details.

References

Copyright

This document has been placed in the public domain.



  Local Variables: mode: indented-text indent-tabs-mode: nil
  sentence-end-double-space: t fill-column: 70 coding: utf-8 End:

[1] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html

[2] https://docs.python.org/3/library/random.html

[3] As of the date of writing. Also, as Google search terms may be
automatically customised for the user without their knowledge, some
readers may see different results.

[4] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html

[5] http://stackoverflow.com/questions/3854692/generate-password-in-python

[6] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766

[7] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html

[8] At least those who are motivated to read the source code and
documentation.

[9] After considerable discussion, Guido ruled that the module need only
provide randbelow, and not similar functions randrange or randint.
http://code.activestate.com/lists/python-dev/138375/

[10] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html

[11] https://mail.python.org/pipermail/python-ideas/2015-September/036350.html

[12] https://github.com/pyca/cryptography/issues/2347

[13] https://bitbucket.org/sdaprano/secrets

[14] https://mail.python.org/pipermail/python-ideas/2015-September/036517.html
https://mail.python.org/pipermail/python-ideas/2015-September/036515.html

[15] https://mail.python.org/pipermail/python-ideas/2015-September/036474.html

[16] Link needed.

[17] By default PHP seeds the MT PRNG with the time (citation needed),
which is exploitable by attackers, while Python seeds the PRNG with
output from the system CSPRNG, which is believed to be much harder to
exploit.

[18] http://php.net/manual/en/function.uniqid.php

[19] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php

[20] Volunteers and patches are welcome.

[21] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html

[22] https://developer.mozilla.org/en-US/docs/Web/API/RandomSource/getRandomValues

[23] https://nodejs.org/api/crypto.html

[24] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html

[25] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html

[26] https://pypi.python.org/pypi/secrets

[27] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html

[28] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html

[29] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf

[30] Personal communication, 2016-08-24.

[31] https://bugs.launchpad.net/zope2/+bug/1071067

[32] http://www.cvedetails.com/cve/CVE-2012-5508/
http://www.cvedetails.com/cve/CVE-2012-6661/

[33] https://github.com/django/django/commit/1525874238fd705ec17a066291935a9316bd3044

[34] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html

[35] https://mail.python.org/pipermail/python-ideas/2015-September/036476.html
https://mail.python.org/pipermail/python-ideas/2015-September/036478.html

[36] https://mail.python.org/pipermail/python-ideas/2015-September/036488.html

[37] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
http://www.2uo.de/myths-about-urandom/