PEP: 452 Title: API for Cryptographic Hash Functions v2.0 Version:
$Revision$ Last-Modified: $Date$ Author: A.M. Kuchling <amk@amk.ca>,
Christian Heimes <christian@python.org> Status: Final Type:
Informational Content-Type: text/x-rst Created: 15-Aug-2013
Post-History: Replaces: 247

Abstract

There are several different modules available that implement
cryptographic hashing algorithms such as MD5 or SHA. This document
specifies a standard API for such algorithms, to make it easier to
switch between different implementations.

Specification

All hashing modules should present the same interface. Additional
methods or variables can be added, but those described in this document
should always be present.

Hash function modules define one function:

new([string])            (unkeyed hashes)

new(key, [string], [digestmod])    (keyed hashes)

    Create a new hashing object and return it. The first form is for
    hashes that are unkeyed, such as MD5 or SHA. For keyed hashes such
    as HMAC, 'key' is a required parameter containing a string giving
    the key to use. In both cases, the optional 'string' parameter, if
    supplied, will be immediately hashed into the object's starting
    state, as if obj.update(string) was called.

    After creating a hashing object, arbitrary bytes can be fed into the
    object using its update() method, and the hash value can be obtained
    at any time by calling the object's digest() method.

    Although the parameter is called 'string', hashing objects operate
    on 8-bit data only. Both 'key' and 'string' must be a bytes-like
    object (bytes, bytearray...). A hashing object may support
    one-dimensional, contiguous buffers as argument, too. Text (unicode)
    is no longer supported in Python 3.x. Python 2.x implementations may
    take ASCII-only unicode as argument, but portable code should not
    rely on the feature.

    Arbitrary additional keyword arguments can be added to this
    function, but if they're not supplied, sensible default values
    should be used. For example, 'rounds' and 'digest_size' keywords
    could be added for a hash function which supports a variable number
    of rounds and several different output sizes, and they should
    default to values believed to be secure.

Hash function modules define one variable:

digest_size

    An integer value; the size of the digest produced by the hashing
    objects created by this module, measured in bytes. You could also
    obtain this value by creating a sample object and accessing its
    'digest_size' attribute, but it can be convenient to have this value
    available from the module. Hashes with a variable output size will
    set this variable to None.

Hashing objects require the following attribute:

digest_size

    This attribute is identical to the module-level digest_size
    variable, measuring the size of the digest produced by the hashing
    object, measured in bytes. If the hash has a variable output size,
    this output size must be chosen when the hashing object is created,
    and this attribute must contain the selected size. Therefore, None
    is not a legal value for this attribute.

block_size

    An integer value or NotImplemented; the internal block size of the
    hash algorithm in bytes. The block size is used by the HMAC module
    to pad the secret key to digest_size or to hash the secret key if it
    is longer than digest_size. If no HMAC algorithm is standardized for
    the hash algorithm, return NotImplemented instead.

name

    A text string value; the canonical, lowercase name of the hashing
    algorithm. The name should be a suitable parameter for hashlib.new.

Hashing objects require the following methods:

copy()

    Return a separate copy of this hashing object. An update to this
    copy won't affect the original object.

digest()

    Return the hash value of this hashing object as a bytes containing
    8-bit data. The object is not altered in any way by this function;
    you can continue updating the object after calling this function.

hexdigest()

    Return the hash value of this hashing object as a string containing
    hexadecimal digits. Lowercase letters should be used for the digits
    'a' through 'f'. Like the .digest() method, this method mustn't
    alter the object.

update(string)

    Hash bytes-like 'string' into the current state of the hashing
    object. update() can be called any number of times during a hashing
    object's lifetime.

Hashing modules can define additional module-level functions or object
methods and still be compliant with this specification.

Here's an example, using a module named 'MD5':

    >>> import hashlib
    >>> from Crypto.Hash import MD5
    >>> m = MD5.new()
    >>> isinstance(m, hashlib.CryptoHash)
    True
    >>> m.name
    'md5'
    >>> m.digest_size
    16
    >>> m.block_size
    64
    >>> m.update(b'abc')
    >>> m.digest()
    b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
    >>> m.hexdigest()
    '900150983cd24fb0d6963f7d28e17f72'
    >>> MD5.new(b'abc').digest()
    b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'

Rationale

The digest size is measured in bytes, not bits, even though hash
algorithm sizes are usually quoted in bits; MD5 is a 128-bit algorithm
and not a 16-byte one, for example. This is because, in the sample code
I looked at, the length in bytes is often needed (to seek ahead or
behind in a file; to compute the length of an output string) while the
length in bits is rarely used. Therefore, the burden will fall on the
few people actually needing the size in bits, who will have to multiply
digest_size by 8.

It's been suggested that the update() method would be better named
append(). However, that method is really causing the current state of
the hashing object to be updated, and update() is already used by the
md5 and sha modules included with Python, so it seems simplest to leave
the name update() alone.

The order of the constructor's arguments for keyed hashes was a sticky
issue. It wasn't clear whether the key should come first or second. It's
a required parameter, and the usual convention is to place required
parameters first, but that also means that the 'string' parameter moves
from the first position to the second. It would be possible to get
confused and pass a single argument to a keyed hash, thinking that
you're passing an initial string to an unkeyed hash, but it doesn't seem
worth making the interface for keyed hashes more obscure to avoid this
potential error.

Changes from Version 1.0 to Version 2.0

Version 2.0 of API for Cryptographic Hash Functions clarifies some
aspects of the API and brings it up-to-date. It also formalized aspects
that were already de facto standards and provided by most
implementations.

Version 2.0 introduces the following new attributes:

name

    The name property was made mandatory by issue 18532.

block_size

    The new version also specifies that the return value NotImplemented
    prevents HMAC support.

Version 2.0 takes the separation of binary and text data in Python 3.0
into account. The 'string' argument to new() and update() as well as the
'key' argument must be bytes-like objects. On Python 2.x a hashing
object may also support ASCII-only unicode. The actual name of argument
is not changed as it is part of the public API. Code may depend on the
fact that the argument is called 'string'.

Recommended names for common hashing algorithms

+------------+------------+-------------------+
| algorithm  |   variant  |   recommended     |
|            |            |   name            |
+============+============+===================+
| MD5        |            |   md5             |
+------------+------------+-------------------+
| RIPEMD-160 |            |   ripemd160       |
+------------+------------+-------------------+
| SHA-1      |            |   sha1            |
+------------+------------+-------------------+
| SHA-2      |   SHA-224  |   sha224          |
+------------+------------+-------------------+
|            |   SHA-256  |   sha256          |
+------------+------------+-------------------+
|            |   SHA-384  |   sha384          |
+------------+------------+-------------------+
|            |   SHA-512  |   sha512          |
+------------+------------+-------------------+
| SHA-3      |            |   sha3_224        |
|            |  SHA-3-224 |                   |
+------------+------------+-------------------+
|            |            |   sha3_256        |
|            |  SHA-3-256 |                   |
+------------+------------+-------------------+
|            |            |   sha3_384        |
|            |  SHA-3-384 |                   |
+------------+------------+-------------------+
|            |            |   sha3_512        |
|            |  SHA-3-512 |                   |
+------------+------------+-------------------+
| WHIRLPOOL  |            |   whirlpool       |
+------------+------------+-------------------+

Changes

-   2001-09-17: Renamed clear() to reset(); added digest_size attribute
    to objects; added .hexdigest() method.
-   2001-09-20: Removed reset() method completely.
-   2001-09-28: Set digest_size to None for variable-size hashes.
-   2013-08-15: Added block_size and name attributes; clarified that
    'string' actually refers to bytes-like objects.

Acknowledgements

Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar Shtull-Trauring, and
the readers of the python-crypto list for their comments on this PEP.

Copyright

This document has been placed in the public domain.