PEP: 757
Title: C API to import-export Python integers
Author: Sergey B Kirpichev <skirpichev@gmail.com>, Victor Stinner <vstinner@python.org>
PEP-Delegate: C API Working Group
Discussions-To: https://discuss.python.org/t/63895
Status: Draft
Type: Standards Track
Created: 13-Sep-2024
Python-Version: 3.14
Post-History: 14-Sep-2024

Abstract

Add a new C API to import and export Python integers (int objects), in
particular the PyLongWriter_Create() and PyLong_Export() functions.

Rationale

Projects such as gmpy2, SAGE and Python-FLINT either directly access
Python "internals" (the PyLongObject structure) or use an inefficient
temporary format (hex strings for Python-FLINT) to import and export
Python int objects. The Python int implementation changed in Python 3.12
to add a tag and "compact values".

In the 3.13 alpha 1 release, the private undocumented _PyLong_New()
function was removed, even though these projects use it to import
Python integers. The private function was restored in 3.13 alpha 2.

A public, efficient abstraction is needed to interface Python with these
projects without exposing implementation details. It would allow Python
to change its internals without breaking these projects. For example,
the gmpy2 implementation was recently changed for CPython 3.9 and for
CPython 3.12.

Specification

Layout API

Data needed by GMP-like import-export functions.

struct PyLongLayout

  Layout of an array of "digits" ("limbs" in GMP terminology), used
  to represent the absolute value of arbitrary-precision integers.

  Use PyLong_GetNativeLayout to get the native layout of Python int
  objects, used internally for integers with a "big enough" absolute
  value.

  See also sys.int_info, which exposes similar information to Python.

  bits_per_digit

    Bits per digit.

  digit_size

    Digit size in bytes.

  digits_order

    Digits order:

    -   1 for most significant digit first
    -   -1 for least significant digit first

  endianness

    Digit endianness:

    -   1 for most significant byte first (big endian)
    -   -1 for least significant byte first (little endian)

const PyLongLayout *PyLong_GetNativeLayout(void)

  Get the native layout of Python int objects.

  See the PyLongLayout structure.

  The function must not be called before Python initialization nor after
  Python finalization. The returned layout is valid until Python is
  finalized. The layout is the same for all Python sub-interpreters and
  so it can be cached.
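
As a rough illustration of what the layout describes, the sketch below
rebuilds a similar record from sys.int_info in pure Python. The
NativeLayout class and the native_layout() helper are hypothetical names
used only here. The digits_order and endianness values are assumptions
about CPython's current representation (least significant digit first,
native byte order); the text above does not state them.

```python
import sys
from typing import NamedTuple


class NativeLayout(NamedTuple):
    """Hypothetical Python mirror of the PyLongLayout members."""
    bits_per_digit: int
    digit_size: int
    digits_order: int   # 1: most significant digit first, -1: least
    endianness: int     # 1: big endian, -1: little endian


def native_layout() -> NativeLayout:
    info = sys.int_info
    return NativeLayout(
        bits_per_digit=info.bits_per_digit,
        digit_size=info.sizeof_digit,
        # Assumption: CPython stores the least significant digit first.
        digits_order=-1,
        # Assumption: each digit uses the platform's native byte order.
        endianness=-1 if sys.byteorder == "little" else 1,
    )


layout = native_layout()
# "Nails" (GMP terminology) are the unused high bits of each digit.
nails = layout.digit_size * 8 - layout.bits_per_digit
print(layout, nails)
```

The nails value is the same quantity that the benchmark code in this
document passes to mpz_import() and mpz_export().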

Export API

struct PyLongExport

  Export of a Python int object.

  There are two cases:

  -   If digits is NULL, only use the value member. Calling
      PyLong_FreeExport is optional in this case.
  -   If digits is not NULL, use the negative, ndigits and digits
      members. Calling PyLong_FreeExport is mandatory in this case.

  value

    The native integer value of the exported int object. Only valid if
    digits is NULL.

  negative

    1 if the number is negative, 0 otherwise. Only valid if digits is
    not NULL.

  ndigits

    Number of digits in the digits array. Only valid if digits is not
    NULL.

  digits

    Read-only array of unsigned digits. Can be NULL.

  If digits is not NULL, a private field of the PyLongExport structure
  stores a strong reference to the Python int object to make sure that
  the structure remains valid until PyLong_FreeExport() is called.

int PyLong_Export(PyObject *obj, PyLongExport *export_long)

  Export a Python int object.

  On success, set *export_long and return 0. On error, set an exception
  and return -1.

  If export_long->digits is not NULL, PyLong_FreeExport must be called
  when the export is no longer needed.

On CPython 3.14, no memory copy is needed in PyLong_Export: it is just a
thin wrapper that exposes the internal digits array of the Python int.

void PyLong_FreeExport(PyLongExport *export_long)

  Release the export export_long created by PyLong_Export.
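
The two export cases can be modelled in pure Python to show how a
consumer is expected to branch on digits. This is only a sketch:
ExportModel, export_int() and read_export() are hypothetical names. The
rule used to pick the "value" case (the number fits in a signed 64-bit
integer) is an assumption made for illustration; the specification above
leaves that choice to the implementation.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional

BITS = sys.int_info.bits_per_digit


@dataclass
class ExportModel:
    """Hypothetical stand-in for PyLongExport."""
    value: int = 0
    negative: int = 0
    ndigits: int = 0
    digits: Optional[List[int]] = None   # None plays the role of NULL


def export_int(n: int) -> ExportModel:
    # Assumption for this sketch: values that fit in a signed 64-bit
    # integer take the "value" path.
    if -(1 << 63) <= n < (1 << 63):
        return ExportModel(value=n)
    mag, digits = abs(n), []
    while mag:
        digits.append(mag & ((1 << BITS) - 1))  # least significant first
        mag >>= BITS
    return ExportModel(negative=int(n < 0), ndigits=len(digits),
                       digits=digits)


def read_export(exp: ExportModel) -> int:
    """Consumer side: branch on digits as the two cases describe."""
    if exp.digits is None:
        return exp.value
    mag = 0
    for d in reversed(exp.digits[:exp.ndigits]):
        mag = (mag << BITS) | d
    return -mag if exp.negative else mag
```

For any integer n, read_export(export_int(n)) gives n back, whichever
of the two cases was taken.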

Import API

The PyLongWriter API can be used to import an integer: create a Python
int object from a digits array.

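struct PyLongWriter

  A Python int writer instance.

  The instance must be destroyed by PyLongWriter_Finish or
  PyLongWriter_Discard.

PyLongWriter *PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits)

  Create a PyLongWriter.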

  On success, set *digits and return a writer. On error, set an
  exception and return NULL.

  negative is 1 if the number is negative, or 0 otherwise.

  ndigits is the number of digits in the digits array. It must be
  greater than or equal to 0.

  The caller can either initialize the array of digits digits and then
  call PyLongWriter_Finish to get a Python int, or call
  PyLongWriter_Discard to destroy the writer instance. Digits must be in
  the range [0; (1 << sys.int_info.bits_per_digit) - 1]. Unused digits
  must be set to 0.

On CPython 3.14, the PyLongWriter_Create implementation is a thin
wrapper around the private _PyLong_New() function.

PyObject *PyLongWriter_Finish(PyLongWriter *writer)

  Finish a PyLongWriter created by PyLongWriter_Create.

  On success, return a Python int object. On error, set an exception and
  return NULL.

  The function takes care of normalizing the digits and converts the
  object to a compact integer if needed.

void PyLongWriter_Discard(PyLongWriter *writer)

  Discard a PyLongWriter created by PyLongWriter_Create.
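
The create/fill/finish sequence can be mimicked in pure Python to make
the contract on digits concrete. WriterModel is a hypothetical class,
not part of the proposed API, and it assumes the least-significant-first
digit order. Unlike this model, which zero-fills its list for
simplicity, the text above asks the caller to set every digit,
including unused ones.

```python
import sys

BITS = sys.int_info.bits_per_digit


class WriterModel:
    """Hypothetical pure-Python analogue of PyLongWriter."""

    def __init__(self, negative: bool, ndigits: int):
        if ndigits < 0:
            raise ValueError("ndigits must be >= 0")
        self.negative = negative
        # Zero-filled for simplicity; digits are stored least
        # significant first (assumed native order).
        self.digits = [0] * ndigits

    def finish(self) -> int:
        mag = 0
        for d in reversed(self.digits):
            # Each digit must lie in [0; (1 << BITS) - 1].
            if not 0 <= d < (1 << BITS):
                raise ValueError("digit out of range")
            mag = (mag << BITS) | d
        # Zero high digits vanish in the arithmetic, which mirrors the
        # normalization step described above.
        return -mag if self.negative else mag


writer = WriterModel(negative=True, ndigits=3)
writer.digits[0] = 5          # the two high digits stay 0 (unused)
result = writer.finish()      # -5
```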

Optimize import for small integers

The proposed import API is efficient for large integers. Compared to
directly accessing Python internals, it can have a significant
performance overhead on small integers.

For small integers of a few digits (for example, 1 or 2 digits),
existing APIs can be used:

-   PyLong_FromUInt64();
-   PyLong_FromLong();
-   PyLong_FromNativeBytes().

Implementation

-   CPython:
    -   https://github.com/python/cpython/pull/121339
    -   https://github.com/vstinner/cpython/pull/5
-   gmpy:
    -   https://github.com/aleaxit/gmpy/pull/495

Benchmarks

Code:

    /* Query parameters of Python’s internal representation of integers. */
    const PyLongLayout *layout = PyLong_GetNativeLayout();

    size_t int_digit_size = layout->digit_size;
    int int_digits_order = layout->digits_order;
    size_t int_bits_per_digit = layout->bits_per_digit;
    size_t int_nails = int_digit_size*8 - int_bits_per_digit;
    int int_endianness = layout->endianness;

Export: PyLong_Export() with gmpy2

Code:

    static int
    mpz_set_PyLong(mpz_t z, PyObject *obj)
    {
        static PyLongExport long_export;

        if (PyLong_Export(obj, &long_export) < 0) {
            return -1;
        }

        if (long_export.digits) {
            mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
                       int_endianness, int_nails, long_export.digits);
            if (long_export.negative) {
                mpz_neg(z, z);
            }
            PyLong_FreeExport(&long_export);
        }
        else {
            const int64_t value = long_export.value;

            if (LONG_MIN <= value && value <= LONG_MAX) {
                mpz_set_si(z, value);
            }
            else {
                mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
                if (value < 0) {
                    mpz_t tmp;
                    mpz_init(tmp);
                    mpz_ui_pow_ui(tmp, 2, 64);
                    mpz_sub(z, z, tmp);
                    mpz_clear(tmp);
                }
            }
        }
        return 0;
    }

Reference code: mpz_set_PyLong() in the gmpy2 master for commit 9177648.
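
The else branch above relies on a two's complement identity: reading
the 8 bytes of a negative int64_t as an unsigned number yields
value + 2**64, so subtracting 2**64 restores the sign. The arithmetic
can be checked in Python as follows (int64_via_unsigned() is a
hypothetical helper, not gmpy2 code):

```python
import struct


def int64_via_unsigned(value: int) -> int:
    """Mimic mpz_import() on the raw bytes, then the 2**64 correction."""
    raw = struct.pack("=q", value)          # native-endian signed 64-bit
    unsigned = struct.unpack("=Q", raw)[0]  # same bytes read as unsigned
    if value < 0:
        unsigned -= 1 << 64                 # the mpz_sub() step
    return unsigned
```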

Benchmark:

    import pyperf
    from gmpy2 import mpz

    runner = pyperf.Runner()
    runner.bench_func('1<<7', mpz, 1 << 7)
    runner.bench_func('1<<38', mpz, 1 << 38)
    runner.bench_func('1<<300', mpz, 1 << 300)
    runner.bench_func('1<<3000', mpz, 1 << 3000)

Results on Linux Fedora 40 with CPU isolation, Python built in release
mode:

  --------------------------------------------------
  Benchmark        ref       pep757
  ---------------- --------- -----------------------
  1<<7             91.3 ns   89.9 ns: 1.02x faster

  1<<38            120 ns    94.9 ns: 1.27x faster

  1<<300           196 ns    203 ns: 1.04x slower

  1<<3000          939 ns    945 ns: 1.01x slower

  Geometric mean   (ref)     1.05x faster
  --------------------------------------------------

Import: PyLongWriter_Create() with gmpy2

Code:

    static PyObject *
    GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context)
    {
        if (mpz_fits_slong_p(obj->z)) {
            return PyLong_FromLong(mpz_get_si(obj->z));
        }

        size_t size = (mpz_sizeinbase(obj->z, 2) +
                       int_bits_per_digit - 1) / int_bits_per_digit;
        void *digits;
        PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size,
                                                   &digits);
        if (writer == NULL) {
            return NULL;
        }

        mpz_export(digits, NULL, int_digits_order, int_digit_size,
                   int_endianness, int_nails, obj->z);

        return PyLongWriter_Finish(writer);
    }

Reference code: GMPy_PyLong_From_MPZ() in the gmpy2 master for commit
9177648.
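
The size computation in the code above is a ceiling division of the bit
length by the number of bits per digit. A Python equivalent, with a
hypothetical digits_needed() helper, makes the digit counts for the
benchmark inputs easy to check:

```python
import sys


def digits_needed(n: int,
                  bits_per_digit: int = sys.int_info.bits_per_digit) -> int:
    """Ceiling of bit_length / bits_per_digit, like the C expression."""
    return (abs(n).bit_length() + bits_per_digit - 1) // bits_per_digit
```

With 30-bit digits, 1 << 300 has 301 bits and therefore needs 11
digits, while 1 << 3000 needs 101.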

Benchmark:

    import pyperf
    from gmpy2 import mpz

    runner = pyperf.Runner()
    runner.bench_func('1<<7', int, mpz(1 << 7))
    runner.bench_func('1<<38', int, mpz(1 << 38))
    runner.bench_func('1<<300', int, mpz(1 << 300))
    runner.bench_func('1<<3000', int, mpz(1 << 3000))

Results on Linux Fedora 40 with CPU isolation, Python built in release
mode:

  --------------------------------------------------
  Benchmark        ref       pep757
  ---------------- --------- -----------------------
  1<<7             56.7 ns   56.2 ns: 1.01x faster

  1<<300           191 ns    213 ns: 1.12x slower

  Geometric mean   (ref)     1.03x slower
  --------------------------------------------------

Benchmark hidden because not significant (2): 1<<38, 1<<3000.

Backwards Compatibility

There is no impact on backward compatibility: only new APIs are
added.

Rejected Ideas

Support arbitrary layout

It would be convenient to support arbitrary layouts when importing and
exporting Python integers.

For example, it was proposed to add a layout parameter to
PyLongWriter_Create() and a layout member to the PyLongExport structure.

The problem is that it's more complex to implement and not really
needed. What's strictly needed is only an API to import-export using the
Python "native" layout.

If later there are use cases for arbitrary layouts, new APIs can be
added.

Don't add PyLong_GetNativeLayout function

Currently, most of the information required for int import/export is
already available via PyLong_GetInfo() (and sys.int_info). More could
be added (such as the order of digits), and this interface doesn't pose
any constraints on the future evolution of PyLongObject.

The problem is that PyLong_GetInfo() returns a Python object, a named
tuple, rather than a convenient C structure, which might discourage
people from using it in favor of, for example, the current semi-private
macros such as PyLong_SHIFT and PyLong_BASE.

Provide mpz_import/export-like API instead

Another approach to importing and exporting data from int objects would
be the following: C extensions provide contiguous buffers, and CPython
exports the absolute value of an integer into them (or imports it from
them).

API example:

    struct PyLongLayout {
        uint8_t bits_per_digit;
        uint8_t digit_size;
        int8_t digits_order;
    };

    size_t PyLong_GetDigitsNeeded(PyLongObject *obj, PyLongLayout layout);
    int PyLong_Export(PyLongObject *obj, PyLongLayout layout, void *buffer);
    PyLongObject *PyLong_Import(PyLongLayout layout, void *buffer);

This might work for GMP, as it has the mpz_limbs_read() and
mpz_limbs_write() functions, which can provide the required "buffers".

The major drawback of this approach is that it's much more complex on
the CPython side (i.e. actual conversion between different layouts).
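
To give an idea of what such a conversion involves, the sketch below
repacks an array of digits from one digit width to another. It is an
illustration only: repack() is a hypothetical helper, it assumes the
least-significant-first digit order, and it ignores byte order within
digits, which a real implementation would also have to handle.

```python
def repack(digits, from_bits, to_bits):
    """Repack least-significant-first digits from one width to another."""
    acc = acc_bits = 0
    out = []
    for d in digits:
        acc |= d << acc_bits          # append the digit above pending bits
        acc_bits += from_bits
        while acc_bits >= to_bits:    # emit every complete output digit
            out.append(acc & ((1 << to_bits) - 1))
            acc >>= to_bits
            acc_bits -= to_bits
    if acc:                           # leftover high bits, if any
        out.append(acc)
    return out


# Two 30-bit digits (5 + 1 * 2**30) fit in a single 64-bit digit.
print(repack([5, 1], 30, 64))         # [1073741829]
```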

Discussions

-   Discourse: PEP 757 – C API to import-export Python integers
-   C API Working Group decision issue #35
-   Pull request #121339
-   Issue #102471: The C-API for Python to C integer conversion is, to
    be frank, a mess.
-   Add public function PyLong_GetDigits()
-   Consider restoring _PyLong_New() function as public
-   Pull request gh-106320: Remove private _PyLong_New() function.

Copyright

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.