WIP: add support to bigger ints (256 bit) #1552

bettio · 2025-02-24T14:05:07Z

This PR introduces support to signed 256 bit integers.

Briefly:

Add code for generic big integer handling (intn.c), limited to 256 bit for the sake of simplicity, but it might be expanded to generic big integers with some additional work
Refactored arithmetic BIF helpers
Refactored integer_to_binary/_to_list in order to reduce both code duplication and make it simpler adding big integers support
Reimplemented binary_to_integer in to make it compliant with OTP (binaries such as <<"0xCAFE">> or <<" 42">> must be rejected).
Replaced lltoa with more performant functions that do not rely on slow helpers, specially for base 10 and 16

Notes:

Big integers implementation is quite decoupled from term.h, so functions in term.h handle them as opaques storing bytes
In our implementation Value bor 0x80000000000000000000000000000000 will cause a sign flip, unlike OTP

Todos:

Big integer literals, without them 0xCAFE1234CAFECAFE = calculate() cat not work)
term_to_binary and binary_to_term
binary_to_integer and list_to_integer

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

pguyot · 2025-03-10T22:19:32Z

src/libAtomVM/bif.c

+    *rounded_num_len = rounded / sizeof(intn_digit_t);
+}
+
+static term make_bigint(Context *ctx, uint32_t fail_label, uint32_t live,


Is it bigint or big_int?

I'm using bigint now.

pguyot · 2025-03-10T22:23:54Z

src/libAtomVM/bif.c

+    }
+}
+
+static void pair_to_big_int(term arg1, term arg2, intn_digit_t *tmp_buf1, intn_digit_t *tmp_buf2,


I don't understand why this exists as a function. Nothing seems shared between each pair member.

All binary operations (add, mul, div, etc...) have the same repeated pattern, so I was factoring it.
Honestly I think I should factor in a macro also this part before that is going to be repeated:

698 intn_digit_t tmp_buf1[INTN_INT64_LEN]; 699 intn_digit_t tmp_buf2[INTN_INT64_LEN]; 700 701 intn_digit_t *bn1; 702 size_t bn1_len; 703 intn_digit_t *bn2; 704 size_t bn2_len;

Anyway that function is required to handle all the combos such as: small-int * big-int, big-int * small-int and big-int * big-int.

src/libAtomVM/intn.h

pguyot · 2025-03-10T22:26:22Z

src/libAtomVM/intn.h

+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+            ret = (((uint64_t) num[1] << 32) | (uint64_t) num[0]);
+#else
+#error "Unsupporoted endianess"


Unsupported

pguyot · 2025-03-10T22:33:47Z

src/libAtomVM/intn.c

+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#error "Big endian not yet supported"
+#else
+#error "Unsupproted endiness"


Unsupported endianness

pguyot · 2025-03-10T22:46:16Z

tests/erlang_tests/bigint.erl

+    Expected_INT64_MIN = ?MODULE:shrink(),
+    A = ?MODULE:mul(16#10101010CAFECAFE, 16#AABB),
+    Square = ?MODULE:mul(A, A),
+    <<"2559181265480533323615999200984578944503596644">> = erlang:integer_to_binary(Square),


2559181265480533323615999200984578944503596644 = Square,

We also need support for literal decoding.

Indeed, we are getting there step by step.

pguyot · 2025-03-10T22:46:49Z

tests/erlang_tests/bigint.erl

+start() ->
+    test_mul().
+
+test_mul() ->


This isn't only about mul, the function could be split into several.

I'm still not sure about this comment, can you explain further?

pguyot · 2025-03-10T22:51:01Z

src/libAtomVM/nifs.c

@@ -2184,42 +2185,11 @@ static term nif_erlang_atom_to_list_1(Context *ctx, int argc, term argv[])
    return ret;
 }

-static size_t lltoa(avm_int64_t int_value, unsigned base, char *integer_string)
+static term integer_to_buf(Context *ctx, int argc, term argv[], char *tmp_buf, size_t tmp_buf_size,


I am confused about the signature, returned value is [] or invalid_term, isn't it?
Couldn't we return just a bool?

ugly signature indeed, that function raises different kinds of errors for both integer_to_binary and integer_to_list

pguyot · 2025-03-10T22:52:11Z

src/libAtomVM/nifs.c

+    bool needs_cleanup;
+    term maybe_fail_ret
+        = integer_to_buf(ctx, argc, argv, tmp_buf, tmp_buf_size, &int_buf, &int_len, &needs_cleanup);
+    if (UNLIKELY(term_is_invalid_term(maybe_fail_ret))) {


Since maybe_fail_ret is not used, UNLIKELY could test the returned value of integer_to_buf directly.

It is actually used:

2254 if (UNLIKELY(term_is_invalid_term(maybe_fail_ret))) { 2255 return maybe_fail_ret; 2256 }

Sorry for the confusion. What I mean is that we only test result being invalid term or not. When it's invalid term we return invalid term. In this respect we don't really use this value.

oh, right, got it.

pguyot · 2025-03-10T22:53:50Z

src/libAtomVM/intn.c

+#include <stdlib.h>
+#include <string.h>
+
+#define USE_64BIT_MUL


Do we have a compile-time test for this?

We need to test different combos indeed.

pguyot · 2025-03-11T06:50:50Z

Aren't mbedtls_mpi_* functions linked into AtomVM if SSL is enabled? By default I think they are 4096 bits. I would favor these (and I also have a branch where I vendor crypto functions as we discussed in Fosdem for wasm).

bettio · 2025-03-11T18:30:43Z

Aren't mbedtls_mpi_* functions linked into AtomVM if SSL is enabled? By default I think they are 4096 bits. I would favor these (and I also have a branch where I vendor crypto functions as we discussed in Fosdem for wasm).

Yes, they are, but there are is some additional complexity.

mbedtls mpi is focused on crypto operations, so it lacks all the bitwise operations we need. Also integer bytes are private, so we are not supposed to access them for our operations without serializing them back and forth.
all functions are supposed to work on data that is stored on the heap (and it has its own memory management), so we need to handle each number as a resource, unless we do the same serialization work back and forth as for 1.
if we vendor it, we should remove all the additional crypto functions we don't use (e.g mbedtls_mpi_exp_mod)
it cannot be vendored by just taking that source file or a single function, more work has to be done in order to take their mempool and other required bits / cleaning up their sources from their macros and defines.

So we can use it, but it is not that trivial, and I'm not sure it is worth it.

As an additional note, in our implementation the most complex part are div and mul algorithms, that have been copied from Hacker's delight, and based on Knuth work. They are pretty efficient and well tested.

bettio · 2025-03-29T13:09:20Z

Noteworthy additional changes:

(NIFs: refactor binary_to_integer/1) bettio@5006d3d

The binary_to_integer function was not compliant with OTP behavior, it was accepting trailing spaces, and base prefixes (such as 0x).
This implementation has same behavior of OTP one, and also tries to optimize base 10 and base 16 cases.
It also avoids expensive int64_t multiplications on 32 hardware, as much as possible.

(NIFs: refactor integer_to_binary/list) bettio@a68cf2f

It replaces our implementation, with an optimized one.
Having a specialized case for base 10 allows the compiler to replace division by 10 with reciprocal mul.

nlz function is used from `divmnu` function. Use compiler builtin when available instead of C implementation. Signed-off-by: Davide Bettio <davide@uninstall.it>

This function is a first round of integration with intn bigint implementation. - Allow printing big integers with `erlang:display/1` and in general with `term_display` functions. - Allow converting big integers to binaries and lists using `erlang:integer_to_binary/1` and `erlang:integer_to_list/1`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Implement a first arithmetic operation that uses `intn_mulmns` in order to validate the whole approach. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add first iteration on bigint tests, starting with tests for `erlang:*/2` and `integer_to_binary/2`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Just use `intn_parse` function. Signed-off-by: Davide Bettio <davide@uninstall.it>

Negative boxed integers have 3rd bit set (b1s00). Also introduce new defines: - TERM_BOXED_NEGATIVE_INTEGER - TERM_BOXED_INTEGER_SIGN_BIT - TERM_BOXED_INTEGER_SIGN_BIT_POS Signed-off-by: Davide Bettio <davide@uninstall.it>

Add functions for checking if a term is a positive integer, and etc... Function names are inspired to Erlang typespecs (such as non_neg_integer). Signed-off-by: Davide Bettio <davide@uninstall.it>

Start moving existing code to predicates such as `term_is_any_non_neg_integer(t)`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Some operations in 2-complement turns to be quite complex, they require more code, and also more stack space for storing abs value. Hence using a dedicated sign bit (as Erlang does) turns to be an easier and pragmatic approach. This approach makes possible having sign bit outside the numeric payload, so the supported range is -(2^256 - 1)..+(2^256 - 1). They might be called int257, but it would be quite confusing. Sign bit is stored in boxed header, outside of the numeric payload. Also add a valgrind supression file, in order to ignore a bogus warning about overlapping memory in memcpy when executing memmove (that allows overlapping memory). Signed-off-by: Davide Bettio <davide@uninstall.it>

On 32-bit systems, use `make_maybe_boxed_int64` in `neg_boxed_helper` since `-(INT32_MAX + 1)` is `INT32_MIN` that fits into a 32-bit boxed integer. Before of this change `make_boxed_int64` was used, making a 64-bit boxed integer for an int32 value. New `term_compare` implementation will check size and sign metadata before performing any actual comparison, so all value must be in their "minimal canonical form". Signed-off-by: Davide Bettio <davide@uninstall.it>

Refactor term_compare to use metadata such as size and sign before performing any integer comparison (that might be expensive for big integers). Perform digit by digit comparison for big integers only when size and sign are equal. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add functions that do not rely on undefined behavior for converting unsigned to signed negative integers (and viceversa), for checking if conversion overflows and for conditionally negate. Start using newly introduced utilities in both intn and externalterm (an old macro is removed). Signed-off-by: Davide Bettio <davide@uninstall.it>

Refactor term_conv_to_float in order to use intn_to_double function. Also make sure in opcodesswitch that term_conv_to_float() returns a finite value. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add helper function float_to_integer_helper, that checks the float result of functions such as floor, round, trunc, etc... instead of the arguments in advance. Furthermore a better upper and limit for safe double to int64 conversion has been found. Signed-off-by: Davide Bettio <davide@uninstall.it>

Allow functions such as trunc, round, etc... to return a big integer, when a number above 2^63 or below -2^63 is given. Signed-off-by: Davide Bettio <davide@uninstall.it>

Test for overflows and for not-yet-overflowed values, when converting from float to big int. Also test comparison between floats and big ints. Signed-off-by: Davide Bettio <davide@uninstall.it>

This function converts n-bytes in either big or little endian format, signed / unsigned, and converts them into a intn integer. Signed-off-by: Davide Bettio <davide@uninstall.it>

This function is required in any place a new bigint term needs to be created. Also rename it to `term_intn_to_term_size`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add support in SMALL_BIG_EXT parsing to big integers (that means integers that are >= 8 bytes and <= 32 bytes). Signed-off-by: Davide Bettio <davide@uninstall.it>

Allow literals bigger than 64 bit, such as: 16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF As a side note, bigger literals than (2^256 - 1) are encoded as external terms. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add functions useful for writing a big integer back to a buffer, as a little/big-endian integer. Signed-off-by: Davide Bettio <davide@uninstall.it>

Allow calling term_to_binary for serializing big integers. Signed-off-by: Davide Bettio <davide@uninstall.it>

bettio force-pushed the biggerint branch 6 times, most recently from e9902c0 to 0e71d98 Compare March 7, 2025 21:36

bettio force-pushed the biggerint branch 7 times, most recently from 5dfab3f to 1afe75b Compare March 9, 2025 16:12

bettio changed the title ~~WIP: bigger int~~ WIP: add support to bigger ints (255 bit) Mar 9, 2025

bettio changed the title ~~WIP: add support to bigger ints (255 bit)~~ WIP: add support to bigger ints (256 bit) Mar 9, 2025

bettio force-pushed the biggerint branch from f507077 to 606a349 Compare March 9, 2025 23:36

pguyot reviewed Mar 10, 2025

View reviewed changes

bettio force-pushed the biggerint branch from 606a349 to 065a7a4 Compare March 11, 2025 23:00

bettio force-pushed the biggerint branch 6 times, most recently from 6de1d25 to 3bb28df Compare March 29, 2025 12:52

bettio force-pushed the biggerint branch 3 times, most recently from ba9e3a3 to fe978bf Compare March 30, 2025 18:36

intn: optimize nlz function

9a43653

nlz function is used from `divmnu` function. Use compiler builtin when available instead of C implementation. Signed-off-by: Davide Bettio <davide@uninstall.it>

bettio force-pushed the biggerint branch 6 times, most recently from 559f3fc to 8aa1058 Compare April 24, 2025 23:23

bettio added 17 commits April 29, 2025 13:26

BIFs: implement first bigint operation (erlang:*/2)

1a185ca

Implement a first arithmetic operation that uses `intn_mulmns` in order to validate the whole approach. Signed-off-by: Davide Bettio <davide@uninstall.it>

tests: add first bigint tests (bigint.erl)

801f71f

Add first iteration on bigint tests, starting with tests for `erlang:*/2` and `integer_to_binary/2`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add support to bigint to erlang:binary_to_integer/1

f43de27

Just use `intn_parse` function. Signed-off-by: Davide Bettio <davide@uninstall.it>

term: use boxed integer sign bit

eacdc62

Negative boxed integers have 3rd bit set (b1s00). Also introduce new defines: - TERM_BOXED_NEGATIVE_INTEGER - TERM_BOXED_INTEGER_SIGN_BIT - TERM_BOXED_INTEGER_SIGN_BIT_POS Signed-off-by: Davide Bettio <davide@uninstall.it>

term: add integer sign predicates and getter

d16b5af

Add functions for checking if a term is a positive integer, and etc... Function names are inspired to Erlang typespecs (such as non_neg_integer). Signed-off-by: Davide Bettio <davide@uninstall.it>

BIFs: erlang:is_function/2 use new term_is_any_non_neg_integer

406c564

Start moving existing code to predicates such as `term_is_any_non_neg_integer(t)`. Signed-off-by: Davide Bettio <davide@uninstall.it>

Add big integer to double conversion

b124e0b

Refactor term_conv_to_float in order to use intn_to_double function. Also make sure in opcodesswitch that term_conv_to_float() returns a finite value. Signed-off-by: Davide Bettio <davide@uninstall.it>

BIFs: implement float to big integer support

4e5755d

Allow functions such as trunc, round, etc... to return a big integer, when a number above 2^63 or below -2^63 is given. Signed-off-by: Davide Bettio <davide@uninstall.it>

tests: bigint.erl: test limits around +-(2^256 - 1)

f916c11

Test for overflows and for not-yet-overflowed values, when converting from float to big int. Also test comparison between floats and big ints. Signed-off-by: Davide Bettio <davide@uninstall.it>

intn: add intn_from_integer_bytes function

2766c05

This function converts n-bytes in either big or little endian format, signed / unsigned, and converts them into a intn integer. Signed-off-by: Davide Bettio <davide@uninstall.it>

move intn_to_term_size to term.h

2ae2884

This function is required in any place a new bigint term needs to be created. Also rename it to `term_intn_to_term_size`. Signed-off-by: Davide Bettio <davide@uninstall.it>

bettio force-pushed the biggerint branch from eee1c15 to 8aaf82e Compare April 29, 2025 11:46

bettio added 4 commits April 29, 2025 14:06

externalterm: parse big integers

efe5057

Add support in SMALL_BIG_EXT parsing to big integers (that means integers that are >= 8 bytes and <= 32 bytes). Signed-off-by: Davide Bettio <davide@uninstall.it>

intn: add intn_to_integer_bytes and intn_required_unsigned_integer_bytes

48ed444

Add functions useful for writing a big integer back to a buffer, as a little/big-endian integer. Signed-off-by: Davide Bettio <davide@uninstall.it>

externalterm: encode big integers as SMALL_BIG_EXT

18e7a0b

Allow calling term_to_binary for serializing big integers. Signed-off-by: Davide Bettio <davide@uninstall.it>

bettio force-pushed the biggerint branch from 8aaf82e to 18e7a0b Compare April 29, 2025 12:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: add support to bigger ints (256 bit) #1552

WIP: add support to bigger ints (256 bit) #1552

bettio commented Feb 24, 2025 •

edited

Loading

pguyot Mar 10, 2025

bettio Mar 11, 2025 •

edited

Loading

pguyot Mar 10, 2025

bettio Mar 11, 2025 •

edited

Loading

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot Mar 11, 2025

bettio Mar 11, 2025

pguyot Mar 10, 2025

bettio Mar 11, 2025

pguyot commented Mar 11, 2025 •

edited

Loading

bettio commented Mar 11, 2025

bettio commented Mar 29, 2025

WIP: add support to bigger ints (256 bit) #1552

Are you sure you want to change the base?

WIP: add support to bigger ints (256 bit) #1552

Conversation

bettio commented Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

bettio Mar 11, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bettio Mar 11, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pguyot commented Mar 11, 2025 • edited Loading

bettio commented Mar 11, 2025

bettio commented Mar 29, 2025

bettio commented Feb 24, 2025 •

edited

Loading

bettio Mar 11, 2025 •

edited

Loading

bettio Mar 11, 2025 •

edited

Loading

pguyot commented Mar 11, 2025 •

edited

Loading