Skip to content

WIP: add support to bigger ints (256 bit) #1552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from
Draft

Conversation

bettio
Copy link
Collaborator

@bettio bettio commented Feb 24, 2025

This PR introduces support to signed 256 bit integers.

Briefly:

  • Add code for generic big integer handling (intn.c), limited to 256 bit for the sake of simplicity, but it might be expanded to generic big integers with some additional work
  • Refactored arithmetic BIF helpers
  • Refactored integer_to_binary/_to_list in order to reduce both code duplication and make it simpler adding big integers support
  • Reimplemented binary_to_integer in to make it compliant with OTP (binaries such as <<"0xCAFE">> or <<" 42">> must be rejected).
  • Replaced lltoa with more performant functions that do not rely on slow helpers, specially for base 10 and 16

Notes:

  • Big integers implementation is quite decoupled from term.h, so functions in term.h handle them as opaques storing bytes
  • In our implementation Value bor 0x80000000000000000000000000000000 will cause a sign flip, unlike OTP

Todos:

  • Big integer literals, without them 0xCAFE1234CAFECAFE = calculate() cat not work)
  • term_to_binary and binary_to_term
  • binary_to_integer and list_to_integer

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio force-pushed the biggerint branch 6 times, most recently from e9902c0 to 0e71d98 Compare March 7, 2025 21:36
@bettio bettio force-pushed the biggerint branch 7 times, most recently from 5dfab3f to 1afe75b Compare March 9, 2025 16:12
@bettio bettio changed the title WIP: bigger int WIP: add support to bigger ints (255 bit) Mar 9, 2025
@bettio bettio changed the title WIP: add support to bigger ints (255 bit) WIP: add support to bigger ints (256 bit) Mar 9, 2025
*rounded_num_len = rounded / sizeof(intn_digit_t);
}

static term make_bigint(Context *ctx, uint32_t fail_label, uint32_t live,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it bigint or big_int?

Copy link
Collaborator Author

@bettio bettio Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using bigint now.

}
}

static void pair_to_big_int(term arg1, term arg2, intn_digit_t *tmp_buf1, intn_digit_t *tmp_buf2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this exists as a function. Nothing seems shared between each pair member.

Copy link
Collaborator Author

@bettio bettio Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All binary operations (add, mul, div, etc...) have the same repeated pattern, so I was factoring it.
Honestly I think I should factor in a macro also this part before that is going to be repeated:

 698     intn_digit_t tmp_buf1[INTN_INT64_LEN];
 699     intn_digit_t tmp_buf2[INTN_INT64_LEN];
 700
 701     intn_digit_t *bn1;
 702     size_t bn1_len;
 703     intn_digit_t *bn2;
 704     size_t bn2_len;

Anyway that function is required to handle all the combos such as: small-int * big-int, big-int * small-int and big-int * big-int.

#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
ret = (((uint64_t) num[1] << 32) | (uint64_t) num[0]);
#else
#error "Unsupporoted endianess"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#error "Big endian not yet supported"
#else
#error "Unsupproted endiness"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported endianness

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Expected_INT64_MIN = ?MODULE:shrink(),
A = ?MODULE:mul(16#10101010CAFECAFE, 16#AABB),
Square = ?MODULE:mul(A, A),
<<"2559181265480533323615999200984578944503596644">> = erlang:integer_to_binary(Square),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    2559181265480533323615999200984578944503596644 = Square,

We also need support for literal decoding.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, we are getting there step by step.

start() ->
test_mul().

test_mul() ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't only about mul, the function could be split into several.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure about this comment, can you explain further?

@@ -2184,42 +2185,11 @@ static term nif_erlang_atom_to_list_1(Context *ctx, int argc, term argv[])
return ret;
}

static size_t lltoa(avm_int64_t int_value, unsigned base, char *integer_string)
static term integer_to_buf(Context *ctx, int argc, term argv[], char *tmp_buf, size_t tmp_buf_size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the signature, returned value is [] or invalid_term, isn't it?
Couldn't we return just a bool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugly signature indeed, that function raises different kinds of errors for both integer_to_binary and integer_to_list

bool needs_cleanup;
term maybe_fail_ret
= integer_to_buf(ctx, argc, argv, tmp_buf, tmp_buf_size, &int_buf, &int_len, &needs_cleanup);
if (UNLIKELY(term_is_invalid_term(maybe_fail_ret))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since maybe_fail_ret is not used, UNLIKELY could test the returned value of integer_to_buf directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually used:

2254     if (UNLIKELY(term_is_invalid_term(maybe_fail_ret))) {
2255         return maybe_fail_ret;
2256     }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. What I mean is that we only test result being invalid term or not. When it's invalid term we return invalid term. In this respect we don't really use this value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, right, got it.

#include <stdlib.h>
#include <string.h>

#define USE_64BIT_MUL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a compile-time test for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to test different combos indeed.

@pguyot
Copy link
Collaborator

pguyot commented Mar 11, 2025

Aren't mbedtls_mpi_* functions linked into AtomVM if SSL is enabled? By default I think they are 4096 bits. I would favor these (and I also have a branch where I vendor crypto functions as we discussed in Fosdem for wasm).

@bettio
Copy link
Collaborator Author

bettio commented Mar 11, 2025

Aren't mbedtls_mpi_* functions linked into AtomVM if SSL is enabled? By default I think they are 4096 bits. I would favor these (and I also have a branch where I vendor crypto functions as we discussed in Fosdem for wasm).

Yes, they are, but there are is some additional complexity.

  1. mbedtls mpi is focused on crypto operations, so it lacks all the bitwise operations we need. Also integer bytes are private, so we are not supposed to access them for our operations without serializing them back and forth.
  2. all functions are supposed to work on data that is stored on the heap (and it has its own memory management), so we need to handle each number as a resource, unless we do the same serialization work back and forth as for 1.
  3. if we vendor it, we should remove all the additional crypto functions we don't use (e.g mbedtls_mpi_exp_mod)
  4. it cannot be vendored by just taking that source file or a single function, more work has to be done in order to take their mempool and other required bits / cleaning up their sources from their macros and defines.

So we can use it, but it is not that trivial, and I'm not sure it is worth it.

As an additional note, in our implementation the most complex part are div and mul algorithms, that have been copied from Hacker's delight, and based on Knuth work. They are pretty efficient and well tested.

@bettio bettio force-pushed the biggerint branch 6 times, most recently from 6de1d25 to 3bb28df Compare March 29, 2025 12:52
@bettio
Copy link
Collaborator Author

bettio commented Mar 29, 2025

Noteworthy additional changes:

The binary_to_integer function was not compliant with OTP behavior, it was accepting trailing spaces, and base prefixes (such as 0x).
This implementation has same behavior of OTP one, and also tries to optimize base 10 and base 16 cases.
It also avoids expensive int64_t multiplications on 32 hardware, as much as possible.

It replaces our implementation, with an optimized one.
Having a specialized case for base 10 allows the compiler to replace division by 10 with reciprocal mul.

@bettio bettio force-pushed the biggerint branch 3 times, most recently from ba9e3a3 to fe978bf Compare March 30, 2025 18:36
nlz function is used from `divmnu` function.
Use compiler builtin when available instead of C implementation.

Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the biggerint branch 6 times, most recently from 559f3fc to 8aa1058 Compare April 24, 2025 23:23
bettio added 17 commits April 29, 2025 13:26
This function is a first round of integration with intn bigint
implementation.

- Allow printing big integers with `erlang:display/1` and in general with
`term_display` functions.
- Allow converting big integers to binaries and lists using
`erlang:integer_to_binary/1` and `erlang:integer_to_list/1`.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Implement a first arithmetic operation that uses `intn_mulmns` in order to
validate the whole approach.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Add first iteration on bigint tests, starting with tests for `erlang:*/2`
and `integer_to_binary/2`.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Just use `intn_parse` function.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Negative boxed integers have 3rd bit set (b1s00).
Also introduce new defines:
- TERM_BOXED_NEGATIVE_INTEGER
- TERM_BOXED_INTEGER_SIGN_BIT
- TERM_BOXED_INTEGER_SIGN_BIT_POS

Signed-off-by: Davide Bettio <davide@uninstall.it>
Add functions for checking if a term is a positive integer, and etc...
Function names are inspired to Erlang typespecs (such as
non_neg_integer).

Signed-off-by: Davide Bettio <davide@uninstall.it>
Start moving existing code to predicates such as
`term_is_any_non_neg_integer(t)`.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Some operations in 2-complement turns to be quite complex, they
require more code, and also more stack space for storing abs value.
Hence using a dedicated sign bit (as Erlang does) turns to be an easier and
pragmatic approach.

This approach makes possible having sign bit outside the numeric
payload, so the supported range is -(2^256 - 1)..+(2^256 - 1).
They might be called int257, but it would be quite confusing.

Sign bit is stored in boxed header, outside of the numeric payload.

Also add a valgrind supression file, in order to ignore a bogus warning
about overlapping memory in memcpy when executing memmove (that allows
overlapping memory).

Signed-off-by: Davide Bettio <davide@uninstall.it>
On 32-bit systems, use `make_maybe_boxed_int64` in `neg_boxed_helper`
since `-(INT32_MAX + 1)` is `INT32_MIN` that fits into a 32-bit boxed
integer.
Before of this change `make_boxed_int64` was used, making a 64-bit boxed
integer for an int32 value.

New `term_compare` implementation will check size and sign metadata before
performing any actual comparison, so all value must be in their "minimal
canonical form".

Signed-off-by: Davide Bettio <davide@uninstall.it>
Refactor term_compare to use metadata such as size and sign before
performing any integer comparison (that might be expensive for big
integers).

Perform digit by digit comparison for big integers only when size and
sign are equal.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Add functions that do not rely on undefined behavior for converting
unsigned to signed negative integers (and viceversa), for checking if conversion
overflows and for conditionally negate.

Start using newly introduced utilities in both intn and externalterm
(an old macro is removed).

Signed-off-by: Davide Bettio <davide@uninstall.it>
Refactor term_conv_to_float in order to use intn_to_double function.
Also make sure in opcodesswitch that term_conv_to_float() returns a
finite value.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Add helper function float_to_integer_helper, that checks the float result
of functions such as floor, round, trunc, etc... instead of the arguments in
advance.

Furthermore a better upper and limit for safe double to int64 conversion
has been found.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Allow functions such as trunc, round, etc... to return a big integer,
when a number above 2^63 or below -2^63 is given.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Test for overflows and for not-yet-overflowed values, when converting
from float to big int.
Also test comparison between floats and big ints.

Signed-off-by: Davide Bettio <davide@uninstall.it>
This function converts n-bytes in either big or little endian format,
signed / unsigned, and converts them into a intn integer.

Signed-off-by: Davide Bettio <davide@uninstall.it>
This function is required in any place a new bigint term needs to be
created.
Also rename it to `term_intn_to_term_size`.

Signed-off-by: Davide Bettio <davide@uninstall.it>
bettio added 4 commits April 29, 2025 14:06
Add support in SMALL_BIG_EXT parsing to big integers (that means integers that
are >= 8 bytes and <= 32 bytes).

Signed-off-by: Davide Bettio <davide@uninstall.it>
Allow literals bigger than 64 bit, such as:
16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

As a side note, bigger literals than (2^256 - 1) are encoded as external
terms.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Add functions useful for writing a big integer back to a buffer, as a
little/big-endian integer.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Allow calling term_to_binary for serializing big integers.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants