### Known Issues
- `half8` `==` and `!=` operators don't conform to the IEEE 754 standard. Unity has not yet responded to my bug report regarding their `half` implementation
- `(s)byte`, `(u)short` vector and `(U)Int128` multiplication, division and modulo operations by compile time constants are not optimal. For `(U)Int128`, this requires a new Burst feature à la `T Constant.ForceCompileTimeEvaluation<T, U>(Func<T, U> code)` (proposed). Work on `(s)byte` and `(u)short` vectors in this regard is in progress and will beat any compiler. All optimizations that have been tested so far are included.
- `pow` functions with compile time constant exponents currently do not handle many decimal numbers. `math.rsqrt` would often be used in those cases for optimal performance, but it is actually slower when `Unity.Burst.FloatMode` is set to anything but `FloatMode.Fast`. Guaranteeing optimal performance would require compile time access to the current `FloatMode` (proposed)
- `double` `(r)cbrt` and thus possibly `(u)int` `intcbrt` functions are currently not optimized
### Fixes
- linked `float8` `rcp` and `rsqrt` functions to Burst's `FloatMode` and `FloatPrecision`
- `short.MinValue / -1` now correctly overflows to `short.MinValue` when dividing a `short16` vector by another `short16` vector when compiling for AVX or higher
- fixed scalar `quarter` to `double` conversion when the `quarter` value is negative
- fixed scalar `half` to `quarter` conversion when the `half` value is negative
- fixed vector `quarter` to `ulong` conversion when a `quarter` value is negative
- fixed `(u)short8` to `quarter8` conversion
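
The `short.MinValue / -1` fix above concerns wraparound: the mathematical result, 32768, does not fit into a `short`, so the 16-bit result wraps back to `short.MinValue`. The following is a minimal scalar sketch of that expected behavior in plain C#. The class and method names are hypothetical and this is not the library's AVX code path.

```csharp
using System;

// Hypothetical scalar model of the expected 16-bit wraparound; not MaxMath code.
public static class ShortDivisionWraparound
{
    public static short DivideWrapping(short dividend, short divisor)
    {
        // C# promotes short operands to int, so -32768 / -1 yields 32768 here
        // without trapping. The unchecked cast keeps only the low 16 bits,
        // which turns 32768 back into -32768 (short.MinValue).
        // A divisor of 0 still throws DivideByZeroException.
        return unchecked((short)(dividend / divisor));
    }
}
```

For example, `ShortDivisionWraparound.DivideWrapping(short.MinValue, -1)` returns `short.MinValue`, which is the per-element result the fixed `short16` division is described as producing.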
### Additions
- added saturation arithmetic to the library for all scalar and vector types. Saturation arithmetic clamps the result of an operation to `type.MinValue` on underflow and to `type.MaxValue` on overflow. It has single-instruction hardware support for `(s)byte`s and `(u)short`s. The included functions are:
  - `addsaturated`
  - `subsaturated`
  - `mulsaturated`
  - `divsaturated` (only clamps division of floating point types and signed integer division such as `sbyte.MinValue / -1` (= -128 / -1), which is clamped to 127; for `int`s and `long`s, this division would cause a hardware exception)
  - `castsaturated` (all types to all other types with a smaller range)
  - `csumsaturated`
  - `cprodsaturated`
- added high performance `(U)Int128` types with full library support, meaning: all operators and type conversions as well as all functions support these types. Most operations of both types, in Burst code, compile down to optimal machine code. Exceptions: 1) signed 64x64 bit to 128 bit multiplication 2) `*`, `/`, `%` and `divrem` functions with a scalar compile time constant argument (see Known Issues #2)
- added `Random128` XOR-Shift pseudo random number generator for generating `(U)Int128`s
- added high performance and high accuracy `(r)cbrt` (reciprocal) cube root functions for scalar and vector `float` and `double` types, based on a research paper from 2021. An optional `bool` parameter, `false` by default, lets the caller decide whether negative input values should be handled correctly (which is not the case with `math.pow(x, 1f/3f)`)
- added high performance `intcbrt` integer cube root functions for all scalar and vector integer types. For signed integer types, an optional `bool` parameter, `false` by default, lets the caller decide whether negative input values should be handled correctly (which is not the case with `math.pow(x, 1f/3f)`)
- added a `log` function to all scalar and vector `float` and `double` types with a second parameter `b`, which is the logarithm's base
- added `reversebytes` functions for all scalar and vector types, which convert between big endian and little endian byte order in either direction. All of them (scalar, vector) compile down to single hardware instructions
- added `pow` functions with scalar exponents for `float` and `double` scalars and vectors, with optimizations for selected constant exponents (not necessarily whole exponents)
- added function overloads to all functions for scalar `(s)byte`s and `(u)short`s in order to resolve function call resolution ambiguity which was already present in `Unity.Mathematics`, which may also improve performance in some cases
- added a static readonly `New` property to `RandomX` XOR-Shift pseudo random generators. It calls `Environment.TickCount` internally (and is thus seeded somewhat randomly), makes sure it is non-zero and can be called from Burst native code
- added `fastrcp` functions for `float` scalars and vectors, faster (and substantially less accurate) than `FloatPrecision.Low`, `FloatMode.Fast` Burst implementations
- added `fastrsqrt` functions for `float` scalars and vectors, faster (and substantially less accurate) than `FloatPrecision.Low`, `FloatMode.Fast` Burst implementations
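
The clamping behavior of the saturation functions listed above can be sketched for `sbyte` and `byte` in plain scalar C#. This is only an illustration of the semantics described in these notes. The class and method names below are hypothetical stand-ins, not the library's signatures, and the library's versions are described as using dedicated hardware instructions rather than widening to `int`.

```csharp
using System;

// Hypothetical scalar sketch of saturation semantics; not MaxMath code.
public static class SaturationSketch
{
    // Clamp an int into the sbyte range [-128, 127].
    private static sbyte ClampToSByte(int value)
    {
        return (sbyte)Math.Min(Math.Max(value, sbyte.MinValue), sbyte.MaxValue);
    }

    // Saturating add: compute in int (cannot overflow for 8-bit inputs), then clamp.
    public static sbyte AddSaturated(sbyte a, sbyte b)
    {
        return ClampToSByte(a + b);
    }

    // Saturating divide: the only out-of-range case is sbyte.MinValue / -1 = 128,
    // which is clamped to 127. A divisor of 0 still throws DivideByZeroException.
    public static sbyte DivSaturated(sbyte a, sbyte b)
    {
        return ClampToSByte(a / b);
    }

    // Saturating cast from int to byte: values outside [0, 255] are clamped.
    public static byte CastSaturated(int value)
    {
        return (byte)Math.Min(Math.Max(value, byte.MinValue), byte.MaxValue);
    }
}
```

With these definitions, `SaturationSketch.AddSaturated(100, 100)` yields 127 instead of wrapping to -56, and `SaturationSketch.DivSaturated(sbyte.MinValue, -1)` yields 127.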
### Improvements
- added AVX and AVX2 code for `float8` `sin`, `cos`, `tan`, `sincos`, `asin`, `acos`, `atan`, `atan2`, `sinh`, `cosh`, `tanh`, `pow`, `exp`, `exp2`, `exp10`, `log`, `log2`, `log10` and `fmod` (and the "%" operator)
- optimized many `/`, `%`, `*` and `divrem` operations with a scalar compile time constant argument for `(s)byte` vectors (see Known Issues #2), which Burst previously optimized poorly or not at all
- added SSE2 fallback code for converting AVX vector types to SSE vector types and vice versa (for example: `short16` (256 bit) to `byte16` (128 bit))
- scalar `(s)byte` and `(u)short` `rol` and `ror` functions now compile down to single hardware instructions
- improved performance and/or reduced code size of nearly all vector comparison operations
- improved performance of, and added SSE2 fallback code for, bitfield to boolean vector conversion (`toboolX` and thus also `select(vector a, vector b, bitmask c)`)
- improved performance of `intpow` functions in general and for when the exponent is a compile time constant
- improved performance and reduced code size of `compareto` vector functions (especially for unsigned types)
- added more optimizations to `isdivisible`
- improved performance of `intsqrt` functions for `(u)long` and `(s)byte` scalar and vector types considerably
- reduced code size of `ispow2` vector functions
- reduced code size of `(s)byte` vector-by-vector division
- improved performance of `Random64`'s `(u)long4` generation if compiling for AVX2
- improved performance of `(s)byte` matrix multiplication
- reduced code size of `(u)short` and up to `(s)byte8` vector-by-vector division and `divrem` functions (and improved performance if compiling for SSE2)
- reduced code size and improved performance of `isinrange` functions for `(u)long` vector types
- reduced code size of `ushort` vector `>=` and `<=` operators for SSE2 fallback code by ~75%
- improved performance and reduced code size of SSE2 down-casting fallback code
### Changes
- API BREAKING CHANGE: The various bool to integer/floating point conversion functions (`touint8`/`tof32` etc.) are now renamed to contain C# types in their names (`tobyte`/`tofloat` etc.)
- API BREAKING CHANGE: If you use this library as intended, meaning you import it and `Unity.Mathematics.math` statically (`using static MaxMath.maxmath;`) and you use the `pow` functions with scalar bases and scalar exponents in those scripts, you will encounter the first ever function call resolution ambiguity. It is strongly recommended to always use the `maxmath.pow` function, because it optimizes any `pow` call enormously if the exponent is a compile time constant, which does NOT necessarily mean that such a call must declare the exponent as a literal value - the exponent may become a compile time constant due to constant propagation
- `quarter` is now a readonly struct
- `quarter` to `sbyte`, `short`, `int` and `long` conversions are now required to be declared explicitly
- removed `countbits(void* ptr, ulong bytes)` from the library and added it to https://github.com/MrUnbelievable92/SIMD-Algorithms with more options
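
The `pow` ambiguity described above can be reproduced without Unity. The sketch below uses two hypothetical stand-in classes, `UnityLikeMath` and `MaxLikeMath`, which play the roles of `Unity.Mathematics.math` and `MaxMath.maxmath`. Both are imported statically and both declare a `pow` overload with the same signature. It only illustrates the compiler behavior and contains none of the library's optimizations.

```csharp
using System;
using static AmbiguityDemo.UnityLikeMath;
using static AmbiguityDemo.MaxLikeMath;

namespace AmbiguityDemo
{
    // Hypothetical stand-in for Unity.Mathematics.math.
    public static class UnityLikeMath
    {
        public static float pow(float x, float y) => (float)Math.Pow(x, y);
    }

    // Hypothetical stand-in for MaxMath.maxmath.
    public static class MaxLikeMath
    {
        public static float pow(float x, float y) => (float)Math.Pow(x, y);
    }

    public static class Caller
    {
        public static float Cube(float x)
        {
            // return pow(x, 3f);          // error CS0121: the call is ambiguous
            return MaxLikeMath.pow(x, 3f); // qualifying the class resolves it
        }
    }
}
```

In a real project the equivalent fix is to write `maxmath.pow(x, 3f)` at the affected call sites, as recommended above.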
### Fixed Oversights
- (Issue #3) added constructor wrappers to the `maxmath` class analogous to `Unity.Mathematics` (`byte4 myByte4 = (maxmath.)byte4(1, 2, 3, 4);`)
- added `dsub` - fused divide-subtract function for scalar and vector `float` types
- added an optional `bool fast = false` parameter to `dad`, `dsub`, `dadsub` and `dsubadd` functions
- added `andnot` function overloads for scalar and vector `bool` types
- added implicit type conversions of scalar `quarter` values to `half`, `float` and `double` vectors
- added `all_eq` and `all_dif` functions for vectors of size 2
- added `all_eq` and `all_dif` functions for `float` and `double` vectors