Introducing the most important speed-up to date π
Hey folks,
Before I start the note on the new release, I'll just introduce what roaring-rs is. It is a Rust idiomatic port of the Roaring Bitmap data structure introduced by Daniel Lemire, G. Ssi-Yan-Kai and O. Kaser. The data structure simply defines better-compressed sets of bits, faster and memory-efficient data structures for performing operations on sets, i.e. intersections, unions... and so on.
So, about this new version of roaring-rs, it is a bit special because it's the version that introduces the most performance improvements in a while and on the most topics. Most of this amazing work was done by @saik0, a new contributor to the crate π. @schmidek implemented DoubleEndedIterator
on the bitmap iterators πͺ
Breaking changes
- Removed deprecated set ops such as
union_with
. Use the corresponding operators|=
. - Minimum supported Rust version (MSRV) increased to 1.56.
deserialize_from
validates inputs. In some cases, it can be 4x slower. For workloads that are heavy in deserialization from trusted sources migrate todeserialize_unchecked_from
.
Performance optimizations
from_sorted_iter
andappend
are exponentially faster. They should be preferred overcollect
andextend
whenever adding monotonically increasing integers to the set as it's about 2-2.5x faster.- Other performance optimizations.
Min Mean Max Iteration 6% 57% 125% And 0% 7% 22% Or 0% 10% 33% Sub 0% 9% 39% Xor 4% 90% 209%
New features
rank
Returns the number of integers that are lower or equal to a given value.rank(u64::MAX) == len()
.select
Returns then
th integer in the set orNone
ifn >= len()
.union_len
,intersection_len
... and so on. Compute the cardinality of a set operation without materializing it. Usually about twice as fast as materializing the bitmap.- implemented
DoubleEndedIterator
. - implemented
ExactSizeIterator
(on 64 bit targets only). - EXPERIMENTAL SIMD feature (requires rust nightly).
Min Mean Max And 0% 34% 141% Or 2% 45% 145% Sub 0% 51% 168% Xor 0% 130% 437%
Other
- Added proper testing suite for set operations.
- Added many benchmarks and real world datasets.
Perf numbers are for pairwise operations on collections of bitmaps from real datasets.
Pairwise as in: B1 β© B2, B2 β© B3, ... BN-1 β© BN