Skip to content

Introducing the most important speed-up to date πŸš€

Compare
Choose a tag to compare
@Kerollmops Kerollmops released this 02 Mar 10:12
fa246d7

Hey folks,

Before I start the note on the new release, I'll just introduce what roaring-rs is. It is a Rust idiomatic port of the Roaring Bitmap data structure introduced by Daniel Lemire, G. Ssi-Yan-Kai and O. Kaser. The data structure simply defines better-compressed sets of bits, faster and memory-efficient data structures for performing operations on sets, i.e. intersections, unions... and so on.

So, about this new version of roaring-rs, it is a bit special because it's the version that introduces the most performance improvements in a while and on the most topics. Most of this amazing work was done by @saik0, a new contributor to the crate πŸŽ‰. @schmidek implemented DoubleEndedIterator on the bitmap iterators πŸ’ͺ

Breaking changes

  • Removed deprecated set ops such as union_with. Use the corresponding operators |=.
  • Minimum supported Rust version (MSRV) increased to 1.56.
  • deserialize_from validates inputs. In some cases, it can be 4x slower. For workloads that are heavy in deserialization from trusted sources migrate to deserialize_unchecked_from.

Performance optimizations

  • from_sorted_iter and append are exponentially faster. They should be preferred over collect and extend whenever adding monotonically increasing integers to the set as it's about 2-2.5x faster.
  • Other performance optimizations.
    Min Mean Max
    Iteration 6% 57% 125%
    And 0% 7% 22%
    Or 0% 10% 33%
    Sub 0% 9% 39%
    Xor 4% 90% 209%

New features

  • rank Returns the number of integers that are lower or equal to a given value. rank(u64::MAX) == len().
  • select Returns the nth integer in the set or None if n >= len().
  • union_len, intersection_len ... and so on. Compute the cardinality of a set operation without materializing it. Usually about twice as fast as materializing the bitmap.
  • implemented DoubleEndedIterator.
  • implemented ExactSizeIterator (on 64 bit targets only).
  • EXPERIMENTAL SIMD feature (requires rust nightly).
    Min Mean Max
    And 0% 34% 141%
    Or 2% 45% 145%
    Sub 0% 51% 168%
    Xor 0% 130% 437%

Other

  • Added proper testing suite for set operations.
  • Added many benchmarks and real world datasets.

Perf numbers are for pairwise operations on collections of bitmaps from real datasets.
Pairwise as in: B1 ∩ B2, B2 ∩ B3, ... BN-1 ∩ BN