Skip to content

Releases: RoaringBitmap/roaring-rs

Version 0.10.1

08 Sep 08:15
35c4dc9
Compare
Choose a tag to compare

This release is a patch in which @jonasspinner fixed the MultiOps union for Result iterators which was wrongly using the intersection operations.

Version 0.10 - Introducing faster multiple-bitmaps operations 🧨

06 Sep 11:55
51a5cb0
Compare
Choose a tag to compare

In this new v0.10 release, we introduce significant improvements for multiple bitmap operations and many new methods for range operations. You can find the documentation on docs.rs. The only reason it is a breaking release is due to the simd feature that was updated to match the new types of the std::simd nightly module.

About the new MultiOps trait

@irevoire based this trait on @saik0's previous work. It brings a new fresh API to help in executing different set operations on a set of RoaringBitmap/Treemap. With up to 37x improvements on intersection operations and 7x on unions, it clearly makes this PR worth releasing. If you want more numbers, you can look at some benchmarks in the original PR.

The MultiOps trait is implemented for anything that implements IntoIterator where Item is a RoaringBitmap/Treemap, but there also are try_ versions of it when your Item is a Result. It can be very useful when the sets are lazily deserialized, for example.

use roaring::{MultiOps, RoaringBitmap};

let bitmaps = [
    RoaringBitmap::from_iter(0..10),
    RoaringBitmap::from_iter(10..20),
    RoaringBitmap::from_iter(20..30),
];

// Stop doing this
let naive = bitmaps.clone().into_iter().reduce(|a, b| a | b).unwrap_or_default();

// And start doing this instead. It will be much faster!
let iter = bitmaps.union();

assert_eq!(naive, iter);

Improve most of the intersection operations

@RaduBerinde made a huge improvement to the whole set of intersection operations by implementing a better version of the retain method that is now branchless. Thank you very much! This PR does between 1.03x and 1.37x improvement, sometimes slowing down the performances but for the good of all the other benchmarks.

Introduce more range-related methods

While @Dr-Emann was introducing the new contains_range and range_cardinality methods on the RoaringBitmap type, @not-jan was working on the RoaringTreemap::insert_range method.

use roaring::{RoaringTreemap, RoaringBitmap};

let mut rb = RoaringTreemap::new();
rb.insert_range(2..4);
assert!(rb.contains(2));
assert!(rb.contains(3));
assert!(!rb.contains(4));

let mut rb = RoaringBitmap::new();
rb.insert_range(2..45_000);
assert!(rb.contains_range(2..30));
assert!(!rb.contains_range(0..3));
assert!(!rb.contains_range(6_000..46_001));
assert_eq!(rb.range_cardinality(6_000..45_001), 39_000);

More methods for full sets

While I (@Kerollmops) was reviewing the work on the RoaringBitmap::insert_range I also introduced more methods for full RoaringBitmap and RoaringTreemap. You can now create full bitmaps and treemaps.

use roaring::{RoaringTreemap, RoaringBitmap};

let mut rt = RoaringTreemap::full();
assert!(rt.is_full());
assert!(rt.contains(42));

let mut rb = RoaringBitmap::full();
assert!(rb.is_full());
assert!(!rt.is_empty());

Serde is in the place 🎤

@irevoire didn't only work on the MultiOps trait but also introduced the support of serde! You can use enable it by specifying the serde feature and then convert your RoaringBitmap and RoaringTreemap into any format supported by serde and back into a bitmap.

use roaring::{RoaringTreemap, RoaringBitmap};

let original = RoaringTreemap::from_range(6..42);
let json = serde_json::to_vec(&original).unwrap();
let output = serde_json::from_slice(&json).unwrap();
assert_eq!(original, output);

let original = RoaringBitmap::from_range(16..142);
let buffer = bincode::serialize(&bitmap).unwrap();
let output = bincode::deserialize(&buffer).unwrap();
assert_eq!(bitmap, output);

Introducing the most important speed-up to date 🚀

02 Mar 10:12
fa246d7
Compare
Choose a tag to compare

Hey folks,

Before I start the note on the new release, I'll just introduce what roaring-rs is. It is a Rust idiomatic port of the Roaring Bitmap data structure introduced by Daniel Lemire, G. Ssi-Yan-Kai and O. Kaser. The data structure simply defines better-compressed sets of bits, faster and memory-efficient data structures for performing operations on sets, i.e. intersections, unions... and so on.

So, about this new version of roaring-rs, it is a bit special because it's the version that introduces the most performance improvements in a while and on the most topics. Most of this amazing work was done by @saik0, a new contributor to the crate 🎉. @schmidek implemented DoubleEndedIterator on the bitmap iterators 💪

Breaking changes

  • Removed deprecated set ops such as union_with. Use the corresponding operators |=.
  • Minimum supported Rust version (MSRV) increased to 1.56.
  • deserialize_from validates inputs. In some cases, it can be 4x slower. For workloads that are heavy in deserialization from trusted sources migrate to deserialize_unchecked_from.

Performance optimizations

  • from_sorted_iter and append are exponentially faster. They should be preferred over collect and extend whenever adding monotonically increasing integers to the set as it's about 2-2.5x faster.
  • Other performance optimizations.
    Min Mean Max
    Iteration 6% 57% 125%
    And 0% 7% 22%
    Or 0% 10% 33%
    Sub 0% 9% 39%
    Xor 4% 90% 209%

New features

  • rank Returns the number of integers that are lower or equal to a given value. rank(u64::MAX) == len().
  • select Returns the nth integer in the set or None if n >= len().
  • union_len, intersection_len ... and so on. Compute the cardinality of a set operation without materializing it. Usually about twice as fast as materializing the bitmap.
  • implemented DoubleEndedIterator.
  • implemented ExactSizeIterator (on 64 bit targets only).
  • EXPERIMENTAL SIMD feature (requires rust nightly).
    Min Mean Max
    And 0% 34% 141%
    Or 2% 45% 145%
    Sub 0% 51% 168%
    Xor 0% 130% 437%

Other

  • Added proper testing suite for set operations.
  • Added many benchmarks and real world datasets.

Perf numbers are for pairwise operations on collections of bitmaps from real datasets.
Pairwise as in: B1 ∩ B2, B2 ∩ B3, ... BN-1 ∩ BN

Version 0.8.1

09 Nov 10:38
84aba71
Compare
Choose a tag to compare

Version 0.8.0

21 Oct 13:36
25728f2
Compare
Choose a tag to compare

Version 0.7.0

29 Apr 19:23
9b50928
Compare
Choose a tag to compare

Version 0.6.7

29 Apr 14:17
db758b0
Compare
Choose a tag to compare

Version 0.6.6

21 Apr 09:46
2f41a25
Compare
Choose a tag to compare

Version 0.6.5

20 Feb 20:02
0af2b53
Compare
Choose a tag to compare

Version 0.6.4

23 Jan 10:19
811ca45
Compare
Choose a tag to compare