Skip to content

tomtomwombat/hyperloglockless

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hyperloglockless

Crates.io docs.rs License: MIT License: APACHE Downloads

High-performance HyperLogLog with bias correction and full concurrency support. Used for accurate and space-efficient cardinality estimation.

Overview

HyperLogLogs are space efficient data structures for the "count-distinct problem", approximating the number of distinct elements in a multiset. Paper.

hyperloglockless offers a lockless concurrent HyperLogLog and a single threaded counterpart. They're simpler, faster, and more accurate than other HyperLogLog implementations:

  • 🧵 Concurrent: AtomicHyperLogLog is a drop-in replacement for RwLock<OtherHyperLogLog>: all methods take &self, so you can wrap it in Arc and update it concurrently without &mut.
  • Fast: Designed to be fast and simple in both single and multi-threaded scenarios.
  • 🎯 Accurate: Empirically verified accuracy for trillions of elements; other implementations break down after millions.
  • Tested: Rigorously tested with loom and benchmarked.

Usage

[dependencies]
hyperloglockless = "0.3.0"

A HyperLogLog with precision p uses 2^p bytes of memory and has an error % of roughly 104 / sqrt(2^p).

use hyperloglockless::HyperLogLog;

let precision = hyperloglockless::precision_for_error(0.01); // 1% error
assert_eq!(precision, 14);

let mut hll = HyperLogLog::new(precision);
hll.insert(&'🦀');
hll.insert_all('a'..='z');

let count = hll.count(); // ~27
assert_eq!(hll.len(), 1 << precision); // 16384 bytes

Full concurrency support: AtomicHyperLogLog is a drop-in replacement for RwLock<OtherHyperLogLog>: all methods take &self.

use hyperloglockless::AtomicHyperLogLog;

let hll = AtomicHyperLogLog::new(14);
hll.insert(&'🦀');
hll.insert_all('a'..='z');

Single-Threaded Performance vs Others

Both hyperloglockless::HyperLogLog and hyperloglockless::AtomicHyperLogLog are extremely fast for both insert and count calls.

fp-micro fp-micro

Multi-Threaded Performance vs Others

hyperloglockless::AtomicHyperLogLog does not require any locking and therefore avoids thread contention.

fp-micro

Accuracy vs Others

hyperloglockless stays consistently accurate while other implementations break down after millions of items.

fp-micro

Available Features

  • rand - Enabled by default, this has the DefaultHasher source its random state using thread_rng() instead of hardware sources. Getting entropy from a user-space source is considerably faster, but requires additional dependencies to achieve this. Disabling this feature by using default-features = false makes DefaultHasher source its entropy using getrandom, which will have a much simpler code footprint at the expense of speed.
  • serde - HyperLogLogs implement Serialize and Deserialize when possible.
  • loom - AtomicHyperLogLogs use loom atomics, making it compatible with loom testing.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

About

Lightning fast concurrent HyperLogLog for Rust.

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages