High-performance HyperLogLog with bias correction and full concurrency support. Used for accurate and space-efficient cardinality estimation.
HyperLogLogs are space efficient data structures for the "count-distinct problem", approximating the number of distinct elements in a multiset. Paper.
hyperloglockless offers a lockless concurrent HyperLogLog and a single threaded counterpart. They're simpler, faster, and more accurate than other HyperLogLog implementations:
- 🧵 Concurrent:
AtomicHyperLogLog
is a drop-in replacement forRwLock<OtherHyperLogLog>
: all methods take&self
, so you can wrap it inArc
and update it concurrently without&mut
. - ⚡ Fast: Designed to be fast and simple in both single and multi-threaded scenarios.
- 🎯 Accurate: Empirically verified accuracy for trillions of elements; other implementations break down after millions.
- ✅ Tested: Rigorously tested with loom and benchmarked.
[dependencies]
hyperloglockless = "0.3.0"
A HyperLogLog with precision p
uses 2^p
bytes of memory and has an error % of roughly 104 / sqrt(2^p)
.
use hyperloglockless::HyperLogLog;
let precision = hyperloglockless::precision_for_error(0.01); // 1% error
assert_eq!(precision, 14);
let mut hll = HyperLogLog::new(precision);
hll.insert(&'🦀');
hll.insert_all('a'..='z');
let count = hll.count(); // ~27
assert_eq!(hll.len(), 1 << precision); // 16384 bytes
Full concurrency support: AtomicHyperLogLog
is a drop-in replacement for RwLock<OtherHyperLogLog>
: all methods take &self
.
use hyperloglockless::AtomicHyperLogLog;
let hll = AtomicHyperLogLog::new(14);
hll.insert(&'🦀');
hll.insert_all('a'..='z');
Both hyperloglockless::HyperLogLog
and hyperloglockless::AtomicHyperLogLog
are extremely fast for both insert and count calls.
hyperloglockless::AtomicHyperLogLog
does not require any locking and therefore avoids thread contention.
hyperloglockless stays consistently accurate while other implementations break down after millions of items.
rand
- Enabled by default, this has theDefaultHasher
source its random state usingthread_rng()
instead of hardware sources. Getting entropy from a user-space source is considerably faster, but requires additional dependencies to achieve this. Disabling this feature by usingdefault-features = false
makesDefaultHasher
source its entropy usinggetrandom
, which will have a much simpler code footprint at the expense of speed.serde
- HyperLogLogs implementSerialize
andDeserialize
when possible.loom
-AtomicHyperLogLog
s use loom atomics, making it compatible with loom testing.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.