This project benchmarks probabilistic data structures like HyperLogLog (HLL) and Bloom Filters against brute-force methods for:
- Unique IP Counting in a video streaming platform
- Blocked IP Detection for cybersecurity (DDoS attack mitigation)
- Brute-force method (PostgreSQL) → Exact results (but high memory usage)
- HyperLogLog (Redis) → Approximate unique IP counting (low memory usage)
- Bloom Filter (Redis) → Fast blocked IP detection (probabilistic false positives)
- Performance comparison → Speed, accuracy, memory usage
git clone https://github.com/siddharthaasal/benchmarking-probabilistic-data-structures.git
cd benchmarking-probabilistic-data-structures
npm install
docker run -p 6379:6379 -it --rm redis/redis-stack-server:latest
# Start PostgreSQL
sudo service postgresql start
# Login to PostgreSQL
sudo -i -u postgres
psql
mkdir data
node scripts/generateIpData.js --count 5000 --path ./data/data.json
CREATE DATABASE benchmarking;
\c benchmarking;
CREATE TABLE ip_logs (
id SERIAL PRIMARY KEY,
ip VARCHAR(45) NOT NULL,
timestamp TIMESTAMPTZ DEFAULT NOW()
);
node scripts/loadIpData.js
node scripts/seedHll.js
node src/benchmarks/uniqueIpBenchmark.js
Test API Endpoints:
# Brute-force unique IP count
curl http://localhost:3000/bruteForce/unique-ip-count
# HLL estimated unique IP count
curl http://localhost:3000/hll/count
node scripts/generateIpData.js --count 5000 --path ./data/blocked_ips.json
CREATE DATABASE benchmarking;
\c benchmarking;
CREATE TABLE blocked_ips (
id SERIAL PRIMARY KEY,
ip VARCHAR(45) UNIQUE NOT NULL
);
node scripts/loadBlockedIpData.js
node scripts/seedBf.js
node src/benchmarks/blockedIpBenchmark.js
Test API Endpoints:
# Brute-force blocked IP check
curl http://localhost:3000/bruteForceBlockedIp/is-blocked/10.90.68.223
# Bloom Filter blocked IP check
curl http://localhost:3000/bloom/is-blocked/10.90.68.222
The benchmarking results include:
- Response time comparison (Brute-force vs. Probabilistic methods)
- Throughput (requests/sec)
- Memory usage (PostgreSQL vs. Redis)
- Accuracy (false positives & false negatives in Bloom Filter)
MIT License