This project evaluates the mmap-sync Rust library under a simulated production workload with 1 writer thread and 12 reader threads, measuring latency metrics for shared-memory operations.
Writer:
- `Instant::now()` captures timestamps before and after `synchronizer.write()`.
- Measures exclusive lock acquisition + data serialization + mmap flush.
Readers:
- Records time from detecting a version change (via `synchronizer.version()`) to completing `synchronizer.read()`.
- Each reader thread stores measurements locally to avoid cross-thread contention.
- Batched Writes: Latencies stored in thread-local `Vec` buffers, flushed to CSV post-test.
- No Formatting During Test: Avoid `println!` or string operations during measurements.
- CPU Pinning: Writer/readers pinned to separate CPU cores via `core_affinity` to reduce OS scheduler noise.
Writer (Market Data):
- Poisson Process: Exponential inter-arrival times (λ = 10,000/sec) to mimic bursty financial data feeds.
- Pareto Distribution: Bursty writes with long-tail latency spikes (simulates market-opening events).
Readers:
- Continuous Polling: Readers check `version()` in a tight loop, simulating low-latency trading systems that prioritize freshness over CPU efficiency.
- Writer: Constant maximum rate (no delays between writes) to saturate the system.
- Readers: Unthrottled polling to create contention.
- Why? This worst-case scenario tests:
- Lock fairness between writer and readers
- Memory bandwidth limits
- OS scheduler behavior under load
- Baseline Expectation:
- Writer: 20–50 µs (aligns with Cloudflare's 10–30 µs + overhead)
- Readers: 50–150 µs (12 readers creating contention)
- Key Variables:
- NUMA node locality between writer/readers
- CPU cache thrashing from frequent writes
- Mmap flush granularity (page vs. byte-level)
- Distribution mode (poisson/stress/pareto)
// Zero-copy deserialization with rkyv
#[derive(Archive, Deserialize, Serialize, Debug)]
#[archive_attr(derive(CheckBytes))]
pub struct BidAsk {
    pub side: [u8; 4],      // "buy" or "sell"
    pub exchange: [u8; 10], // Up to 10 chars
    pub symbol: [u8; 8],    // e.g. "BTC-USD"
    pub price: f64,
    pub size: f64,
    pub timestamp: f64,
}
#[derive(Archive, Deserialize, Serialize, Debug)]
#[archive_attr(derive(CheckBytes))]
pub struct BestBidAsk {
    pub best_bid: BidAsk,
    pub best_offer: BidAsk,
}
- Writer Thread
  - Generates a random `BestBidAsk` every 100 µs (avg)
  - Uses mmap-sync's `SingleWriter` for exclusive access
  - Records time to write + flush
- Reader Threads
  - Continuously poll `version()`
  - On version change, read + deserialize the data
  - Record time from detection to completed read
- Post-Processing
  - Merge per-thread CSV files
  - Compute statistics (mean, p95, etc.) using HDR histograms
  - Generate latency-over-time plot
cargo r -r -- --readers 6 --mode poisson
Spawning 6 reader threads
Collected 58365 writer samples
Collected 350000 reader samples
=== Writer Latency (µs) ===
Count: 58365
Min: 14.2 us
Max: 5664.8 us
Mean: 33.1 us
Median: 29.9 us
p95: 53.9 us
p99: 109.6 us
=== Reader Latency (µs) ===
Count: 350000
Min: 0.0 us
Max: 14876.7 us
Mean: 0.7 us
Median: 0.2 us
p95: 0.4 us
p99: 3.8 us
Plot saved to latency_plot.png
AWS EC2 c6in.8xlarge Instance:
- CPU: Intel Xeon 8375C (Ice Lake) - 16 physical cores/32 threads
- Memory: 64GB DDR4 with tmpfs mount
- OS: Amazon Linux 2 (Kernel 5.10)
- Test Configuration: 10-second duration, 12 reader threads
Poisson Mode:

| Metric | Writer | Reader |
|---|---|---|
| Count | 99,994 | 1,199,112 |
| Min | 0.4 µs | 0.1 µs |
| Max | 183.6 µs | 36.8 µs |
| Mean | 0.5 µs | 0.7 µs |
| Median | 0.4 µs | 0.5 µs |
| p95 | 0.6 µs | 1.9 µs |
| p99 | 1.3 µs | 2.3 µs |

Throughput: 9,999 writes/sec | 119,911 reads/sec
Pareto Mode:

| Metric | Writer | Reader |
|---|---|---|
| Count | 99,989 | 1,199,798 |
| Min | 0.4 µs | 0.1 µs |
| Max | 298.0 µs | 42.3 µs |
| Mean | 0.4 µs | 0.7 µs |
| Median | 0.4 µs | 0.5 µs |
| p95 | 0.5 µs | 1.9 µs |
| p99 | 0.7 µs | 2.2 µs |

Throughput: 9,999 writes/sec | 119,979 reads/sec
Stress Mode:

| Metric | Writer | Reader |
|---|---|---|
| Count | 541,524 | 6,225,731 |
| Min | 0.4 µs | 0.0 µs |
| Max | 257.0 µs | 37.6 µs |
| Mean | 18.2 µs | 0.9 µs |
| Median | 1.8 µs | 0.6 µs |
| p95 | 58.4 µs | 2.2 µs |
| p99 | 59.0 µs | 3.0 µs |

Throughput: 54,152 writes/sec | 622,573 reads/sec
- Writer Performance Characteristics
  - Normal Operation (Poisson/Pareto):
    - Consistent sub-0.5µs median latency across all realistic scenarios
    - Tight p95–p99 spread (0.2–0.9µs) demonstrates predictable behavior
    - Maximum latencies under 300µs even during burst scenarios
  - Stress Mode:
    - Maintained 1.8µs median despite 54k writes/sec throughput
    - p99 latency stable at 59µs, showing effective contention management
    - 257µs max latency demonstrates bounded worst-case behavior
- Reader Consistency
  - Sub-microsecond median latency across all test scenarios
  - p99 below 3µs even under maximum contention
  - Maximum observed latency under 43µs across all modes
- Infrastructure Impact
  - tmpfs: Reduced writer max latency by 3.9x (1443µs vs. 5664µs on Mac)
| Scenario | Healthy Range | Warning Threshold |
|---|---|---|
| Writer Median Latency | <2µs | ≥2µs |
| Writer p99 Latency | <60µs | ≥60µs |
| Reader p95 Latency | <3µs | ≥5µs |
| Write Throughput | <55k/s | >60k/s |
| Read Throughput | <650k/s | >700k/s |
*Calculated from 10s test totals: Stress Mode = 502,256 writes / 10s = 50k writes/sec, 5.76M reads / 10s = 576k reads/sec*
The mmap-sync library demonstrates:
- Predictable low latency: 0.4-0.7µs median for both roles in normal operation
- Graceful degradation: Writer p99 grows from 1.3µs (Poisson) to 59µs (Stress) while staying bounded under a ~5.4x throughput increase
- Horizontal scalability: 12 readers add <1µs to median read latency
- Production readiness: Sustains 54k writes/sec with sub-60µs p99 latency
# Clone and build
git clone git@github.com:hanchang/tom_tang.git
cd tom_tang
cargo build --release
# Run with 12 readers for 60s (Poisson mode)
cargo r -r -- --readers 12 --duration 60
# Stress test mode (max writes)
cargo r -r -- --readers 12 --duration 60 --mode stress
We maintain three dedicated branches to store benchmark outputs (CSV, plots, etc.):
# pareto
pareto_latencies.zip
pareto_latency_plot.png_combined.png
pareto_latency_plot.png_reader.png
pareto_latency_plot.png_writer.png
pareto_output.txt
# poisson
poisson_latencies.zip
poisson_latency_plot.png_combined.png
poisson_latency_plot.png_reader.png
poisson_latency_plot.png_writer.png
poisson_output.txt
# stress
stress_latencies.zip
stress_latency_plot.png_combined.png
stress_latency_plot.png_reader.png
stress_latency_plot.png_writer.png
stress_output.txt
These branches are only for historical data and reference, so main development remains clean. If you need to review past test artifacts or reproduce a particular scenario, check out these branches.