This repository contains a solution to the One Billion Row Challenge, written in Rust.
The solution follows a divide-and-conquer approach: the input file is split into chunks that are processed in parallel.
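A minimal sketch of that chunking idea (the `process_chunk` helper and the way work is spawned are illustrative assumptions, not this repository's exact code):

```rust
use std::thread;

/// Split `data` into roughly equal chunks, snapping each boundary to the
/// next newline so no record is cut in half, and process them in parallel.
fn process_parallel(data: &[u8], num_threads: usize) {
    let chunk_size = data.len() / num_threads;
    thread::scope(|s| {
        let mut start = 0;
        while start < data.len() {
            // Tentative end of this chunk, then advance to the next '\n'.
            let mut end = (start + chunk_size).min(data.len());
            while end < data.len() && data[end] != b'\n' {
                end += 1;
            }
            if end < data.len() {
                end += 1; // include the newline itself
            }
            let chunk = &data[start..end];
            s.spawn(move || {
                // A hypothetical process_chunk(chunk) would aggregate
                // min/max/sum/count for every station in this slice.
                let _ = chunk;
            });
            start = end;
        }
    });
}
```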
The I/O is done by creating an `mmap` of the challenge file. This allows the solution to reach read speeds of up to 900 MB/s on my MacBook Air M1 (16 GB RAM, 1 TB SSD).
This implementation also uses little memory. I completely avoid 'user' heap allocations (the standard library might still heap-allocate internally). Since the number of unique station names is bounded by 10,000, the data structures used to aggregate the data are very lightweight. In my tests the program stabilizes at about 8.5 MB.
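Concretely, a per-station aggregate only needs a running min, max, sum, and count. A sketch of such a record (the field names and the choice of storing temperatures as integer tenths of a degree are my assumptions, not necessarily what this repository does):

```rust
/// Running statistics for one station, kept as integer tenths of a degree,
/// so 10,000 of these fit in well under a megabyte.
#[derive(Clone, Copy)]
struct Stats {
    min: i32,   // temperature * 10
    max: i32,   // temperature * 10
    sum: i64,   // sum of (temperature * 10)
    count: u64,
}

impl Stats {
    fn new(tenths: i32) -> Self {
        Stats { min: tenths, max: tenths, sum: tenths as i64, count: 1 }
    }

    fn add(&mut self, tenths: i32) {
        self.min = self.min.min(tenths);
        self.max = self.max.max(tenths);
        self.sum += tenths as i64;
        self.count += 1;
    }

    fn mean(&self) -> f64 {
        self.sum as f64 / self.count as f64 / 10.0
    }
}
```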
We also parse the input as raw bytes, avoiding UTF-8 validation and `String` allocations.
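For instance, a line such as `Hamburg;12.3` can be split on the `;` and its temperature parsed directly from the bytes into integer tenths. This is a sketch under the challenge's one-decimal format, not the repository's exact parser:

```rust
/// Parse one `station;temperature` line from raw bytes without allocating.
/// Returns the station name slice and the temperature scaled by 10
/// (e.g. "12.3" -> 123, "-0.5" -> -5).
fn parse_line(line: &[u8]) -> (&[u8], i32) {
    let sep = line.iter().position(|&b| b == b';').expect("missing ';'");
    let (name, rest) = (&line[..sep], &line[sep + 1..]);

    let mut value: i32 = 0;
    let mut sign = 1;
    for &b in rest {
        match b {
            b'-' => sign = -1,
            b'.' => {} // skip the decimal point; we keep a fixed-point value
            b'0'..=b'9' => value = value * 10 + (b - b'0') as i32,
            _ => break, // stop at '\n' or '\r'
        }
    }
    (name, sign * value)
}
```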
To time execution, we can use the `time` command (we build first so compilation is not included in the timing):
cargo build -r && time cargo run -r -- measurements.txt
The program receives the filename as input. Details on generating the measurements file can be found at: https://github.com/gunnarmorling/1brc
`time` outputs the results like this:
80.97s user 13.88s system 545% cpu 17.392 total
with `total` being the actual elapsed (wall-clock) time.
To optimize the code as I iterated on it, I used the `cargo flamegraph` subcommand, running it as root:
sudo CARGO_PROFILE_RELEASE_DEBUG=true cargo flamegraph -r -- measurements.txt
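The `cargo flamegraph` subcommand is provided by the external `flamegraph` crate; if it is not installed yet:

```sh
cargo install flamegraph
```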
To quickly test different implementations, I used a sample file containing a tenth of the original lines. This file can be created from the original with the `get_sample.sh` script.
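The script itself is the source of truth, but such a sample can be produced with something along these lines (the line count assumes the full one-billion-line file, and the output name is arbitrary):

```sh
head -n 100000000 measurements.txt > measurements_sample.txt
```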
I wanted to avoid using any external crates. The only exception was `libc`, which provides cross-platform C bindings so we can call `mmap`.
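A simplified sketch of what such a call looks like through `libc` (Unix-only, with error handling and flags reduced to a minimum; the actual implementation likely differs):

```rust
use std::fs::File;
use std::os::unix::io::AsRawFd;

/// Memory-map a file read-only and return its contents as a byte slice.
/// Sketch only: the mapping is never unmapped and failures just panic.
fn mmap_file(path: &str) -> &'static [u8] {
    let file = File::open(path).expect("cannot open file");
    let len = file.metadata().expect("cannot stat file").len() as usize;
    let ptr = unsafe {
        libc::mmap(
            std::ptr::null_mut(),
            len,
            libc::PROT_READ,
            libc::MAP_PRIVATE,
            file.as_raw_fd(),
            0,
        )
    };
    assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");
    // The mapping stays valid after the file handle is dropped/closed.
    unsafe { std::slice::from_raw_parts(ptr as *const u8, len) }
}
```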
The standard library's `HashMap` does not use the fastest hashing algorithm for our use case: its default SipHash hasher is designed to resist DoS attacks rather than for raw speed. Crates like `hashbrown` let us swap in faster (but DoS-vulnerable) hash functions.
The `phcs/hashbrown-version` branch uses that crate and is about 2 seconds faster on my machine, reaching read speeds of over 1 GB/s.
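A sketch of what the swap looks like (the `hashbrown` version number, the keys, and the value type are assumptions for illustration):

```rust
// Cargo.toml (version is an assumption): hashbrown = "0.14"
use hashbrown::HashMap;

fn main() {
    // hashbrown's default hasher trades DoS resistance for raw speed,
    // which is acceptable here: the input is a fixed, trusted benchmark file.
    let mut sums: HashMap<&[u8], (i64, u64)> = HashMap::new();

    // Accumulate a couple of sample readings (temperatures in tenths).
    for &(station, tenths) in &[(&b"Hamburg"[..], 123i64), (&b"Hamburg"[..], -45)] {
        let entry = sums.entry(station).or_insert((0, 0));
        entry.0 += tenths;
        entry.1 += 1;
    }

    for (station, (sum, count)) in &sums {
        println!(
            "{}: mean {:.1}",
            String::from_utf8_lossy(station),
            *sum as f64 / *count as f64 / 10.0
        );
    }
}
```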