GitHub - jcsherin/datablok: Novel and high-performance applications of database building blocks (Apache DataFusion, Arrow & Parquet)

Datablok is a collection of experiments in novel and high-performance applications of Rust database building blocks (Apache DataFusion, Arrow & Parquet).

Highlights

parquet-nested-parallel - Writing 1 billion nested records to Parquet with a per-core throughput of ~1.3 million records per second, using a multi-stage parallel pipeline.
tantivy-byte-array-index - Embedding arbitrary data in Parquet and exploiting it to improve DataFusion query performance. Here, a Tantivy full-text index is embedded to accelerate LIKE queries.
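The README does not spell out the stages of the parquet-nested-parallel pipeline. The sketch below is a hypothetical, std-only illustration of the general pattern, not the crate's code. Worker threads generate nested records and flatten them into columnar buffers (a stand-in for Arrow arrays), and a single sink consumes the batches through a bounded channel. The Contact record, the Columns layout, and run_pipeline are invented for illustration. The sink only counts rows, where a real pipeline would write Parquet (for example, with the parquet crate's ArrowWriter).

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical nested record: a contact with a variable-length list of phones.
struct Contact {
    id: u64,
    name: String,
    phones: Vec<String>,
}

/// Stage 1: deterministically generate the records with ids in `start..end`.
fn generate(start: u64, end: u64) -> Vec<Contact> {
    (start..end)
        .map(|id| Contact {
            id,
            name: format!("name-{id}"),
            // Nested list with 0, 1 or 2 entries depending on the id.
            phones: (0..(id % 3)).map(|p| format!("555-{id}-{p}")).collect(),
        })
        .collect()
}

/// Columnar buffers for one batch (a stand-in for Arrow arrays):
/// a list column is stored as flattened values plus offsets.
struct Columns {
    ids: Vec<u64>,
    names: Vec<String>,
    phone_values: Vec<String>,
    phone_offsets: Vec<usize>,
}

/// Stage 2: flatten a batch of nested records into columns.
fn to_columns(records: Vec<Contact>) -> Columns {
    let mut cols = Columns { ids: vec![], names: vec![], phone_values: vec![], phone_offsets: vec![0] };
    for c in records {
        cols.ids.push(c.id);
        cols.names.push(c.name);
        cols.phone_values.extend(c.phones);
        cols.phone_offsets.push(cols.phone_values.len());
    }
    cols
}

/// Run `workers` generator/encoder threads over disjoint id ranges and consume
/// their batches in a single sink. Returns (rows, phone values) seen by the sink.
fn run_pipeline(total: u64, batch_size: u64, workers: u64) -> (u64, u64) {
    assert!(batch_size > 0, "batch_size must be positive");
    // Bounded channel: workers block if the sink falls behind (backpressure).
    let (tx, rx) = mpsc::sync_channel::<Columns>(workers as usize * 2);
    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Worker w takes batches w, w + workers, w + 2 * workers, ...
            let mut start = w * batch_size;
            while start < total {
                let end = (start + batch_size).min(total);
                if tx.send(to_columns(generate(start, end))).is_err() {
                    return; // sink hung up
                }
                start += workers * batch_size;
            }
        }));
    }
    drop(tx); // the receive loop ends once every worker has dropped its sender

    // Stage 3: the sink. A real pipeline would write each batch to Parquet here.
    let (mut rows, mut phones) = (0u64, 0u64);
    for cols in rx {
        debug_assert_eq!(cols.names.len(), cols.ids.len());
        debug_assert_eq!(cols.phone_offsets.len(), cols.ids.len() + 1);
        rows += cols.ids.len() as u64;
        phones += cols.phone_values.len() as u64;
    }
    for h in handles {
        h.join().expect("worker panicked");
    }
    (rows, phones)
}

fn main() {
    let (rows, phones) = run_pipeline(1_000_000, 10_000, 4);
    println!("rows={rows} phones={phones}"); // rows=1000000 phones=999999
}
```

The bounded sync_channel keeps memory in check: when the sink is slower than the workers, senders block instead of queueing an unbounded number of batches.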

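Tantivy's index format and query path are not shown here. The sketch below is a simplified, hypothetical stand-in for the idea behind tantivy-byte-array-index. It builds a tiny whitespace-token inverted index and uses it to prune rows for LIKE '%needle%'. It assumes the needle is a literal with no % or _ wildcards and no whitespace. Under that assumption, every match lies inside a single token, so scanning the term dictionary finds all candidate rows. For any other needle, it falls back to a full scan. TokenIndex and like_contains are illustrative names, not part of the crate.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy inverted index: token -> set of row ids containing that token.
/// Tokens come from `split_whitespace` with no normalization, so any
/// whitespace-free substring of a row lies inside one of its tokens.
struct TokenIndex {
    postings: BTreeMap<String, BTreeSet<usize>>,
}

impl TokenIndex {
    fn build(rows: &[&str]) -> Self {
        let mut postings: BTreeMap<String, BTreeSet<usize>> = BTreeMap::new();
        for (row_id, text) in rows.iter().enumerate() {
            for tok in text.split_whitespace() {
                postings.entry(tok.to_string()).or_default().insert(row_id);
            }
        }
        TokenIndex { postings }
    }

    /// Candidate rows for LIKE '%needle%', or None when the index cannot
    /// help (empty needle or needle containing whitespace).
    fn candidates(&self, needle: &str) -> Option<BTreeSet<usize>> {
        if needle.is_empty() || needle.chars().any(char::is_whitespace) {
            return None;
        }
        let mut rows = BTreeSet::new();
        // Scan the term dictionary instead of every row.
        for (term, ids) in &self.postings {
            if term.contains(needle) {
                rows.extend(ids.iter().copied());
            }
        }
        Some(rows)
    }
}

/// Evaluate `text LIKE '%needle%'` (needle taken literally), pruning with the index.
fn like_contains(rows: &[&str], index: &TokenIndex, needle: &str) -> Vec<usize> {
    let candidates: Vec<usize> = match index.candidates(needle) {
        Some(set) => set.into_iter().collect(),
        None => (0..rows.len()).collect(), // full-scan fallback
    };
    // Recheck the real predicate on the surviving rows.
    candidates.into_iter().filter(|&i| rows[i].contains(needle)).collect()
}

fn main() {
    let rows = ["the quick brown fox", "a lazy dog", "foxglove in bloom", "dog days"];
    let index = TokenIndex::build(&rows);
    println!("{:?}", like_contains(&rows, &index, "fox")); // [0, 2]
    println!("{:?}", like_contains(&rows, &index, "a lazy")); // [1], via full scan
}
```

When many rows share the same tokens, the term dictionary is much smaller than the row set, so pruning through it can skip most of the per-row string matching.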
Project Goals

Test the performance limits of single-node data processing.
Explore novel ways of composing database building blocks.
Find and contribute improvements to the underlying libraries.

Usage

To run a specific experiment, use the -p or --package flag for cargo from the root of the repository.

For example, to run the hello-datafusion doodle:

cargo run -p hello-datafusion

Local Development

The verify.sh script mirrors the CI pipeline. Run it before pushing changes: catching errors locally is much faster than waiting for the CI pipeline to discover them.

# Run all checks on the 'hello-datafusion' package
./scripts/verify.sh hello-datafusion

# For a more detailed output, use the --verbose flag
./scripts/verify.sh --verbose hello-datafusion
