polars-bio - Next-gen Python DataFrame operations for genomics!

polars-bio is a Python library for genomics built on top of polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.

Key Features

optimized for performance and memory efficiency in large-scale genomics dataset analyses, both when reading input data and when performing operations
popular genomics operations with a DataFrame API (both Pandas and polars)
SQL-powered bioinformatic data querying or manipulation
native parallel engine powered by Apache DataFusion and sequila-native
out-of-core/streaming processing (for data too large to fit into a computer's main memory) with Apache DataFusion and polars
support for federated and streamed reading of data from cloud storage (e.g. S3, GCS) with Apache OpenDAL, enabling processing of large-scale genomics data without materializing it in memory
zero-copy data exchange with Apache Arrow
support for bioinformatics file formats with noodles and exon
fast overlap operations with COITrees: Cache Oblivious Interval Trees
pre-built wheel packages for Linux, Windows and macOS (arm64 and x86_64) available on PyPI
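
To make the overlap operation in the feature list concrete, here is a minimal plain-Python sketch of what an interval overlap computes. It is not the polars-bio API or its implementation (the library relies on COITrees and a native engine). The function name `overlap_pairs`, the tuple layout `(chrom, start, end)` and the closed-coordinate convention are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def overlap_pairs(left, right):
    """Return (i, j) index pairs where left[i] overlaps right[j].

    Hypothetical helper for illustration only. Intervals are
    (chrom, start, end) tuples; both ends are treated as inclusive
    (an assumption, real tools may use half-open coordinates).
    """
    # Group the right-hand intervals by chromosome, sorted by start.
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(right):
        by_chrom[chrom].append((start, end, j))
    starts = {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts[chrom] = [s for s, _, _ in items]

    pairs = []
    for i, (chrom, start, end) in enumerate(left):
        items = by_chrom.get(chrom, [])
        # Only right intervals that start at or before this interval's
        # end can overlap it; binary search finds that prefix.
        hi = bisect_right(starts.get(chrom, []), end)
        for _r_start, r_end, j in items[:hi]:
            if r_end >= start:
                pairs.append((i, j))
    return pairs


left = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
right = [
    ("chr1", 150, 250),
    ("chr1", 190, 300),
    ("chr1", 700, 800),
    ("chr2", 10, 49),
    ("chr2", 80, 90),
]
pairs = overlap_pairs(left, right)
print(pairs)  # [(0, 0), (0, 1), (2, 4)]
```

The prefix scan in this sketch is still linear in the worst case. An interval tree avoids that cost, which is the motivation for a structure such as COITrees.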

Single-thread performance 🏃‍

Parallel performance 🏃‍🏃‍

Citing

If you use polars-bio in your work, please cite:

@article {Wiewiorka2025.03.21.644629,
	author = {Wiewiorka, Marek and Khamutou, Pavel and Zbysinski, Marek and Gambin, Tomasz},
	title = {polars-bio - fast, scalable and out-of-core operations on large genomic interval datasets},
	elocation-id = {2025.03.21.644629},
	year = {2025},
	doi = {10.1101/2025.03.21.644629},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629},
	eprint = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629.full.pdf},
	journal = {bioRxiv}
}

Read the documentation
