biodatageeks/polars-bio

polars-bio - Next-gen Python DataFrame operations for genomics!

polars-bio is a Python library for genomics built on top of Polars, Apache Arrow and Apache DataFusion. It provides a DataFrame API for genomic data and is designed to be fast, memory-efficient and easy to use.

Key Features

Single-thread performance 🏃

overlap-single.png

count-overlaps-single.png

coverage-single.png

Parallel performance 🏃🏃

overlap-parallel.png

count-overlaps-parallel.png

coverage-parallel.png
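
The benchmark figures above are named after three interval operations: overlap, count-overlaps and coverage. As a rough illustration of what such operations usually compute, here is a minimal plain-Python sketch. It is not the polars-bio API and says nothing about how the library implements them. The function names, the `(chromosome, start, end)` tuple layout, the half-open `[start, end)` coordinate convention and the sample data are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical interval layout: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_pairs(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """All (left, right) pairs that overlap (naive O(n*m) scan)."""
    return [(a, b) for a in left for b in right if overlaps(a, b)]


def count_overlaps(left: List[Interval], right: List[Interval]) -> List[int]:
    """For each left interval, the number of right intervals overlapping it."""
    return [sum(overlaps(a, b) for b in right) for a in left]


def coverage(left: List[Interval], right: List[Interval]) -> List[int]:
    """For each left interval, the number of its bases covered by any right interval."""
    result = []
    for a in left:
        _, start, end = a
        # Clip every overlapping right interval to the bounds of the left interval.
        clipped = sorted(
            (max(start, b[1]), min(end, b[2])) for b in right if overlaps(a, b)
        )
        covered = 0
        cursor = start  # everything before cursor has already been counted
        for s, e in clipped:
            s = max(s, cursor)
            if e > s:
                covered += e - s
                cursor = e
        result.append(covered)
    return result


# Made-up sample data for illustration only.
reads = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
targets = [("chr1", 150, 250), ("chr1", 180, 320), ("chr2", 500, 600)]

pairs = overlap_pairs(reads, targets)
counts = count_overlaps(reads, targets)
covered = coverage(reads, targets)
```

With this sample data, `counts` is `[2, 1, 0]` and `covered` is `[50, 20, 0]`. The nested scan is quadratic in the number of intervals, so it is only useful for checking definitions on small inputs.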

Citing

If you use polars-bio in your work, please cite:

@article {Wiewiorka2025.03.21.644629,
	author = {Wiewiorka, Marek and Khamutou, Pavel and Zbysinski, Marek and Gambin, Tomasz},
	title = {polars-bio - fast, scalable and out-of-core operations on large genomic interval datasets},
	elocation-id = {2025.03.21.644629},
	year = {2025},
	doi = {10.1101/2025.03.21.644629},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629},
	eprint = {https://www.biorxiv.org/content/early/2025/03/25/2025.03.21.644629.full.pdf},
	journal = {bioRxiv}
}

Read the documentation