rapidstats:

What is it?

rapidstats is a minimal library that implements fast statistical routines in Rust and Polars. While similar in spirit, it does not aim to be a complete re-implementation of libraries like scikit-learn or scipy. Only functions that can be significantly faster (e.g. a bootstrap class that offers optimized Rust kernels for metrics such as ROC-AUC) or significantly more ergonomic (e.g. dataframe-first encoders and scalers) are added.

This library is in an alpha state. Although all functions are tested against existing libraries, use at your own risk. The API is subject to change very frequently.

Usage:

Dependencies

rapidstats has a minimal set of dependencies. It only depends on polars, narwhals (for dataframe compatibility), and tqdm (for progress bars). You may install pyarrow (pip install rapidstats[pyarrow]) to allow functions to take numpy arrays, pandas objects, and other objects that may be converted through Arrow.

Installing

The easiest way to install rapidstats is from PyPI using pip:

pip install rapidstats

Performance

rapidstats is very fast. For example, suppose you want the confusion matrix metrics for a 50,000-row dataset. You are not yet sure which threshold you want, so you compute the metrics at multiple thresholds, say 500. With sklearn, this takes 40 seconds. With rapidstats, it takes just 0.2 seconds, a 198x speedup! Furthermore, rapidstats can use a cumulative sum algorithm that computes the metrics at all possible thresholds, not just these particular 500, so computing the metrics for 500 or 50,000 thresholds takes the same amount of time. Even simply looping over the thresholds with rapidstats is a 58x speedup, since rapidstats applies several optimizations, such as computing the basic confusion matrix (TP, FP, FN, TN) with a bincount trick and reusing that basic matrix for every metric instead of recomputing it.
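To make the two ideas above concrete, here is a minimal NumPy sketch of a bincount-based confusion matrix and a cumulative-sum pass over all thresholds. It is an illustration only, not the rapidstats implementation or API: the function names `confusion_counts` and `confusion_counts_all_thresholds` are hypothetical, and the sketch assumes non-empty 0/1 labels, scores without NaNs, and the convention that a row is predicted positive when its score is greater than or equal to the threshold.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for 0/1 (or boolean) arrays with one bincount."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    # Encode each (true, pred) pair as an integer in {0, 1, 2, 3}:
    # 0 = TN, 1 = FP, 2 = FN, 3 = TP.
    codes = 2 * y_true + y_pred
    tn, fp, fn, tp = np.bincount(codes, minlength=4)
    return int(tn), int(fp), int(fn), int(tp)


def confusion_counts_all_thresholds(y_true, y_score):
    """Return (thresholds, TP, FP, FN, TN) at every distinct score.

    One sort plus one cumulative sum replaces a loop over thresholds.
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_score = np.asarray(y_score, dtype=np.float64)
    # Sort rows by descending score.
    order = np.argsort(-y_score, kind="stable")
    y_sorted = y_true[order]
    s_sorted = y_score[order]
    # Index of the last row in each run of tied scores.
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), y_sorted.size - 1]
    n_pos = y_sorted.sum()
    n_neg = y_sorted.size - n_pos
    # Positives seen so far are the true positives at that threshold.
    tp = np.cumsum(y_sorted)[last]
    fp = (last + 1) - tp
    fn = n_pos - tp
    tn = n_neg - fp
    return s_sorted[last], tp, fp, fn, tn
```

Once the four counts are available per threshold, metrics such as precision or recall are elementwise array expressions (for example `tp / (tp + fn)` for recall), so the basic matrix is computed once and shared across metrics.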

Similarly, calculating the bootstrapped ROC-AUC (100 iterations) of a 25,000-sample dataset takes only 0.15 seconds, compared to 0.83 seconds for the equivalent sklearn + scipy operation, a speedup of 5.3x.
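For readers unfamiliar with the operation being timed, the following is a plain NumPy sketch of a bootstrapped ROC-AUC with a percentile interval. It is not the rapidstats bootstrap class or its Rust kernel: the names `roc_auc` and `bootstrap_roc_auc`, the parameters, and the choice of a percentile interval are all assumptions made for illustration. The AUC is computed from average ranks (the Mann-Whitney U formulation), so tied scores are handled.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=np.float64)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        # AUC is undefined when only one class is present.
        return float("nan")
    # Average ranks (1-based), so tied scores share the same rank.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inverse]
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def bootstrap_roc_auc(y_true, y_score, iterations=100, confidence=0.95, seed=0):
    """Return (lower, mean, upper) of ROC-AUC over bootstrap resamples."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = y_true.size
    stats = np.empty(iterations)
    for i in range(iterations):
        # Resample row indices with replacement.
        idx = rng.integers(0, n, size=n)
        stats[i] = roc_auc(y_true[idx], y_score[idx])
    alpha = (1.0 - confidence) / 2.0
    # Ignore resamples where the AUC was undefined (single class drawn).
    lower, upper = np.nanquantile(stats, [alpha, 1.0 - alpha])
    return float(lower), float(np.nanmean(stats)), float(upper)
```

Each iteration here is an independent Python-level pass over the data, which is the kind of loop a compiled kernel can speed up.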
