Feature Request: Add Support for Parallel Processing #365

@ranggakd

I recently began using the hyppo library for multivariate hypothesis testing, and I appreciate its comprehensiveness and ease of use.

As datasets continue to grow in size and complexity, I believe the library would benefit greatly from parallel processing support. This could significantly reduce the time it takes to run tests on large, high-dimensional datasets, making the library more efficient and user-friendly.

Here are a few things that could be done:

  1. Parallel computation of test statistics: Use multiprocessing or joblib to compute test statistics in parallel, which could significantly speed up computations on large datasets.

  2. Distributed computing support: For extremely large datasets, it could be beneficial to support distributed computing frameworks such as Dask or Apache Spark. This would let users compute test statistics on a cluster, which could be particularly useful for big data applications.

  3. Asynchronous computation: For certain applications, it might be useful to support asynchronous computation. This would allow users to start a test, do other work while the test is running, and then come back to get the results once the test is done.

I understand that this is a big ask, but I believe these features would greatly enhance the usefulness and performance of hyppo. I'm also willing to contribute to the development of these features if that's something you'd be interested in.

Thank you for considering this feature request.

Labels

enhancement: New feature or request