Some thoughts from Jeff on testing frameworks:
- https://github.com/airspeed-velocity/asv: Measures performance of a package over time
- https://github.com/HypothesisWorks/hypothesis: Property-based testing (a quick sketch follows this list)
- http://www.hammerlab.org/2015/09/30/testing-oml/: Issues with numerical precision in testing mathematical codebases
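As a concrete example of the property-based style, here is a minimal Hypothesis sketch; the 0/1/2 alt-allele-count encoding and the allele-frequency property are illustrative assumptions, not anything defined here:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


@given(
    arrays(
        dtype=np.int8,
        shape=st.tuples(st.integers(1, 20), st.integers(1, 20)),
        # Genotypes as alt-allele counts per (variant, sample): 0, 1, or 2.
        elements=st.integers(min_value=0, max_value=2),
    )
)
def test_allele_frequencies_are_in_unit_interval(genotypes):
    af = genotypes.sum(axis=1) / (2 * genotypes.shape[1])
    assert np.all((af >= 0) & (af <= 1))
```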
Some common testing issues likely to come up are:
- Dispatching: When relying on so many projects with varying levels of support for array duck typing, it will be important to test compatibility of the backends. In other words, if I expect a dask array at the end of a computation, do I really get one, or does something coerce it to numpy along the way? (A test sketch follows the Xarray example below.)
- Dask, numpy, and sparse are backends we'll likely find use for (possibly CuPy)
- We have to be especially careful with Xarray since it has no `__array_function__` implementation
- This means that code like the following coerces inputs to numpy before the multiplication, so we have to make sure to use the Xarray API (e.g. `xr.dot`) for any non-ufuncs:

```python
import dask.array as da
import numpy as np
import xarray as xr

# np.dot is not a ufunc, so the dask-backed inputs are coerced to numpy:
np.dot(xr.DataArray(da.array([[1, 2, 3]])),
       xr.DataArray(da.array([[1, 1], [2, 2], [3, 3]])))
```
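A minimal sketch of what a backend-dispatch test could look like (the `count_alleles` function here is a hypothetical stand-in, not an existing method):

```python
import dask.array as da
import xarray as xr


def count_alleles(gt: xr.DataArray) -> xr.DataArray:
    # Hypothetical stand-in for a method under test: reduces over samples
    # using only operations that Xarray dispatches to the backing arrays.
    return gt.sum(dim=gt.dims[1])


def test_count_alleles_preserves_dask_backing():
    gt = xr.DataArray(da.zeros((100, 10), dtype="int8", chunks=(50, 10)))
    result = count_alleles(gt)
    # The key assertion: nothing along the way coerced the result to numpy.
    assert isinstance(result.data, da.Array)
```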
 
- Validation: How do we validate our methods against Hail or PLINK? (A comparison sketch follows this list.)
- Informative Diffs: Since a large portion of any GWAS pipeline is dedicated to QC, a very common output from each method is going to be a smaller version of the input (with samples/variants removed). Diffing an Xarray Dataset will therefore be a frequent operation, because we'll want to know exactly how a result differs from expectation when a test fails (a diff sketch follows below).
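For the validation question, one possible shape is to export results from Hail or PLINK to a table and compare within a numerical tolerance; the file layout, column name, and variable name below are all hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr


def validate_against_reference(ds: xr.Dataset, reference_path: str) -> None:
    # Hypothetical reference table exported from Hail/PLINK with one row
    # per variant and an "af" column holding allele frequencies.
    ref = pd.read_csv(reference_path, sep="\t")
    ours = ds["allele_frequency"].values  # assumed variable name
    np.testing.assert_allclose(ours, ref["af"].values, rtol=1e-5)
```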
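And a minimal sketch of an informative diff for the QC case, assuming both Datasets carry coordinate labels along the dimension being filtered (names are illustrative):

```python
import xarray as xr


def dropped_labels(expected: xr.Dataset, actual: xr.Dataset, dim: str) -> set:
    # Which coordinate labels along `dim` (e.g. "variants" or "samples")
    # are present in the expected result but missing from the actual one?
    return set(expected[dim].values) - set(actual[dim].values)


# In a failing test this points at exactly what differs:
# missing = dropped_labels(expected_ds, result_ds, "variants")
# assert not missing, f"unexpectedly dropped variants: {sorted(missing)}"
```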