PyData prototype testing  #21

@eric-czech

Some thoughts from Jeff on testing frameworks:

Some common testing issues likely to come up are:

  • Dispatching: Because we rely on many projects with varying levels of support for array duck typing, it will be important to test backend compatibility. In other words, if I expect a dask array at the end of a computation, do I actually get one, or does something coerce it to numpy along the way?
    • Dask, numpy, and sparse are the backends we'll most likely use (possibly CuPy as well)
    • We have to be especially careful with Xarray since it has no __array_function__ implementation
      • This means the code below coerces its inputs to numpy before the multiplication, so we have to make sure to use the Xarray API for any non-ufuncs:
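          import dask.array as da
          import numpy as np
          import xarray as xr
          # np.dot is not a ufunc, so Xarray does not intercept the call
          np.dot(
            xr.DataArray(da.array([[1,2,3]])),
            xr.DataArray(da.array([[1,1], [2,2], [3,3]]))
          )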
  • Validation: How do we validate our methods against Hail or PLINK?
  • Informative diffs: Since a large portion of any GWAS pipeline is dedicated to QC, a very common output of each method will be a smaller version of the input (with samples/variants removed). Diffing Xarray Datasets will therefore be a frequent operation, because when a test fails we'll want to know exactly how the result differs from the expectation.
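
A backend check of the kind described under Dispatching could look roughly like the sketch below. `assert_array_type` is a hypothetical helper, not part of dask, numpy, or Xarray. The example assumes dask and numpy are installed and that NumPy's `__array_function__` dispatch is active (the default in recent NumPy releases).

```python
import dask.array as da
import numpy as np


def assert_array_type(result, expected_type):
    """Hypothetical test helper: fail with a readable message on a backend mismatch."""
    actual = type(result)
    assert isinstance(result, expected_type), (
        f"expected {expected_type.__module__}.{expected_type.__name__}, "
        f"got {actual.__module__}.{actual.__name__}"
    )


x = da.from_array(np.array([[1, 2, 3]]), chunks=(1, 3))
y = da.from_array(np.array([[1, 1], [2, 2], [3, 3]]), chunks=(3, 2))

# dask implements __array_function__, so np.dot dispatches to dask and stays lazy.
lazy = np.dot(x, y)
assert_array_type(lazy, da.Array)

# An explicit conversion is the kind of coercion a dispatch test should catch.
eager = np.dot(np.asarray(x), np.asarray(y))
assert_array_type(eager, np.ndarray)
```

Calling `assert_array_type(eager, da.Array)` would raise an `AssertionError` naming both types, which is the signal that something in the pipeline coerced to numpy.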

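For the informative-diff case, one possible approach is to pair `xr.testing.assert_identical` (which raises an `AssertionError` describing the mismatch) with a small helper that lists which labels were removed. This is only a sketch: `describe_dropped`, the dimension names `variants` and `samples`, the variable `call_dosage`, and the IDs are all hypothetical and chosen for illustration.

```python
import numpy as np
import xarray as xr


def describe_dropped(before, after, dims=("variants", "samples")):
    """Hypothetical helper: labels present in `before` but missing from `after`, per dim."""
    return {
        dim: np.setdiff1d(before[dim].values, after[dim].values).tolist()
        for dim in dims
    }


# Toy dataset standing in for a QC input (names and values are made up).
ds = xr.Dataset(
    {"call_dosage": (("variants", "samples"), np.arange(6).reshape(3, 2))},
    coords={"variants": ["rs1", "rs2", "rs3"], "samples": ["s1", "s2"]},
)
expected = ds.sel(variants=["rs1", "rs3"])  # what the test expects a filter to keep
actual = ds.sel(variants=["rs1", "rs2"])    # what a (pretend) filter actually kept

report = ""
try:
    xr.testing.assert_identical(actual, expected)
except AssertionError as err:
    # Xarray's message describes the differing coordinates and values.
    report = str(err)

dropped = describe_dropped(ds, actual)
```

Here `dropped` records that `rs3` was removed and no samples were, which complements Xarray's own failure message when the result is a filtered version of the input.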