PyData prototype simulation methods #31

@eric-czech

We should start thinking about how to simulate data as part of a public API. PLINK and Hail both support this, and it is worth addressing now because it will be an important part of improving unit testing. I was chatting with @ravwojdyla and we're both at about the same place in our testing: our test cases are fairly simple right now, and we would both benefit from synthetic data representing a single dimension of genetic structure, likely with some tunable level of complexity. Essentially we need a better version of Hypothesis and, while we're at it, why not make it part of the API?

Some examples:

  • LD estimation/pruning: A useful simulator would generate a provided number of variants with LD that is either 0, 1, or some specific value in between.
  • Kinship estimation: Simulating near-perfect recombination within a provided pedigree would make our tests more realistic, provided that the kinship coefficients fall into cleanly separable modes.
  • PCA: Balding-Nichols would make this easy, and any PCA test could input high Fst values to get easily separated populations.
  • Association testing: Something like `make_betas` and `simulate_phenotypes` from Hail's experimental ldscsim module would be useful for validating the LMM and multi-trait models we work on.
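To make the PCA case concrete, here is a minimal NumPy sketch of a Balding-Nichols simulator. The function name `simulate_balding_nichols`, its parameters, and the uniform range for ancestral allele frequencies are all assumptions for illustration, not an existing or proposed API.

```python
import numpy as np


def simulate_balding_nichols(n_samples_per_pop, n_variants, fst, n_pops=2, seed=0):
    """Hypothetical helper: diploid genotypes under the Balding-Nichols model.

    Returns (genotypes, labels), where genotypes has shape
    (n_pops * n_samples_per_pop, n_variants) with values in {0, 1, 2}.
    """
    rng = np.random.default_rng(seed)
    # Ancestral allele frequencies, kept away from 0 and 1 (assumed range).
    p_anc = rng.uniform(0.1, 0.9, size=n_variants)
    # Per-population frequencies are Beta-distributed around the ancestral
    # frequency, with variance fst * p * (1 - p).
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    p_pop = rng.beta(a, b, size=(n_pops, n_variants))
    # Each sample draws two alleles at its population's frequency.
    p_sample = np.repeat(p_pop, n_samples_per_pop, axis=0)
    genotypes = rng.binomial(2, p_sample)
    labels = np.repeat(np.arange(n_pops), n_samples_per_pop)
    return genotypes, labels


# A high Fst should give populations that separate along the first PC.
genotypes, labels = simulate_balding_nichols(50, 1000, fst=0.5)
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]
```

A PCA test could then assert that `pc1` cleanly splits the samples by `labels`, and lower `fst` to make the problem harder.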

It may be that most users don't care about simulators that aren't representative of comprehensive genetic structure (e.g. hapgen), but I think being explicit about our simulations would improve understanding of the methods. This should be something we coordinate on regardless, rather than making private versions for test cases on a per-method basis. It would also make it much easier to demonstrate what a method does without always having to appeal to real datasets.
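As a rough illustration of how small a shared simulator can be, here is a sketch for the LD case listed above. It generates one pair of variants whose genotype correlation is tunable between 0 and 1. The name `simulate_ld_pair` and the copy-with-probability construction are assumptions, not how PLINK or Hail do it.

```python
import numpy as np


def simulate_ld_pair(n_samples, r, maf=0.3, seed=0):
    """Hypothetical helper: two diploid variants with expected correlation r.

    Each allele of variant B copies the matching allele of variant A with
    probability r and is otherwise drawn independently at the same frequency,
    so the expected Pearson correlation between the two variants is r.
    """
    rng = np.random.default_rng(seed)
    # Two haplotypes per sample for variant A.
    hap_a = rng.random((n_samples, 2)) < maf
    # Decide per haplotype whether variant B copies variant A.
    copy = rng.random((n_samples, 2)) < r
    hap_b_independent = rng.random((n_samples, 2)) < maf
    hap_b = np.where(copy, hap_a, hap_b_independent)
    # Sum haplotypes to get alternate-allele counts in {0, 1, 2}.
    return hap_a.sum(axis=1), hap_b.sum(axis=1)


geno_a, geno_b = simulate_ld_pair(20_000, r=0.6)
observed_r = np.corrcoef(geno_a, geno_b)[0, 1]
```

Setting `r=0` or `r=1` gives the unlinked and perfectly linked extremes, which is the kind of input an LD pruning test could check against a known answer.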
