PyData prototype genetics method implementations  #30

@eric-czech

This issue tracks progress on method implementations with a focus on those mentioned in #16 (or things needed by these methods).

Progress:

  • axis_intervals
  • maximal_independent_set
    • Somewhat similar to hail.maximal_independent_set, but uses a chromosome-partitioned sequential algorithm for compatibility with PLINK/scikit-allel results
      • My rationale is that users would be less skeptical if results were identical to other tools, rather than rolling a more scalable but less credible heuristic for this from the start
  • ld_matrix
    • Very similar to hail.ld_matrix
    • There are CPU and GPU implementations for this now
  • ld_prune (PyData prototype LD prune implementation #26)
  • GRM/RRM
    • GRM
      • Center variants (in rows) by subtracting nanmean
      • Divide by binomial variance for variant under HWE (Patterson 2006)
      • Compute XX^T (with X as n_samples by n_variants)
    • RRM
      • Same as GRM except that empirical variance is used rather than binomial variance under HWE
      • This IS Pearson correlation up to a constant factor
    • This gist has a GRM calculation that uses the same scaling as the default PCA preprocessor in scikit-allel
  • HWE (https://github.com/pystatgen/sgkit/pull/76)
    • For axis reductions in variant/sample QC as well as PCA normalization
  • PC-Relate (PyData PC-Relate Integration #35)
  • PCA (https://github.com/pystatgen/sgkit/pull/262)
  • LMM and LFM
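
For reference, the GRM/RRM steps listed above can be sketched in a few lines of NumPy. This is only a rough illustration, not the prototype code: `relatedness_matrix` and its `scale` argument are hypothetical names, the HWE variance is assumed to be 2p(1-p) for a diploid alternate-allele count, each entry is divided by the square root of the variance so that the product ends up scaled by the variance, missing calls are set to zero after centering, and the division by the number of variants is an assumed normalization (the list above only says "up to a constant factor").

```python
import numpy as np


def relatedness_matrix(calls, scale="hwe"):
    """Sketch of a GRM (scale="hwe") or RRM (scale="empirical").

    calls: (n_variants, n_samples) array of alternate-allele counts
    in {0, 1, 2}, with NaN marking missing calls (assumed encoding).
    Returns an (n_samples, n_samples) matrix.
    """
    calls = np.asarray(calls, dtype=float)

    # Center each variant (row) by its mean over non-missing calls.
    mean = np.nanmean(calls, axis=1, keepdims=True)
    centered = calls - mean

    if scale == "hwe":
        # Binomial variance of a diploid allele count under HWE.
        p = mean / 2.0
        var = 2.0 * p * (1.0 - p)
    elif scale == "empirical":
        # Empirical per-variant variance (population form, ddof=0).
        var = np.nanvar(calls, axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown scale: {scale!r}")

    # Divide by the standard deviation; zero-variance variants stay at 0.
    std = np.sqrt(var)
    z = np.divide(centered, std, out=np.zeros_like(centered), where=std > 0)

    # Missing calls contribute nothing (equivalent to mean imputation).
    z[np.isnan(z)] = 0.0

    # X is n_samples by n_variants, so X @ X.T is n_samples by n_samples.
    x = z.T
    return x @ x.T / x.shape[1]


if __name__ == "__main__":
    example = np.array([
        [0, 1, 2, 1],
        [2, 2, 0, np.nan],
        [0, 0, 1, 2],
    ])
    print(relatedness_matrix(example, scale="hwe"))
    print(relatedness_matrix(example, scale="empirical"))
```

Keeping both variants behind one function makes the only difference between them (which variance is used for scaling) explicit. A scalable version would presumably operate on chunked arrays rather than in-memory NumPy.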
