PyData prototype genetics method implementations #30

This issue tracks progress on method implementations with a focus on those mentioned in https://github.com/related-sciences/gwas-analysis/issues/16 (or things needed by these methods).

Progress:

- [X] axis_intervals 
  - Identifies variants within base-pair (BP) windows
  - Very similar to [hail.locus_windows](https://hail.is/docs/0.2/linalg/utils/index.html#hail.linalg.utils.locus_windows)
- [X] maximal_independent_set
  - Somewhat similar to [hail.maximal_independent_set](https://hail.is/docs/0.2/methods/misc.html?highlight=maximal_independent_set#hail.methods.maximal_independent_set) but uses a chromosome-partitioned sequential algorithm for compatibility with PLINK/scikit-allel results
    - My rationale is that users will be less skeptical if results are identical to those of other tools, so this seemed better than rolling a more scalable but less credible heuristic from the start
- [X] ld_matrix
  - Very similar to [hail.ld_matrix](https://hail.is/docs/0.2/methods/genetics.html?highlight=ld_matrix#hail.methods.ld_matrix)
  - There are CPU and GPU implementations for this now
- [X] ld_prune (https://github.com/related-sciences/gwas-analysis/issues/26)
- [ ] GRM/RRM
  - GRM
    - Center variants (in rows) by subtracting the nanmean of each variant
    - Divide by the binomial standard deviation for the variant under HWE, sqrt(p(1 - p)) (Patterson 2006), so that XX^T ends up scaled by the binomial variance
    - Compute XX^T (with X as n_samples by n_variants)
  - RRM
    - Same as GRM, except that the empirical variance of each variant is used rather than the binomial variance under HWE
    - This is the Pearson correlation up to a constant factor
  - This [gist](https://gist.github.com/eric-czech/8b6e0331f7e512f89cf009839e9f84ca) has a GRM calculation that uses the same scaling as the default PCA preprocessor in scikit-allel
- [x] HWE (https://github.com/pystatgen/sgkit/pull/76)
  - For axis reductions in variant/sample QC as well as PCA normalization
- [x] PC-Relate (https://github.com/related-sciences/gwas-analysis/issues/35)
- [x] PCA (https://github.com/pystatgen/sgkit/pull/262)
- [ ] LMM and LFM
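
The GRM/RRM steps listed above can be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions, not the code in the gist or in any library: the function name `relatedness_matrix` and its `scaling` parameter are hypothetical, genotypes are assumed to be diploid alternate allele counts (0/1/2) with NaN for missing calls, missing calls are set to zero after centering, variants with zero scale are dropped, and no normalization by the number of variants is applied.

```python
import numpy as np


def relatedness_matrix(calls, scaling="grm"):
    """Sketch of GRM/RRM from a (n_variants, n_samples) allele-count array.

    Hypothetical helper: `calls` holds diploid alternate allele counts
    (0, 1, 2) with NaN for missing calls (an assumption of this sketch).
    """
    calls = np.asarray(calls, dtype=float)

    # Center each variant (row) by its mean over non-missing samples.
    mean = np.nanmean(calls, axis=1, keepdims=True)

    if scaling == "grm":
        # Binomial standard deviation under HWE, sqrt(p(1 - p)), with
        # p the alternate allele frequency (diploid assumption).
        p = mean / 2.0
        scale = np.sqrt(p * (1.0 - p))
    elif scaling == "rrm":
        # Empirical standard deviation of each variant instead.
        scale = np.nanstd(calls, axis=1, keepdims=True)
    else:
        raise ValueError("scaling must be 'grm' or 'rrm'")

    # Drop variants that cannot be scaled (zero or undefined scale).
    keep = (scale > 0).ravel()
    z = (calls[keep] - mean[keep]) / scale[keep]

    # Assumption: treat missing calls as zero after centering,
    # which is equivalent to mean imputation.
    z = np.nan_to_num(z, nan=0.0)

    # X is n_samples by n_variants; return XX^T without further normalization.
    x = z.T
    return x @ x.T
```

For example, `relatedness_matrix(calls, scaling="rrm")` on the same input swaps the HWE-based scale for the empirical one; dividing the result by the number of retained variants is a common normalization but is left out here because the list above does not specify it.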
