This issue tracks progress on method implementations, focusing on the methods mentioned in #16 and on anything those methods depend on.
Progress:
- axis_intervals
  - Identifies the variants that fall within base-pair (bp) windows
  - Very similar to hail.locus_windows
- maximal_independent_set
  - Somewhat similar to hail.maximal_independent_set, but uses a sequential algorithm partitioned by chromosome for compatibility with PLINK/scikit-allel results
    - My rationale: users would be less skeptical if results were identical to those of other tools than if we started with a more scalable but less credible heuristic
- ld_matrix
  - Very similar to hail.ld_matrix
  - There are CPU and GPU implementations for this now
- ld_prune (PyData prototype LD prune implementation #26)
- GRM/RRM
  - GRM
    - Center each variant (stored in rows) by subtracting its nanmean
    - Divide by sqrt(p(1 - p)), the binomial standard deviation for the variant's allele frequency p under HWE (Patterson 2006)
    - Compute XX^T (for X as n_samples by n_variants)
  - RRM
    - Same as GRM except that the empirical standard deviation is used rather than the one expected under HWE
    - This IS Pearson correlation up to a constant factor
  - This gist has a GRM calculation, which uses the same scaling as the default PCA preprocessor in scikit-allel
- HWE (https://github.com/pystatgen/sgkit/pull/76)
  - For axis reductions in variant/sample QC as well as PCA normalization
- PC-Relate (PyData PC-Relate Integration #35)
- PCA (https://github.com/pystatgen/sgkit/pull/262)
- LMM and LFM
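For the axis_intervals item in the progress list above, here is a minimal NumPy sketch of bp windowing. It assumes variants are sorted by (contig, position) and that contigs are encoded as integers; the function name `axis_intervals_sketch` and the `radius` parameter are hypothetical, not sgkit's API. Like hail.locus_windows, it returns per-variant start and stop indices.

```python
import numpy as np


def axis_intervals_sketch(contigs, positions, radius):
    """Hypothetical helper: for each variant, return [start, stop) indices of
    the variants on the same contig within `radius` bp of it.

    Assumes variants are sorted by (contig, position) and contigs are ints.
    """
    contigs = np.asarray(contigs)
    positions = np.asarray(positions)
    n = len(positions)
    starts = np.empty(n, dtype=np.int64)
    stops = np.empty(n, dtype=np.int64)
    # Indices where the contig changes delimit the per-contig partitions
    boundaries = np.flatnonzero(np.diff(contigs)) + 1
    edges = np.concatenate([[0], boundaries, [n]])
    for lo, hi in zip(edges[:-1], edges[1:]):
        pos = positions[lo:hi]
        # Binary search within the contig for the window bounds of each variant
        starts[lo:hi] = lo + np.searchsorted(pos, pos - radius, side="left")
        stops[lo:hi] = lo + np.searchsorted(pos, pos + radius, side="right")
    return starts, stops
```

Windows never cross contig boundaries because each contig is searched independently.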
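For the maximal_independent_set item, a sketch of one sequential greedy algorithm follows. It assumes the graph is given as an edge list (i, j) with i < j, for example pairs of variants whose LD exceeds a threshold. Vertices are visited in index order, and any vertex adjacent to an earlier kept vertex is dropped, which resembles how sequential LD pruning keeps the first variant of a correlated pair. Whether this reproduces PLINK/scikit-allel results exactly is not verified here, and `maximal_independent_set_sketch` is a hypothetical name.

```python
import numpy as np


def maximal_independent_set_sketch(i, j, n):
    """Hypothetical greedy sequential MIS over n vertices.

    i, j: edge endpoints with i < j. Returns indices of kept vertices.
    """
    i = np.asarray(i, dtype=np.int64)
    j = np.asarray(j, dtype=np.int64)
    if np.any(i >= j):
        raise ValueError("expected edges with i < j")
    # Sort edges by i (primary) then j, so every edge (c, a) with c < a is
    # processed before any edge leaving a; keep[a] is then final when used.
    order = np.lexsort((j, i))
    keep = np.ones(n, dtype=bool)
    for a, b in zip(i[order], j[order]):
        if keep[a]:
            keep[b] = False  # b is adjacent to an earlier kept vertex
    return np.flatnonzero(keep)
```

Partitioning by chromosome comes for free when edges only connect variants on the same contig, since each contig's vertices are then resolved independently.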
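For the ld_matrix item, a small windowed LD sketch follows. Like hail.ld_matrix, it reports Pearson correlation r between variants (square it for r²), but it returns sparse (i, j, r) triples rather than a block matrix. It assumes a dense (n_variants, n_samples) dosage array with no missing calls and no monomorphic variants, and exclusive `stops` indices such as those produced by a bp-windowing step; `ld_matrix_sketch` is a hypothetical name and ignores the CPU/GPU implementations mentioned above.

```python
import numpy as np


def ld_matrix_sketch(dosages, stops):
    """Hypothetical helper: Pearson r between variant i and each later
    variant j in [i + 1, stops[i]). Returns a list of (i, j, r) triples.

    Assumes no missing values and non-zero variance for every variant.
    """
    x = np.asarray(dosages, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)  # center each variant (row)
    x = x / np.sqrt((x * x).sum(axis=1, keepdims=True))  # unit-norm rows
    triples = []
    for i in range(x.shape[0]):
        for j in range(i + 1, stops[i]):
            # Dot product of centered, unit-norm rows is Pearson r
            triples.append((i, j, float(x[i] @ x[j])))
    return triples
```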
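The GRM/RRM steps listed above can be sketched in NumPy as follows, assuming a (n_variants, n_samples) array of alternate allele counts with NaN for missing calls. For the GRM, each variant is scaled by sqrt(p(1 - p)) with p = mean / ploidy (Patterson 2006); for the RRM, by the variant's empirical standard deviation. Missing and monomorphic entries are set to 0 after scaling, which is an assumption of this sketch. Many implementations also divide the result by the number of variants; that is omitted here to mirror the listed steps. `grm_sketch` and its `method` parameter are hypothetical.

```python
import numpy as np


def grm_sketch(dosages, method="grm", ploidy=2):
    """Hypothetical GRM/RRM helper returning an (n_samples, n_samples) matrix.

    dosages: (n_variants, n_samples) alternate allele counts, NaN = missing.
    method="grm": scale by sqrt(p(1 - p)) expected under HWE.
    method="rrm": scale by the empirical standard deviation of each variant.
    """
    g = np.asarray(dosages, dtype=float)
    mean = np.nanmean(g, axis=1, keepdims=True)
    centered = g - mean  # center each variant (row)
    if method == "grm":
        p = mean / ploidy
        scale = np.sqrt(p * (1 - p))
    elif method == "rrm":
        scale = np.nanstd(g, axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown method: {method}")
    scale = np.where(scale > 0, scale, np.nan)  # monomorphic variants -> NaN
    # Transpose to X as n_samples by n_variants; NaN entries contribute 0
    x = np.nan_to_num(centered / scale).T
    return x @ x.T
```

Switching between GRM and RRM only changes the per-variant scale, which is why both fit in one function here.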
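For the HWE item, the linked PR's exact method is not described in this issue. As a hedged illustration only, here is a chi-square goodness-of-fit test for HWE from per-variant genotype counts, assuming biallelic, diploid, polymorphic variants. For 1 degree of freedom the chi-square survival function equals erfc(sqrt(x / 2)), which avoids a SciPy dependency. Many tools prefer an exact test (Wigginton et al. 2005), especially for small counts; `hwe_chisq_sketch` is a hypothetical name.

```python
import math

import numpy as np


def hwe_chisq_sketch(n_hom_ref, n_het, n_hom_alt):
    """Hypothetical chi-square HWE test per variant. Returns (stat, pvalue).

    Assumes biallelic diploid variants with both alleles observed.
    """
    n_aa = np.asarray(n_hom_ref, dtype=float)
    n_ab = np.asarray(n_het, dtype=float)
    n_bb = np.asarray(n_hom_alt, dtype=float)
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # reference allele frequency
    q = 1 - p
    observed = np.stack([n_aa, n_ab, n_bb])
    expected = np.stack([n * p**2, 2 * n * p * q, n * q**2])
    stat = ((observed - expected) ** 2 / expected).sum(axis=0)
    # Chi-square survival function with 1 degree of freedom
    pvalue = np.array([math.erfc(math.sqrt(s / 2)) for s in stat])
    return stat, pvalue
```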