This pipeline is a reproducible workflow for calculating and validating polygenic risk scores (PRS) for Type 2 Diabetes (T2D) across the UK Biobank, All of Us, and Lyday datasets. It uses Python, R, and Nextflow to support PRS analysis with ancestry adjustment and statistical validation.
| Folder | Description |
|---|---|
| `Data/` | Describes each cohort: Lyday, All of Us, and UK Biobank. Includes info on preprocessing. |
| `Tools/` | Contains usage notes for core tools like `pgsc_calc`, `PRSice-2`, `PRScs`, `Beagle`, and more. |
| `Ancestry Adjustment/` | Focuses on ancestry-normalized PRS and includes the logic replicated from the `pgsc_calc` framework. |
| `Validation/` | Scripts and results for statistical validation using R. |
| `PRS_Pipeline/` | (Coming soon) Python-based CLI pipeline to modularly run each step with checkpointing. |

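
Since `PRS_Pipeline/` is not yet published, the following is only a minimal sketch of what "run each step with checkpointing" could look like. The function name `run_with_checkpoints`, the step names, and the JSON checkpoint file are all hypothetical and are not taken from this repository.

```python
import json
from pathlib import Path


def run_with_checkpoints(steps, checkpoint_file):
    """Run (name, func) steps in order, skipping names already recorded as done.

    Hypothetical helper: the real pipeline's checkpoint format is not documented.
    """
    path = Path(checkpoint_file)
    done = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        func()
        done.append(name)
        path.write_text(json.dumps(done))  # persist progress after every step
        executed.append(name)
    return executed
```

Calling it twice with the same checkpoint file runs every step the first time and nothing the second time, which is the behavior a resumable pipeline needs after an interrupted run.
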
- UK Biobank: Used for testing and benchmarking PRS tools (data already imputed and standardized)
- All of Us: PRS calculation plus ancestry adjustment using `pgsc_calc`
- Lyday: Raw VCFs requiring genotype harmonization, imputation, and conversion prior to PRS calculation
Each dataset has its own documentation inside the `Data/` directory.
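
To make the idea of ancestry adjustment concrete, here is a simplified sketch of one common approach: regress the raw PRS on genetic principal components (PCs) in a reference panel, then express each target score as a standardized residual from that fit. This is an illustration under stated assumptions, not the exact logic in `pgsc_calc` or in `Ancestry Adjustment/`, which may differ (for example, in how variance is modeled). All function names and the toy numbers are made up.

```python
from statistics import pstdev


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_pc_model(prs, pcs):
    """Least-squares fit of prs ~ intercept + PCs via the normal equations."""
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, prs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def adjust_prs(prs, pcs, coefs, resid_sd=1.0):
    """Subtract the PC-predicted PRS, then divide by the given residual SD."""
    adjusted = []
    for y, row in zip(prs, pcs):
        predicted = coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))
        adjusted.append((y - predicted) / resid_sd)
    return adjusted


# Toy reference panel (made-up values): five samples, two PCs each.
ref_pcs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ref_prs = [1.1, 2.9, -2.0, 0.2, 1.9]
coefs = fit_pc_model(ref_prs, ref_pcs)
ref_sd = pstdev(adjust_prs(ref_prs, ref_pcs, coefs))  # SD of reference residuals
target_z = adjust_prs([2.5], [[1, 0]], coefs, ref_sd)  # adjusted target score
```

Fitting on the reference panel and applying the coefficients to the target cohort keeps the adjustment independent of the target sample's own ancestry composition.
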
- `pgsc_calc`: Nextflow-based official PRS scoring pipeline
- `PRSice-2`: Clumping + thresholding tool
- `PRScs`: Bayesian regression method for PRS
- `Beagle 5.5`: Imputation
- `conform-gt`: Genotype harmonization
- Python for pipeline scripting
- R for statistical validation (e.g., AUC, Odds Ratio, Cohen’s d)
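
The validation metrics named above can be written down compactly. The project's validation scripts are in R; the sketch below uses Python only to keep the examples in this README in one language, and it is not a port of those scripts. The function names and the choice of a single score threshold for the odds ratio are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control (ties count half).

    Compares every case/control pair, which is fine for a sketch but slow for large cohorts.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def cohens_d(case_scores, control_scores):
    """Standardized mean difference using the pooled sample SD."""
    n1, n2 = len(case_scores), len(control_scores)
    s1, s2 = stdev(case_scores), stdev(control_scores)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(case_scores) - mean(control_scores)) / pooled


def odds_ratio(case_scores, control_scores, threshold):
    """Odds of being a case above the threshold relative to at or below it.

    Raises ZeroDivisionError if no controls are above or no cases are at/below the threshold.
    """
    a = sum(s > threshold for s in case_scores)     # cases above threshold
    b = sum(s > threshold for s in control_scores)  # controls above threshold
    c = len(case_scores) - a                        # cases at or below
    d = len(control_scores) - b                     # controls at or below
    return (a * d) / (b * c)
```

In practice the threshold is often a high percentile of the score distribution (for example, the top decile versus the rest), and confidence intervals would accompany each estimate.
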
This project is under development. To reproduce or test:
- Navigate to the dataset of interest under `Data/`
- Set up the tools required under `Tools/`
- Run the PRS pipeline using `pgsc_calc` or the custom Python workflow
- Validate the scores using the scripts in `Validation/`
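
For the `pgsc_calc` run step, one option is to assemble the Nextflow command from Python so it can be logged or launched by a wrapper script. This is a hedged sketch: the flags shown (`-profile`, `--input`, `--target_build`, `--pgs_id`, `--run_ancestry`) reflect the `pgsc_calc` documentation as I understand it and should be checked against the version you install. The helper name, file names, and the `PGS000000` ID are placeholders, not values used by this project.

```python
import shlex


def build_pgsc_calc_command(samplesheet, pgs_id, target_build="GRCh38",
                            profile="docker", ancestry_reference=None):
    """Return the nextflow argument list for a pgsc_calc run (hypothetical helper)."""
    cmd = [
        "nextflow", "run", "pgscatalog/pgsc_calc",
        "-profile", profile,
        "--input", samplesheet,
        "--target_build", target_build,
        "--pgs_id", pgs_id,
    ]
    if ancestry_reference is not None:
        # Assumed flag for enabling ancestry adjustment with a reference panel.
        cmd += ["--run_ancestry", ancestry_reference]
    return cmd


# Placeholder inputs; substitute a real samplesheet and PGS Catalog score ID.
print(shlex.join(build_pgsc_calc_command("samplesheet.csv", "PGS000000")))
```

Returning a list rather than a single string means the command can be passed directly to `subprocess.run` without shell quoting issues.
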
- Lambert et al. (2021), The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation.
- All of Us Research Program, NIH
Valli Sree Lasya Pasumarthy (vpasumarthy3@gatech.edu)
Graduate Research – Jordan Lab
Master of Science in Bioinformatics
School of Biological Sciences, Georgia Institute of Technology