MOKA is a Snakemake pipeline that automates data-bridged, kernel-based association tests. It supports flexible GWAS analysis and visualization with different multi-omics, variant-specific weights. Publication available at: https://www.medrxiv.org/content/10.1101/2025.07.06.25330974v1
To run the MOKA pipeline:
1. Minimal data inputs
- GWAS genotype files in PLINK format (bed, bim, and fam)
- Variant-specific weights for each SNP (SNP_ID, CHROMOSOME, POSITION, WEIGHT)
- Install Snakemake
- Snakemake Installation Guide
conda install -n base -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
mamba activate snakemake
snakemake --help
- Install PLINK, Python, and R (with Rscript configured). Important: check the Dependencies section.
- Download and install moka
git clone https://github.com/davidenoma/moka
cd moka
- Configure the pipeline parameters in the config.yaml file.
- Execute the pipeline:
- Input: Preprocessed genotype data and weight files.
- Output: Results of association tests.
snakemake --cores <num_cores>
If you do not have all the Python and R dependencies installed, Snakemake can set them up through conda:
snakemake --cores <num_cores> --use-conda
However, some R packages are not available through conda and are best installed with the R package manager.
- Input: Individual association test results.
- Output: Merged association test results.
snakemake --cores 1 merge_moka_results
- Input: Merged association test results.
- Output: Association test results annotated with the DisGeNET database.
snakemake --cores 1 disgenet_annotation_005
- Input: Merged association test results.
- Output: Manhattan plots with visual representations of association test results.
snakemake --cores 1 manhattan_plots
- Input: Merged association test results.
- Output: GO analysis results.
snakemake --cores 1 go_analysis
- Input: Merged association test results.
- Output: KEGG pathway analysis results.
snakemake --cores 1 kegg_pathway_analysis
- Input: Genotype data.
- Output: Association mapping results, written to the output_association/ folder.
snakemake --cores 22 skat
The following dependencies must be available on your PATH:
- Snakemake (8.0.1+)
- R (4.2.0+)
- Python (3.9+)
- PLINK (1.9+): [https://www.cog-genomics.org/plink/1.9/]
- Rscript
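Before launching the pipeline, it can help to confirm that these tools resolve on your PATH. The following is a minimal sketch, not part of MOKA itself: `missing_tools` is a hypothetical helper, and the executable names in the example call (`snakemake`, `R`, `Rscript`, `python3`, `plink`) are assumptions that may differ on your system.

```shell
# Hypothetical helper (not shipped with MOKA):
# print every tool from the argument list that cannot be found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call; the executable names are assumptions, adjust as needed:
# missing_tools snakemake R Rscript python3 plink
```

An empty result means every listed tool was found. Version requirements (for example Snakemake 8.0.1+) still need to be checked separately.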
- FaST-LMM: Factored Spectrally Transformed Linear Mixed Models, a program for performing genome-wide association studies (GWAS) on datasets of all sizes.
- PySnpTools: a library for reading and manipulating genetic data.
pip install pysnptools fastlmm
- manhattan: R package for creating manhattan plots, commonly used in genome-wide association studies (GWAS).
- SKAT: R package for SKAT (Sequence Kernel Association Test) which is a powerful gene-based association test.
- qqman: R package for creating QQ (quantile-quantile) plots, commonly used in GWAS to assess whether observed p-values deviate from the distribution expected under the null hypothesis.
- ggplot2: R package for creating highly customizable plots and graphics.
- gprofiler2: R package for gene set enrichment analysis (GO analysis).
- pathfindR: R package for pathway analysis, including KEGG pathway analysis.
Installation steps:
install.packages(c("BiocManager","SKAT","ggplot2"))
BiocManager::install(c( "gprofiler2", "pathfindR","manhattan","qqman"))
- GNU Parallel: https://www.gnu.org/software/parallel/
apt install parallel  # Linux or WSL on Windows
brew install parallel  # macOS
- Data files: genotype BED, BIM, and FAM files in PLINK format (https://www.cog-genomics.org/plink/1.9/) [required]
- Multi-omics bridge weights.csv file (SNP_ID, Chromosome, Position, Weight) [required for MOKA]
- Gene regions file provided in GRCh38/hg38 coordinates (Genome Reference Consortium Human Build 38)
- DisGeNET gene-disease database reference file (if external disease validation is needed)
You must lift your data over to GRCh38; see LiftOver GWAS: [https://github.com/davidenoma/LiftOver]
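The two required inputs can be sanity-checked before a run. This is a minimal sketch under stated assumptions, not part of MOKA: both function names are hypothetical, and it assumes the weights file is comma-separated with a header row naming the four columns listed above (compared case-insensitively).

```shell
# Hypothetical helpers (not shipped with MOKA).

# Succeed only if PREFIX.bed, PREFIX.bim and PREFIX.fam all exist.
check_plink_fileset() {
    local prefix ext
    prefix=$1
    for ext in bed bim fam; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing: ${prefix}.${ext}" >&2
            return 1
        fi
    done
}

# Succeed only if the first line of the weights file names the four
# expected columns. Assumption: comma-separated file with a header row.
check_weights_header() {
    local header
    # Drop carriage returns and spaces, then upper-case for comparison
    header=$(head -n 1 "$1" | tr -d '\r ' | tr '[:lower:]' '[:upper:]')
    [ "$header" = "SNP_ID,CHROMOSOME,POSITION,WEIGHT" ]
}

# Example calls (paths are placeholders):
# check_plink_fileset data/my_study && check_weights_header weights.csv
```

These checks only cover file presence and column names; they do not verify the genome build or that the SNP IDs in the weights file match those in the BIM file.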
- genotype_prefix: Prefix for genotype data files.
- weights_type: Text string for type of bridge weights to be used e.g. "eqtl", "imaging"
- genotype_file_path: Path to genotype data files.
- weight_file: Path to weight files used for association tests.
- disgenet_reference_file: Path to the reference file of gene-disease associations from the external DisGeNET database (https://disgenet.org) [for gene-disease annotation only]
- spectral decomposition: Flag for performing spectral decomposition and transformation of genotype and phenotype, default: TRUE
- is_binary: Flag indicating a binary (TRUE) or quantitative (FALSE) trait, default: TRUE
- Plink: Path to the PLINK installation, e.g. "~/software/plink"
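A quick way to catch a misspelled or missing parameter is to check config.yaml for the keys you expect before running Snakemake. The sketch below is not part of MOKA: `missing_config_keys` is a hypothetical helper, and it assumes the parameters are flat top-level `key: value` entries spelled as in the list above. The exact key spelling in the repository's config.yaml may differ, so verify it there first.

```shell
# Hypothetical helper (not shipped with MOKA):
# print each requested key that has no "key:" entry at the start of a line.
# Usage: missing_config_keys CONFIG_FILE KEY [KEY...]
missing_config_keys() {
    local config key
    config=$1
    shift
    for key in "$@"; do
        grep -Eq "^${key}[[:space:]]*:" "$config" || printf '%s\n' "$key"
    done
}

# Example call; key names are taken from the parameter list above:
# missing_config_keys config.yaml genotype_prefix weights_type \
#     genotype_file_path weight_file is_binary
```

An empty result means every listed key was found. The check is purely textual and does not validate the values or parse the YAML.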
For more information on the MOKA pipeline and its usage, refer to the documentation provided in the repository or contact the project maintainers: david.enoma@ucalgary.ca
MOKA: A pipeline for multi-omics bridged SNP-set kernel association test https://www.medrxiv.org/content/10.1101/2025.07.06.25330974v1