MOKA Snakemake pipeline for GWAS incorporating user provided multi-omics data sources such as gene expression, neural network weights, transcription factor binding, and evolutionary conservation scores.

davidenoma/moka

🌉 Multi-omics bridged Kernel Association test (MOKA) Pipeline

MOKA implements a Snakemake pipeline that automates data-bridged, kernel-based association tests. The pipeline supports flexible GWAS analysis and visualization with different multi-omics, variant-specific weights. Publication available at: https://www.medrxiv.org/content/10.1101/2025.07.06.25330974v1

🚀 Usage

To run the MOKA pipeline:

  1. Prepare the minimal data inputs
    • GWAS genotype files in PLINK format (bed, bim & fam)
    • Variant-specific weights for each SNP (SNP_ID, CHROMOSOME, POSITION, WEIGHT)
  2. Install Snakemake
    • Snakemake Installation Guide
      conda install -n base -c conda-forge mamba
      mamba create -c conda-forge -c bioconda -n snakemake snakemake
      mamba activate snakemake
      snakemake --help
  3. Install Plink, Python & R (with Rscript configured). Important: check the Dependencies section.
  4. Download and install moka
      git clone https://github.com/davidenoma/moka
      cd moka
  5. Configure the pipeline parameters in the config.yaml file.
  6. Execute the pipeline using the commands listed under Rules:

📚 Rules

Rule: moka association_test

  • Input: Preprocessed genotype data and weight files.
  • Output: Results of association tests.
snakemake --cores <num_cores>

If you do not have all the Python and R dependencies installed, Snakemake can set them up through conda:

snakemake --cores <num_cores> --use-conda

However, some R packages are not available through conda and are best installed with R's package manager.

Rule: merge_results

  • Input: Individual association test results.
  • Output: Merged association test results.
snakemake --cores 1 merge_moka_results

Rule: annotate_results

  • Input: Merged association test results.
  • Output: Annotated association test results with DisGeNet database
snakemake --cores 1 disgenet_annotation_005

Rule: visualize_results

  • Input: Merged association test results.
  • Output: Manhattan plots with visual representations of association test results.
snakemake --cores 1 manhattan_plots

Rule: go_analysis

  • Input: Merged association test results.
  • Output: GO analysis results.
snakemake --cores 1 go_analysis

Rule: kegg_pathway_analysis

  • Input: Merged association test results.
  • Output: KEGG pathway analysis results.
snakemake --cores 1 kegg_pathway_analysis
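
The targets above depend on one another: the default target produces the individual association test results, merge_moka_results combines them, and the annotation, plotting, GO, and KEGG targets all read the merged results. The following is a minimal Python sketch of that ordering. It only assembles the command lines already shown in this README; the helper name `build_commands` is hypothetical and not part of MOKA.

```python
import shlex

# Snakemake targets as listed in this README, in dependency order:
# the merge step needs the association test output, and the remaining
# targets read the merged results.
DOWNSTREAM_TARGETS = [
    "merge_moka_results",
    "disgenet_annotation_005",
    "manhattan_plots",
    "go_analysis",
    "kegg_pathway_analysis",
]


def build_commands(num_cores: int, use_conda: bool = False) -> list[list[str]]:
    """Return the snakemake invocations in the order they should run."""
    first = ["snakemake", "--cores", str(num_cores)]
    if use_conda:
        first.append("--use-conda")
    commands = [first]
    for target in DOWNSTREAM_TARGETS:
        commands.append(["snakemake", "--cores", "1", target])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_commands(num_cores=4):
        print(shlex.join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` if you prefer to execute the steps from Python rather than copy them into a shell.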

Rule: SKAT test with linear kernel

  • Input: Genotype data.
  • Output: Association mapping results, written to the output_association/ folder.
snakemake --cores 22 skat

Dependencies

Software

Plink, Python, and R (Rscript) must be available on your PATH.
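
Whether the required tools are reachable on the PATH can be checked before starting a run. The sketch below uses Python's standard `shutil.which`. The executable names in `REQUIRED_TOOLS` are assumptions (your Plink binary may be named differently), and `find_missing` is a hypothetical helper, not part of MOKA.

```python
import shutil
from typing import Callable, Iterable, Optional

# Executable names are assumptions: "plink" may be installed as "plink1.9"
# or similar on your system, so adjust the list as needed.
REQUIRED_TOOLS = ["plink", "python3", "Rscript", "parallel"]


def find_missing(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be resolved on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.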

Python Packages

  • FaST-LMM: Factored Spectrally Transformed Linear Mixed Models, a program for performing genome-wide association studies (GWAS) on datasets of all sizes.
  • PySnpTools: a library for reading and manipulating genetic data.
pip install pysnptools fastlmm

R Packages

  • manhattan: R package for creating manhattan plots, commonly used in genome-wide association studies (GWAS).
  • SKAT: R package for the Sequence Kernel Association Test (SKAT), a powerful gene-based association test.
  • qqman: R package for creating QQ (quantile-quantile) plots, commonly used in GWAS to assess whether observed p-values deviate from the expected distribution under the null hypothesis.
  • ggplot2: R package for creating highly customizable plots and graphics.
  • gprofiler2: R package for gene set enrichment analysis (GO analysis).
  • pathfindR: R package for pathway analysis, including KEGG pathway analysis.

Installation steps:

install.packages(c("BiocManager","SKAT","ggplot2"))
BiocManager::install(c( "gprofiler2", "pathfindR","manhattan","qqman"))

Other Software

apt install parallel #linux or WSL windows
brew install parallel #macos

Input file format

  • Data files: genotype BED, BIM & FAM files in PLINK format (https://www.cog-genomics.org/plink/1.9/) [required]
  • Multi-omics bridge weights.csv file (SNP_ID,Chromosome,Position,Weight) [required for moka]
  • Gene regions file provided in GRCh38/hg38 coordinates (Genome Reference Consortium Human Build 38)
  • DisGeNET gene-disease database reference file (only if external disease validation is needed)
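
A quick sanity check of the weights file can catch formatting problems before the pipeline runs. The sketch below is not part of MOKA: it assumes a comma-separated file with the four-column header listed above, an integer position, and a numeric weight. The name `validate_weights` and the sample rows are made up for illustration.

```python
import csv
import io

# Column names follow the header listed above; the exact capitalisation in
# a real weights file is an assumption, so the check is case-insensitive.
EXPECTED_COLUMNS = ["snp_id", "chromosome", "position", "weight"]


def validate_weights(handle) -> list[str]:
    """Return a list of problems found in a weights CSV; empty means OK."""
    reader = csv.DictReader(handle)
    header = [name.strip().lower() for name in (reader.fieldnames or [])]
    if header != EXPECTED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for row in reader:
        line_no = reader.line_num
        if None in row:  # DictReader stores surplus fields under the key None
            problems.append(f"line {line_no}: too many fields")
            continue
        values = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        if not values["snp_id"]:
            problems.append(f"line {line_no}: empty SNP_ID")
        if not values["position"].isdigit():
            problems.append(f"line {line_no}: position is not a non-negative integer")
        try:
            float(values["weight"])
        except ValueError:
            problems.append(f"line {line_no}: weight is not numeric")
    return problems


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = io.StringIO("SNP_ID,Chromosome,Position,Weight\nrs1,1,12345,0.8\n")
    print(validate_weights(sample) or "weights file looks well-formed")
```

For a real file, pass an open handle instead, for example `open("weights.csv", newline="")`.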

Liftover protocol

You must lift your data over to GRCh38 coordinates. See Liftover GWAS: https://github.com/davidenoma/LiftOver

📋 Configuration

  • genotype_prefix: Prefix for genotype data files.
  • weights_type: Text string for type of bridge weights to be used e.g. "eqtl", "imaging"
  • genotype_file_path: Path to genotype data files.
  • weight_file: Path to weight files used for association tests.
  • disgenet_reference_file: External disease database specific gene-disease associations from https://disgenet.org [For gene disease associations only!]
  • spectral decomposition: Flag for performing spectral decomposition and transformation of the genotype and phenotype, default: TRUE
  • is_binary: Flag for a binary (TRUE) or quantitative (FALSE) trait, default: TRUE
  • Plink: Path to plink installation e.g. "~/software/plink"
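
The configuration values can be checked against the files on disk before launching Snakemake. The sketch below works on a plain dictionary, such as one obtained by parsing config.yaml. It rests on several assumptions that the README does not confirm: that the four keys in `CORE_KEYS` are always required, that genotype_file_path is a directory, and that genotype_prefix is the shared stem of the .bed/.bim/.fam files. `check_config` is a hypothetical helper, not part of MOKA.

```python
from pathlib import Path

# Keys taken from the Configuration list above. Treating these four as
# required is an assumption; the DisGeNET file, for example, is only
# needed for gene-disease annotation.
CORE_KEYS = ["genotype_prefix", "genotype_file_path", "weights_type", "weight_file"]


def check_config(config: dict) -> list[str]:
    """Return a list of problems with the config; empty means nothing found."""
    problems = [f"missing key: {key}" for key in CORE_KEYS if key not in config]
    if problems:
        return problems
    # Assumption: genotype_file_path is a directory and genotype_prefix is
    # the shared stem of the PLINK .bed/.bim/.fam files inside it.
    base = Path(config["genotype_file_path"]).expanduser()
    for ext in (".bed", ".bim", ".fam"):
        candidate = base / f"{config['genotype_prefix']}{ext}"
        if not candidate.is_file():
            problems.append(f"missing file: {candidate}")
    if not Path(config["weight_file"]).expanduser().is_file():
        problems.append(f"missing file: {config['weight_file']}")
    return problems


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    example = {
        "genotype_prefix": "cohort",
        "genotype_file_path": "data/genotypes",
        "weights_type": "eqtl",
        "weight_file": "data/weights.csv",
    }
    for problem in check_config(example):
        print(problem)
```

If your config.yaml lays out paths differently, adjust how `base` and the candidate file names are built.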

📖 Additional Information

For more information on the MOKA pipeline and its usage, refer to the documentation provided in the repository or contact the project maintainers at david.enoma@ucalgary.ca.

Publication reference

MOKA: A pipeline for multi-omics bridged SNP-set kernel association test https://www.medrxiv.org/content/10.1101/2025.07.06.25330974v1
