Description
I would like to perform cis-pQTL analysis on 3,000 proteins using Regenie.
My genotype data has already been prefiltered with PLINK using the following criteria: MAF > 0.01, MAC > 100, HWE p-value > 1e-8, and LD pruning (1000-SNP windows, shifted 100 SNPs at a time, r² threshold of 0.8). Even after this filtering, 1,712,571 variants remain across the whole genome, and my sample size is around 20,000 individuals.
Are there ways to reduce the number of SNPs without compromising analysis quality? For instance, could I analyze only SNPs located on the same chromosome as each protein-coding gene? Or would it be reasonable to restrict the analysis to HapMap3 SNPs to reduce dimensionality? Are there other filtering strategies that could help lower the SNP count?
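To make the per-protein restriction concrete, here is a minimal Python sketch of the kind of selection I have in mind. It is only an illustration under stated assumptions: the ±1 Mb window around the gene is my assumption (a narrower alternative to using the whole chromosome), the function names `build_index` and `cis_variants` are hypothetical, and the gene and variant records are toy data rather than my real files.

```python
import bisect
from collections import defaultdict

# Assumed cis window: 1 Mb on each side of the gene (not a Regenie default).
CIS_WINDOW_BP = 1_000_000


def build_index(variants):
    """Group (variant_id, chrom, pos) records by chromosome, sorted by position."""
    by_chrom = defaultdict(list)
    for variant_id, chrom, pos in variants:
        by_chrom[chrom].append((pos, variant_id))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        positions = [pos for pos, _ in items]
        ids = [variant_id for _, variant_id in items]
        index[chrom] = (positions, ids)
    return index


def cis_variants(gene, index, window=CIS_WINDOW_BP):
    """Return IDs of variants within `window` bp of the gene (inclusive bounds)."""
    chrom, start, end = gene
    positions, ids = index.get(chrom, ([], []))
    lo = bisect.bisect_left(positions, max(0, start - window))
    hi = bisect.bisect_right(positions, end + window)
    return ids[lo:hi]


# Toy data: one hypothetical protein-coding gene and six hypothetical variants.
genes = {"PROT_A": ("1", 5_000_000, 5_020_000)}
variants = [
    ("rs1", "1", 3_999_999),  # 1 bp outside the window
    ("rs2", "1", 4_000_000),  # on the lower window boundary
    ("rs3", "1", 5_010_000),  # inside the gene
    ("rs4", "1", 6_020_000),  # on the upper window boundary
    ("rs5", "1", 6_020_001),  # 1 bp outside the window
    ("rs6", "2", 5_010_000),  # different chromosome
]

index = build_index(variants)
selected = {name: cis_variants(gene, index) for name, gene in genes.items()}
print(selected)
```

Each per-protein list could then be written to a file, one variant ID per line, and supplied through a variant inclusion option such as `--extract`, assuming the IDs match those in the genotype files.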
Any advice on optimizing the workflow would be greatly appreciated.