Some issues during the step1 process #644

@JhGoodLuck

I would like to perform cis-pQTL analysis on 3,000 proteins using Regenie.
My genotype data have already been prefiltered with PLINK using the following criteria: MAF > 0.01, MAC > 100, HWE p-value > 1e-8, and LD pruning (window of 1,000 SNPs, step of 100 SNPs, r² threshold of 0.8). Even after this filtering, 1,712,571 variants remain across the whole genome, and my sample size is around 20,000 individuals.
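
For reference, the exact PLINK command is not given here; the sketch below shows one way the criteria above could map onto PLINK flags. The binary name `plink2`, the file prefixes, and the helper `plink_prefilter_args` are assumptions for illustration only.

```python
def plink_prefilter_args(bfile: str, out: str) -> list[str]:
    """Build an argument list for one PLINK pass applying the stated filters.

    `bfile` and `out` are hypothetical file prefixes. Note that PLINK's
    thresholds are inclusive (e.g. --maf 0.01 keeps MAF >= 0.01), so this
    only approximates the strict inequalities quoted in the post.
    """
    return [
        "plink2",                       # assumed binary name
        "--bfile", bfile,
        "--maf", "0.01",                # minor allele frequency filter
        "--mac", "100",                 # minor allele count filter
        "--hwe", "1e-8",                # drop variants with HWE p below 1e-8
        "--indep-pairwise", "1000", "100", "0.8",  # window, step, r^2
        "--out", out,
    ]


args = plink_prefilter_args("genotypes", "genotypes_pruned")
print(" ".join(args))
```

The pruning step writes a list of retained variants (`genotypes_pruned.prune.in` in this sketch), which would then be applied with `--extract` in a second pass.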
Are there ways to reduce the number of SNPs without compromising analysis quality? For instance, could I analyze only SNPs located on the same chromosome as each protein-coding gene? Or would it be reasonable to restrict the analysis to HapMap3 SNPs to reduce dimensionality? Are there other filtering strategies that could help lower the SNP count?
Any advice on optimizing the workflow would be greatly appreciated.
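
To make the per-chromosome idea concrete, here is a minimal sketch of the selection step only. The `Variant` and `Gene` records, the toy coordinates, and the 1 Mb cis window are all assumptions for illustration; nothing here says whether such a restriction is appropriate for Regenie step 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    vid: str
    chrom: str
    pos: int


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def same_chrom_variants(gene: Gene, variants: list[Variant]) -> list[Variant]:
    """Keep only variants on the gene's chromosome."""
    return [v for v in variants if v.chrom == gene.chrom]


def cis_variants(gene: Gene, variants: list[Variant],
                 window: int = 1_000_000) -> list[Variant]:
    """Keep variants within `window` bp of the gene body (assumed 1 Mb)."""
    lo = max(0, gene.start - window)
    hi = gene.end + window
    return [v for v in variants
            if v.chrom == gene.chrom and lo <= v.pos <= hi]


# Toy data, invented purely to exercise the functions.
variants = [
    Variant("v1", "1", 500_000),
    Variant("v2", "1", 5_000_000),
    Variant("v3", "2", 600_000),
]
gene = Gene("GENE_A", "1", 1_000_000, 1_050_000)

same_chrom_ids = [v.vid for v in same_chrom_variants(gene, variants)]
cis_ids = [v.vid for v in cis_variants(gene, variants)]
print(same_chrom_ids, cis_ids)
```

With 3,000 proteins, a selection like this would produce one variant list per gene, so the trade-off between fewer SNPs per run and more runs overall would need to be weighed.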
