-
Notifications
You must be signed in to change notification settings - Fork 25
Description
I'm attempting to replicate the Michigan Imputation Server locally. Since we will be focusing on a few loci in small sample sizes, seems best to just run Eagle and Minimac directly instead of through Cloudgene. I downloaded the latest Minimac4 release (4.1.6) but wanted to sanity check that it produced the same results as the version being reported by the Imputation Server (4-1.0.2).
We're using GRCh38 and I could not find any existing M3VCF/MSAV files so created from the 1000 Genomes release using Minimac3 to create the M3VCF and Minimac4.1.6 to create the MSAV (--compress-reference
). I found some discordance in the genotypes imputed from the two Minimac4 versions and, out of curiosity, created a new MSAV from the M3VCF via --update-m3vcf
in Minimac4.1.6. Using this M3VCF-sourced MSAV, the imputation results between 4.1.6 and 4-1.0.2 are much more similar. I used savvy to export the 2 MSAV files to VCF and found the difference seems to be in the block structure:
$ zgrep $'\t<BLOCK' ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.noSingltons.MSAV-from-M3VCF.vcf.gz | cut -f 1-10 | head
1 10416 1:10416 CCCTAA <BLOCK> . . END=62157;VARIANTS=27;REPS=178 UHM 106
1 62157 1:62157 G <BLOCK> . . END=80454;VARIANTS=34;REPS=200 UHM 12
1 80454 1:80454 G <BLOCK> . . END=84139;VARIANTS=25;REPS=180 UHM 0
1 84139 1:84139 A <BLOCK> . . END=89744;VARIANTS=44;REPS=165 UHM 0
1 89744 1:89744 A <BLOCK> . . END=494542;VARIANTS=22;REPS=138 UHM 22
1 494542 1:494542 G <BLOCK> . . END=600536;VARIANTS=26;REPS=125 UHM 0
1 600536 1:600536 A <BLOCK> . . END=638772;VARIANTS=37;REPS=145 UHM 0
1 638772 1:638772 A <BLOCK> . . END=771717;VARIANTS=17;REPS=92 UHM 0
1 771717 1:771717 C <BLOCK> . . END=785674;VARIANTS=58;REPS=103 UHM 11
1 785674 1:785674 T <BLOCK> . . END=790100;VARIANTS=51;REPS=122 UHM 6
$ zgrep $'\t<BLOCK' ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.noSingltons.MSAVvia--compress-reference.vcf.gz | cut -f 1-10 | head
1 10416 . CCCTAA <BLOCK> . . END=66264;VARIANTS=30;REPS=206 UHM 0,5
1 66269 . A <BLOCK> . . END=80454;VARIANTS=30;REPS=183 UHM 0,4
1 82163 . G <BLOCK> . . END=86028;VARIANTS=30;REPS=274 UHM 0,131
1 86065 . G <BLOCK> . . END=101550;VARIANTS=50;REPS=204 UHM 0,0
1 120983 . C <BLOCK> . . END=597799;VARIANTS=30;REPS=239 UHM 0,1
1 597818 . C <BLOCK> . . END=666203;VARIANTS=40;REPS=203 UHM 0,0
1 668135 . C <BLOCK> . . END=785203;VARIANTS=70;REPS=243 UHM 0,6
1 785417 . G <BLOCK> . . END=790082;VARIANTS=50;REPS=124 UHM 0,109
1 790099 . A <BLOCK> . . END=798782;VARIANTS=50;REPS=115 UHM 0,23
1 798941 . G <BLOCK> . . END=809630;VARIANTS=50;REPS=118 UHM 0,5
The M3VCF-sourced MSAV has adjacent blocks (the END INFO field is the POS value of the next block) but the MSAV created directly from the genotype VCF does not. And since the blocks are not adjacent, there is no common overlapping variant:
zgrep -C 1 $'\t<BLOCK' ALL.chr1.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.noSingltons.MSAVvia--compress-reference.vcf.gz | cut -f 1-10 | tail -n +5 | head -n 11
1 66264 . A T . . AC=29;AN=5096;UHA=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1
1 66269 . A <BLOCK> . . END=80454;VARIANTS=30;REPS=183 UHM 0,4
1 66269 . A T . . AC=25;AN=5096;UHA=0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
--
1 80454 . G C . . AC=6;AN=5096;UHA=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1
1 82163 . G <BLOCK> . . END=86028;VARIANTS=30;REPS=274 UHM 0,131
1 82163 . G A . . AC=170;AN=5096;UHA=0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
--
1 86028 . T C . . AC=195;AN=5096;UHA=0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1
1 86065 . G <BLOCK> . . END=101550;VARIANTS=50;REPS=204 UHM 0,0
1 86065 . G C . . AC=196;AN=5096;UHA=0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
Perhaps the MSAV format does not require the same block adjacency as the M3VCF format does? But the differing imputation results seem to indicate this differing format has an effect. Is this expected?