-
Notifications
You must be signed in to change notification settings - Fork 25
Open
Description
Hi,
I split the 1000GP chrX VCF into nonPAR, PAR1, and PAR2 regions then tried creating .msav files with these but I encountered some issues. Below is the code I used.
#/usr/bin/env bash
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz.tbi
# Borders for nonPAR checked manually
# PAR1 - nonPAR border variant
# chrX 2781514
# nonPAR PAR2 border variant
# chrX 155700882
# The above two variants are nonPAR variants. So the nonPAR vcf should start and end with these two variants included.
echo -e "chrX\t2531036\t2531036\tHGSV_248479\nchrX\t2777488\t2777488\tHGSV_249395" > exclude_ids.bed
# these two <DUP> and <DEL> variants in PAR1 sneaks into the nonPAR if we don't specifically exclude
bcftools view -r chrX:2781514-155700882 -T ^exclude_ids.bed 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.nonPAR.vcf.gz
tabix 1kGP_high_coverage_Illumina.chrX.nonPAR.filtered.SNV_INDEL_SV_phased_panel.vcf.gz
bcftools view -r chrX:1-2781513 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.PAR1.vcf.gz
tabix chrX.PAR1.vcf.gz
bcftools view -r chrX:155700883- 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.PAR2.vcf.gz
tabix chrX.PAR2.vcf.gz
# chrX nonPAR (OK)
minimac4 --compress-reference chrX.nonPAR.vcf.gz > chrX.nonPAR.msav
# chrX PAR1 (Inconsistent ploidy)
minimac4 --compress-reference chrX.PAR1.vcf.gz > chrX.PAR1.msav
# chrX PAR2 (Segfault)
minimac4 --compress-reference chrX.PAR2.vcf.gz > chrX.PAR2.msav
The outputs of minimac4
were:
minimac v4.1.6 Error: sample ploidy is not consistent Error: compressing variant failed minimac v4.1.6 Mean Compression Ratio: 0.0399954 Min Compression Ratio: 0.0240718 Max Compression Ratio: 0.0915209 minimac v4.1.6 ./create_xchr_msavs.sh: line 32: 3385400 Segmentation fault (core dumped)
msav for the nonPAR region was created as expected.
All samples in the PAR1 and PAR2 vcfs are either haploid or diploid for all variants. I assumed this would qualify as consistent ploidy, but PAR1 failed with inconsistent ploidy message. Is my understanding wrong? Does all samples within the VCF need to be either haploid or diploid?
I'm also getting a Segmentation fault
for PAR2.
Metadata
Metadata
Assignees
Labels
No labels