Skip to content

Inconsistent ploidy and segmentation fault when creating 1000GP chrX.PAR[1-2].msav #74

@youngchanpark

Description

@youngchanpark

Hi,

I split the 1000GP chrX VCF into nonPAR, PAR1, and PAR2 regions then tried creating .msav files with these but I encountered some issues. Below is the code I used.

#/usr/bin/env bash
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz
wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz.tbi

# Borders for nonPAR checked manually
# PAR1 - nonPAR border variant
# chrX    2781514
# nonPAR PAR2 border variant
# chrX    155700882
# The above two variants are nonPAR variants. So the nonPAR vcf should start and end with these two variants included.

echo -e "chrX\t2531036\t2531036\tHGSV_248479\nchrX\t2777488\t2777488\tHGSV_249395" > exclude_ids.bed
# these two <DUP> and <DEL> variants in PAR1 sneaks into the nonPAR if we don't specifically exclude
bcftools view -r chrX:2781514-155700882 -T ^exclude_ids.bed 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.nonPAR.vcf.gz
tabix 1kGP_high_coverage_Illumina.chrX.nonPAR.filtered.SNV_INDEL_SV_phased_panel.vcf.gz

bcftools view -r chrX:1-2781513 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.PAR1.vcf.gz
tabix chrX.PAR1.vcf.gz

bcftools view -r chrX:155700883- 1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz -Oz -o chrX.PAR2.vcf.gz
tabix chrX.PAR2.vcf.gz

# chrX nonPAR (OK)
minimac4 --compress-reference chrX.nonPAR.vcf.gz > chrX.nonPAR.msav

# chrX PAR1 (Inconsistent ploidy)
minimac4 --compress-reference chrX.PAR1.vcf.gz > chrX.PAR1.msav

# chrX PAR2 (Segfault)
minimac4 --compress-reference chrX.PAR2.vcf.gz > chrX.PAR2.msav

The outputs of minimac4 were:

minimac v4.1.6

Error: sample ploidy is not consistent
Error: compressing variant failed
minimac v4.1.6

Mean Compression Ratio: 0.0399954
Min Compression Ratio: 0.0240718
Max Compression Ratio: 0.0915209
minimac v4.1.6

./create_xchr_msavs.sh: line 32: 3385400 Segmentation fault      (core dumped) 

msav for the nonPAR region was created as expected.

All samples in the PAR1 and PAR2 vcfs are either haploid or diploid for all variants. I assumed this would qualify as consistent ploidy, but PAR1 failed with inconsistent ploidy message. Is my understanding wrong? Does all samples within the VCF need to be either haploid or diploid?

I'm also getting a Segmentation fault for PAR2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions