Half-bad hetvars: one bad allele fails the genotype [VS-1615] [VS-1649] #9149
@@ -173,7 +173,7 @@ workflows:
branches:
All the changes in this file were just for testing purposes and don't need to be merged.
@@ -17,34 +17,34 @@
this is all stowaway changes fixing some bonkers whitespace that was driving IntelliJ nuts
@@ -178,7 +178,7 @@ workflow GvsQuickstartIntegration {
git_branch_or_tag = git_branch_or_tag,
git_hash = GetToolVersions.git_hash,
use_VETS = false,
extract_do_not_filter_override = true,
extract_do_not_filter_override = false,
will revert before merge
@@ -0,0 +1,105 @@
version 1.0
don't think this needs to be merged, revert
any_indel_ok=acc.any_indel_ok | (~allele_is_snp[called_idx - 1] & allele_OK[called_idx - 1]),
), hl.struct(any_no=False, any_yes=False, any_snp=False, any_indel=False, any_snp_ok=False, any_indel_ok=False)))
all_snps_ok=acc.all_snps_ok & (~allele_is_snp[called_idx - 1] | allele_OK[called_idx - 1]),
all_indels_ok=acc.all_indels_ok & (allele_is_snp[called_idx - 1] | allele_OK[called_idx - 1]),
I am confused as to why this is not:
all_indels_ok=acc.all_indels_ok & (allele_is_snp[called_idx - 1] | allele_OK[called_idx - 1]),
all_snps_ok=acc.all_snps_ok & (allele_is_snp[called_idx - 1] & allele_OK[called_idx - 1]),
all_indels_ok=acc.all_indels_ok & (~allele_is_snp[called_idx - 1] & allele_OK[called_idx - 1]),
Maybe I'm just confounded by the logic
Taking the SNP line as an example, I interpreted this as "nothing wrong SNP-wise if it's not a SNP at all, OR (implying it is a SNP) if the allele is OK".
If the allele was bad and it was an INDEL, the following line would catch it.
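To make the two formulations concrete, here is a minimal plain-Python sketch of the accumulator. It is not the Hail code from this PR: the `Acc` class and both function names are made up for illustration, and it assumes called allele indices are 1-based over the alt alleles, with 0 meaning the reference allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acc:
    # Both accumulators start out True and can only be switched off.
    all_snps_ok: bool = True
    all_indels_ok: bool = True


def fold_implication(called_idxs, allele_is_snp, allele_ok):
    """Form used in the diff: (~is_snp | ok) and (is_snp | ok)."""
    acc = Acc()
    for called_idx in called_idxs:
        if called_idx == 0:  # assumption: 0 is the reference allele, skip it
            continue
        is_snp = allele_is_snp[called_idx - 1]
        ok = allele_ok[called_idx - 1]
        acc = Acc(
            # A non-SNP allele can never make the SNP check fail.
            all_snps_ok=acc.all_snps_ok and ((not is_snp) or ok),
            # A SNP allele can never make the INDEL check fail.
            all_indels_ok=acc.all_indels_ok and (is_snp or ok),
        )
    return acc


def fold_conjunction(called_idxs, allele_is_snp, allele_ok):
    """Form asked about in review: (is_snp & ok) and (~is_snp & ok)."""
    acc = Acc()
    for called_idx in called_idxs:
        if called_idx == 0:
            continue
        is_snp = allele_is_snp[called_idx - 1]
        ok = allele_ok[called_idx - 1]
        acc = Acc(
            all_snps_ok=acc.all_snps_ok and (is_snp and ok),
            all_indels_ok=acc.all_indels_ok and ((not is_snp) and ok),
        )
    return acc


# Hypothetical site: allele 1 is a SNP, allele 2 is an INDEL.
is_snp = [True, False]

# 1/2 genotype where both alleles are OK.
good_both_impl = fold_implication((1, 2), is_snp, [True, True])
good_both_conj = fold_conjunction((1, 2), is_snp, [True, True])

# 1/2 genotype where only the INDEL allele is bad.
bad_indel_impl = fold_implication((1, 2), is_snp, [True, False])
```

With the implication form, a 1/2 genotype whose alleles are both OK keeps both accumulators True, and a bad INDEL only switches off `all_indels_ok`. With the conjunction form, the same all-OK genotype ends with both accumulators False, because each allele fails the check for the type it does not belong to.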
Actually now that I look at this again I think I can/should clean it up a bit... I don't think we care about "any" SNPs or INDELs any more now that the default for the "ok" accumulators is True.
VETS VCFs
Look for differences between the expected and actual VCFs for VETS with a command like this:
VETS SNP example: new behavior on the upper line, old behavior on the lower line. Allele 1 has a calibration sensitivity of 0.9977, greater than the 0.997 threshold. With the new behavior, all three samples fail. The first two samples have 0/1 genotypes and the third has 1/2. With the old behavior, the first two samples fail but the third is rescued by the 2 allele's low calibration sensitivity.
VETS INDEL example: Here allele 2 has a calibration sensitivity of 0.993, greater than the INDEL threshold of 0.99. In the upper line (new behavior), sample 3 with genotype 1/2 fails. On the lower line (old behavior), sample 3 is rescued by the relatively low calibration sensitivity of allele 1.
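The behavior change in the VETS examples can be sketched in plain Python. This is an illustration only, not the GVS implementation: the function names are invented, an allele is assumed to fail when its calibration sensitivity is strictly greater than the threshold, and the 0.5 sensitivity for allele 2 is a made-up stand-in for the "low" value, which is not given above.

```python
def genotype_passes_new(gt, sensitivity, threshold):
    """New behavior: every called alt allele must pass."""
    called_alts = [a for a in gt if a != 0]
    return all(sensitivity[a] <= threshold for a in called_alts)


def genotype_passes_old(gt, sensitivity, threshold):
    """Old behavior: one passing alt allele rescues the genotype."""
    called_alts = [a for a in gt if a != 0]
    if not called_alts:  # no alt alleles called, nothing to filter on
        return True
    return any(sensitivity[a] <= threshold for a in called_alts)


SNP_THRESHOLD = 0.997
# Allele 1 is from the SNP example above; allele 2's value is hypothetical.
sensitivity = {1: 0.9977, 2: 0.5}

samples = [(0, 1), (0, 1), (1, 2)]
new_results = [genotype_passes_new(gt, sensitivity, SNP_THRESHOLD) for gt in samples]
old_results = [genotype_passes_old(gt, sensitivity, SNP_THRESHOLD) for gt in samples]
```

Under these assumptions all three samples fail with the new behavior, while with the old behavior only the 1/2 sample passes.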
VQSR VCFs
VQSR SNP example: new behavior on the upper line, old behavior on the lower line. The low_VQSLOD_SNP threshold here is -2.6332. Allele 1 has a VQSLOD below the threshold, but in the upper line the 1/2 genotype of the second and third samples PASSes. In the lower line this genotype fails.
VQSR INDEL example: new behavior on the upper line, old behavior on the lower line. The low_VQSLOD_INDEL threshold here is -0.5365. Allele 1 has a VQSLOD below the threshold, but in the upper line the 4/1 genotype of the third sample PASSes. In the lower line this genotype fails.
VDS Creation
This simply uses the new(ish) GvsTieOutVDS to tie out new VETS VCFs with a newly created VDS. Link to run here.
VDS Merge & Rescore
The changes to the merge & rescore logic are exactly the same as those in the import logic. Here we perform a merge and rescore run with this updated 1/2 logic, and then crack open the output VDS to spot check.
First check the input "echo" VDS and see that a particular hetvar genotype with one bad allele is not filtered (FT flag is True despite the failing "2" allele):
Now looking at the "output" VDS that has gone through merge and rescore, a genotype with this same failing allele has FT of False: