Raise a validation error if there are SNV_INDEL symbolic alleles #884
Conversation
@property
def invalid_allele_types(self) -> hl.SetExpression:
    return {
        DatasetType.SV: hl.set([hl.genetics.allele_type.AlleleType.UNKNOWN]),
The change in behavior here is to allow SYMBOLIC alleles for SV only, and not for the other dataset types.
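To make the intended behavior concrete, here is a minimal pure-Python sketch of a per-dataset-type set of invalid allele types. It does not use Hail: the `AlleleType` and `DatasetType` enums below are hypothetical stand-ins with only the members needed for illustration, and the real property returns a Hail set expression instead of a `frozenset`.

```python
from enum import Enum


class AlleleType(Enum):
    """Hypothetical stand-in for a few allele type categories."""
    UNKNOWN = "UNKNOWN"
    SNP = "SNP"
    SYMBOLIC = "SYMBOLIC"


class DatasetType(Enum):
    """Hypothetical stand-in listing only the two types named in this PR."""
    SNV_INDEL = "SNV_INDEL"
    SV = "SV"

    @property
    def invalid_allele_types(self) -> frozenset:
        # SV keeps accepting SYMBOLIC alleles; only UNKNOWN is rejected.
        if self is DatasetType.SV:
            return frozenset({AlleleType.UNKNOWN})
        # Every other dataset type rejects SYMBOLIC as well as UNKNOWN.
        return frozenset({AlleleType.UNKNOWN, AlleleType.SYMBOLIC})
```

With this shape, the validation step only needs to ask whether a row's allele type is in `dataset_type.invalid_allele_types`.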
@@ -103,8 +103,15 @@ def update_table(self, mt: hl.MatrixTable) -> hl.MatrixTable:
            ),
        )

        # Rather than throwing an error, we silently remove NON_REF symbolic
This is a product question that maybe @lynnpais and @matren395 can help with!
We received a request to load a gVCF that passed all of our validation, except that it also contains rows at sites with no variants. Per the gVCF docs, "The records in a GVCF include an accurate estimation of how confident we are in the determination that the sites are homozygous-reference or not."
I know we have discussed this in the past few weeks, so it would be good to nail down exactly what we want to do now that we have an actual live file. We can either:
- raise a validation exception and instruct the collaborator to finish the reblocking process, or
- filter the NON_REF alleles prior to validation.
Regardless, I think it makes sense for SNV_INDEL not to accept symbolic alleles, so we should change our validation process to treat those allele types as invalid.
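For discussion, the two options can be sketched in plain Python as below. This is an illustration only, not the pipeline code: rows are modeled as `[ref, alt]` lists, the helper names are hypothetical, and the angle-bracket test is a simplified check for symbolic alleles rather than Hail's own classification.

```python
NON_REF = "<NON_REF>"  # symbolic ALT allele used by gVCF reference rows


def is_symbolic(allele: str) -> bool:
    # Simplified assumption: a symbolic allele is an ID in angle brackets.
    return allele.startswith("<") and allele.endswith(">")


def drop_non_ref_rows(rows):
    """Option 2: filter out rows whose ALT is NON_REF before validating."""
    return [alleles for alleles in rows if NON_REF not in alleles[1:]]


def validate_no_symbolic(rows):
    """Option 1: raise if any symbolic ALT allele is present."""
    bad = sorted({alt for alleles in rows for alt in alleles[1:] if is_symbolic(alt)})
    if bad:
        raise ValueError(f"Symbolic alleles are not allowed for SNV_INDEL: {bad}")


rows = [["A", "T"], ["G", "<NON_REF>"], ["C", "<DEL>"]]
filtered = drop_non_ref_rows(rows)
```

Note that the two options compose: filtering NON_REF first still leaves other symbolic alleles (here `<DEL>`) to be caught by the validation step.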
) -> None:
    ht = mt.rows()
    ht = ht.filter(
        hl.numeric_allele_type(ht.alleles[0], ht.alleles[1])
        == hl.genetics.allele_type.AlleleType.UNKNOWN,
        dataset_type.invalid_allele_types.contains(
I also made the error message more precise: it now uses a collected set rather than just the first 10 alleles.
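A small plain-Python sketch of why a collected set can be more precise than a first-10 sample. The function names and message wording are hypothetical, and this assumes "collected set" means the distinct invalid allele pairs across all rows.

```python
def first_n_message(invalid_alleles, n=10):
    # Reports only the first n offending rows, so later kinds can be missed.
    return f"Invalid alleles (first {n}): {invalid_alleles[:n]}"


def collected_set_message(invalid_alleles):
    # Deduplicates across all offending rows before reporting.
    distinct = sorted(set(map(tuple, invalid_alleles)))
    return f"Invalid alleles ({len(distinct)} distinct): {distinct}"


# Twelve NON_REF rows followed by a single <DEL> row.
offending = [["A", "<NON_REF>"]] * 12 + [["C", "<DEL>"]]
```

With this input, the first-10 message repeats the same NON_REF pair ten times and never mentions `<DEL>`, while the collected set reports both distinct pairs once.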
Raise a validation error if there are SNV_INDEL symbolic alleles (#884)