Raise a validation error if there are SNV_INDEL symbolic alleles #884

Merged
merged 10 commits into from
Sep 3, 2024

Conversation

bpblanken
Collaborator

…DEL symbolic alleles

@property
def invalid_allele_types(self) -> hl.SetExpression:
    return {
        DatasetType.SV: hl.set([hl.genetics.allele_type.AlleleType.UNKNOWN]),
Collaborator Author

The change in behavior here is to allow SYMBOLIC alleles for SV only, and not for the other dataset types.
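
As a rough illustration of the behavior described in this comment, here is a minimal, Hail-free sketch. The `DatasetType` and `AlleleType` enums are stand-ins defined only for this example (the `SV` and `SNV_INDEL` names and the `UNKNOWN` and `SYMBOLIC` allele types are the only parts taken from this PR), and the plain function is hypothetical; the real property returns a Hail set expression and may cover more dataset types.

```python
from enum import Enum, auto


class DatasetType(Enum):
    # Stand-in for the pipeline's DatasetType; only two members shown.
    SNV_INDEL = auto()
    SV = auto()


class AlleleType(Enum):
    # Stand-in for a subset of hl.genetics.allele_type.AlleleType.
    SNP = auto()
    SYMBOLIC = auto()
    UNKNOWN = auto()


def invalid_allele_types(dataset_type: DatasetType) -> set:
    """Hypothetical helper: SYMBOLIC is tolerated for SV only."""
    if dataset_type is DatasetType.SV:
        return {AlleleType.UNKNOWN}
    # Assumption: every non-SV dataset type rejects SYMBOLIC as well.
    return {AlleleType.UNKNOWN, AlleleType.SYMBOLIC}
```

Keeping the invalid types as a per-dataset-type set means the validation step only needs a membership check, whatever the dataset type.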

@@ -103,8 +103,15 @@ def update_table(self, mt: hl.MatrixTable) -> hl.MatrixTable:
),
)

# Rather than throwing an error, we silently remove NON_REF symbolic
Collaborator Author

This is a product question that maybe @lynnpais and @matren395 can help with!

We received a request to load a gVCF that passed all of our validation, except that it also contains rows at sites with no variants. Per the gVCF docs: "The records in a GVCF include an accurate estimation of how confident we are in the determination that the sites are homozygous-reference or not."

I know we have discussed this over the past few weeks, so it would be good to nail down exactly what we want to do with an actual live file! We can either:

  • raise a validation exception and instruct the collaborator to finish the reblocking process, or
  • filter the NON_REF alleles prior to validation.

Regardless, I think it makes sense for SNV_INDEL not to accept symbolic alleles, so we should change our validation process to treat those allele types as invalid.
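
To make the two options concrete, here is a small Hail-free sketch operating on biallelic `(ref, alt)` pairs. All names (`is_symbolic`, `raise_on_symbolic`, `drop_non_ref`) are hypothetical and not part of the pipeline, and treating any `<...>` allele as symbolic is a simplifying assumption made for the example.

```python
NON_REF = "<NON_REF>"


def is_symbolic(allele: str) -> bool:
    # Simplifying assumption: VCF-style <ID> alleles count as symbolic.
    return allele.startswith("<") and allele.endswith(">")


def raise_on_symbolic(rows):
    """Option 1: fail validation and ask for the gVCF to be reblocked."""
    if any(is_symbolic(alt) for _, alt in rows):
        raise ValueError(
            "Symbolic alleles found; please finish reblocking before loading."
        )


def drop_non_ref(rows):
    """Option 2: silently drop <NON_REF> rows before validating."""
    return [(ref, alt) for ref, alt in rows if alt != NON_REF]


# Hypothetical rows: two real variants plus one reference-block row.
rows = [("A", "T"), ("G", NON_REF), ("C", "CA")]
```

With these rows, `raise_on_symbolic(rows)` raises, while `raise_on_symbolic(drop_non_ref(rows))` passes. The first option makes the problem visible to the collaborator; the second loads the file at the cost of hiding that it was not fully reblocked.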

bpblanken marked this pull request as ready for review on September 3, 2024, 14:39
bpblanken requested a review from a team as a code owner on September 3, 2024, 14:39
bpblanken changed the title from "Filter NON_REF and raise a validation error if there are other SNV_IN…" to "Raise a validation error if there are SNV_INDEL symbolic alleles" on Sep 3, 2024
  ) -> None:
      ht = mt.rows()
      ht = ht.filter(
-         hl.numeric_allele_type(ht.alleles[0], ht.alleles[1])
-         == hl.genetics.allele_type.AlleleType.UNKNOWN,
+         dataset_type.invalid_allele_types.contains(
Collaborator Author

I fixed this to make the error message more precise: it now uses a collected set of alleles rather than just the first 10.
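
The difference between the two reporting strategies can be sketched without Hail as follows. The data and names are hypothetical; in Hail the collected set would presumably come from an aggregation such as `hl.agg.collect_as_set`, but that is an assumption about the implementation, not something shown in this excerpt.

```python
def is_symbolic(allele: str) -> bool:
    # Simplifying assumption: VCF-style <ID> alleles count as symbolic.
    return allele.startswith("<") and allele.endswith(">")


# Hypothetical input: many rows share one offending allele pair,
# and a different offending pair appears later in the table.
rows = [("A", "<NON_REF>")] * 12 + [("G", "<DEL>"), ("A", "T")]

offending = [alleles for alleles in rows if is_symbolic(alleles[1])]

# Old-style report: the first 10 offending rows, which here repeat
# a single pair and never reach the <DEL> row.
first_ten = offending[:10]

# New-style report: every distinct offending pair, listed once.
distinct = sorted(set(offending))

message = f"Alleles with invalid allele types present: {distinct}"
```

Reporting the distinct set keeps the message short even when thousands of rows fail, and it does not miss offending alleles that happen to appear late in the table.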

bpblanken merged commit fcf165b into dev on Sep 3, 2024 (3 checks passed)
bpblanken deleted the benb/filter_non_ref branch on September 3, 2024, 16:10
bpblanken added a commit that referenced this pull request on Sep 3, 2024: "Raise a validation error if there are SNV_INDEL symbolic alleles (#884)"
2 participants