variant lookup table has sample_type #4289
hail_search/queries/mito.py
Outdated
# Variant can be present in the lookup table with only ref calls, so is still not present in any projects
if not variant_projects:
    raise HTTPNotFound()

new_variant_projects = defaultdict(lambda: defaultdict(dict))
Assuming you need to keep this post-processing, new_variant_projects is a confusing variable name; maybe something like variant_project_sample_types.
hl.dict(family_indices.map(lambda j: (lookup_ht.project_families[project_guid][j], True))),
).starmap(lambda project_key, family_indices: (
    project_key,
    hl.dict(family_indices.map(lambda j: (lookup_ht.project_families[project_key][j], True))),
I'm not 100% sure on the syntax here but I think you can update this to return the structure you want by changing this to the following:
hl.enumerate(lookup_ht.project_stats).starmap(lambda i, ps: (
    lookup_ht.project_guids[i],
    hl.enumerate(ps).starmap(
        lambda j, s: hl.or_missing(self._stat_has_non_ref(s), j)
    ).filter(hl.is_defined),
)).filter(
    lambda x: x[1].any(hl.is_defined)
).starmap(lambda project_key, family_indices: (
    project_key,
    hl.dict(family_indices.map(lambda j: (lookup_ht.project_families[project_key][j], True))),
)).group_by(
    lambda x: x[0][0]
).map_values(
    lambda project_data: hl.dict(project_data.starmap(
        lambda project_key, families: (project_key[1], families)
    ))
)
This returns something like {'R0001_1kg': {'WES': {'F000002_2': True}}, 'R0003_test': {'WES': {'F000011_11': True}, 'WGS': {'F000011_11': True}}}, but the expected structure is {project_guid: {family_guid: {sample_type: bool}}}. I've been trying to produce that structure in this block of Hail code but haven't figured it out yet.
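For reference, the reshaping described above can also be done in plain Python once the Hail query has returned. This is only a minimal sketch, assuming the query result is a nested dict shaped like the example output; the helper name invert_sample_types is hypothetical and is not taken from the PR.

```python
from collections import defaultdict


def invert_sample_types(variant_projects):
    """Reshape {project_guid: {sample_type: {family_guid: bool}}} into
    {project_guid: {family_guid: {sample_type: bool}}}.

    Hypothetical helper for illustration only; not the code in the PR.
    """
    inverted = defaultdict(lambda: defaultdict(dict))
    for project_guid, sample_types in variant_projects.items():
        for sample_type, families in sample_types.items():
            for family_guid, has_call in families.items():
                # Swap the two inner levels: family first, then sample type
                inverted[project_guid][family_guid][sample_type] = has_call
    # Convert the nested defaultdicts back to plain dicts
    return {project: dict(families) for project, families in inverted.items()}


result = invert_sample_types({
    'R0001_1kg': {'WES': {'F000002_2': True}},
    'R0003_test': {'WES': {'F000011_11': True}, 'WGS': {'F000011_11': True}},
})
```

With the example output from this comment as input, a family that appears under both WES and WGS ends up with both sample types under a single family key.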
Okay, I think the change would be:
hl.enumerate(lookup_ht.project_stats).starmap(lambda i, ps: (
    lookup_ht.project_guids[i],
    hl.enumerate(ps).starmap(
        lambda j, s: hl.or_missing(self._stat_has_non_ref(s), j)
    ).filter(hl.is_defined),
)).filter(
    lambda x: x[1].any(hl.is_defined)
).flatmap(
    # x[0] is assumed to be the (project_guid, sample_type) key; emit one
    # (project_guid, sample_type, family_guid) tuple per family index
    lambda x: x[1].map(lambda j: (x[0][0], x[0][1], lookup_ht.project_families[x[0]][j]))
).group_by(
    lambda x: x[0]
).map_values(
    lambda project_data: project_data.group_by(
        lambda x: x[2]
    ).map_values(
        # Collect every sample type seen for this family into {sample_type: True}
        lambda family_data: hl.dict(family_data.map(lambda x: (x[1], True)))
    )
)
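To make the intent of the flatten-then-group chain easier to check, here is a plain-Python analogue that runs without Hail. It is a sketch under stated assumptions: project_families is taken to be keyed by a (project_guid, sample_type) tuple, the names group_by and families_by_project are hypothetical, and the sample data (including the family F000001_1) is made up for illustration.

```python
def group_by(items, key):
    """Group items into {key(item): [items...]}, preserving input order."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


def families_by_project(project_family_indices, project_families):
    # Flatten to (project_guid, sample_type, family_guid) triples
    triples = [
        (project_guid, sample_type, project_families[(project_guid, sample_type)][j])
        for (project_guid, sample_type), indices in project_family_indices
        for j in indices
    ]
    # Group by project, then by family, then collect the sample types
    return {
        project_guid: {
            family_guid: {sample_type: True for _, sample_type, _ in family_rows}
            for family_guid, family_rows in group_by(project_rows, lambda t: t[2]).items()
        }
        for project_guid, project_rows in group_by(triples, lambda t: t[0]).items()
    }


# Made-up inputs: family indices with a non-ref call, per (project, sample type)
project_family_indices = [
    (('R0001_1kg', 'WES'), [1]),
    (('R0003_test', 'WES'), [0]),
    (('R0003_test', 'WGS'), [0]),
]
project_families = {
    ('R0001_1kg', 'WES'): ['F000001_1', 'F000002_2'],
    ('R0003_test', 'WES'): ['F000011_11'],
    ('R0003_test', 'WGS'): ['F000011_11'],
}
grouped = families_by_project(project_family_indices, project_families)
```

The two group_by passes correspond to the outer group_by on x[0] and the inner group_by on x[2] in the Hail expression.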
I switched the order of the project_samples dict and used your original suggestion here!
This looks good to me! We should come up with a release plan to deploy this code simultaneously with the updated lookup tables. In the meantime, we should hold off on merging this to dev so we can continue to release other features.
migration PR: broadinstitute/seqr-loading-pipelines#867