Skip to content

add hail sample_qc metrics #1046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 20, 2025
Merged

add hail sample_qc metrics #1046

merged 9 commits into from
Mar 20, 2025

Conversation

jklugherz
Copy link
Contributor

@jklugherz jklugherz commented Feb 14, 2025

adds hail's sample_qc metrics to sample_qc json

@jklugherz jklugherz requested a review from a team as a code owner February 14, 2025 22:15
@jklugherz jklugherz requested a review from matren395 February 14, 2025 22:17

def run_hail_sample_qc(mt: hl.MatrixTable, sample_type: SampleType) -> hl.MatrixTable:
mt = filter_to_autosomes(mt)
mt = hl.split_multi_hts(mt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this use the split_multi_hts wrapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comes from the old script, but I'm thinking it might not be necessary since the mt is our validated callset on which we already call split_multi_hts

Copy link
Contributor

@matren395 matren395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some quibbling abt annotation names

Comment on lines +44 to +51
HAIL_QC_METRICS = [
'n_snp',
'r_ti_tv',
'r_insertion_deletion',
'n_insertion',
'n_deletion',
'r_het_hom_var',
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only brief note: we know that all of these are used and relevant, right ?


strat_ht = mt.cols()
qc_metrics = {metric: strat_ht.sample_qc[metric] for metric in sample_qc_metrics}
strata = {'qc_pop': strat_ht.qc_pop}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
strata = {'qc_pop': strat_ht.qc_pop}
strata = {'gq_gen_anc': strat_ht. gq_gen_anc}

this is already annotated by then, yea ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup good find

return annotate_qc_gen_anc(mt)
mt = annotate_qc_gen_anc(mt)
mt = run_hail_sample_qc(mt, sample_type)
return mt.drop('qc_pop')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could probably do this sooner, unless u really want to run sample_qc while stratifying on only the default (90%) genetic ancestry inference

@jklugherz jklugherz requested a review from bpblanken March 18, 2025 15:21
@jklugherz jklugherz merged commit b14535f into sample-qc-qc_pop Mar 20, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants