Skip to content

add sample qc task and filter_flags #1034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Mar 17, 2025
Merged

Conversation

jklugherz
Copy link
Contributor

@jklugherz jklugherz commented Feb 6, 2025

adds sample_qc luigi task with a single metric - filter_flags - that outputs a json file, makes it a dependency of the write callset task, and adds the sample qc json to the metadata json.

@jklugherz jklugherz requested a review from a team as a code owner February 6, 2025 21:48
@jklugherz jklugherz requested a review from bpblanken February 11, 2025 16:18
@jklugherz jklugherz changed the title add sample qc task and first metric - filtered_callrate add sample qc task and filter_flags Feb 12, 2025
@jklugherz jklugherz requested a review from bpblanken February 13, 2025 17:56


class WriteSampleQCJsonTaskTest(MockedDatarootTestCase):
@patch('v03_pipeline.lib.tasks.write_sample_qc_json.WriteTDRMetricsFilesTask')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able to figure out the bigquery function mock contamination that was happening when this test was run with WriteSexCheckTableTaskTest, so I just mocked the entire WriteTDRMetricsFilesTask 🤷

@jklugherz jklugherz requested a review from matren395 February 14, 2025 22:17
sample_qc_dict = defaultdict(dict)
for row in ht.flatten().collect():
r = dict(row)
sample_id = r.pop('s')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be cleaner as

for field, value in r.items():
   sample_qc_dict[r.pop('s')][field] = value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idt this is possible RuntimeError: dictionary changed size during iteration

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jklugherz jklugherz requested a review from bpblanken February 20, 2025 16:41
@@ -244,6 +244,17 @@ def import_imputed_sex(imputed_sex_path: str) -> hl.Table:
return ht.key_by(ht.s)


def import_tdr_qc_metrics(file_path: str) -> hl.Table:
ht = hl.import_table(file_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there’s a way to define the types for non-strings at import time, we should try that!

dataset_type,
),
'sample_qc',
f'{hashlib.sha256(callset_path.encode("utf8")).hexdigest()}.json',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s the new “callset_path_hash” function that snuck in after this was started. We can use it here!

Copy link
Collaborator

@bpblanken bpblanken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! thank you!

@jklugherz jklugherz merged commit 48537ac into dev Mar 17, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants