add sample qc task and filter_flags #1034
class WriteSampleQCJsonTaskTest(MockedDatarootTestCase):
    @patch('v03_pipeline.lib.tasks.write_sample_qc_json.WriteTDRMetricsFilesTask')
I was not able to figure out the bigquery function mock contamination that was happening when this test was run with WriteSexCheckTableTaskTest, so I just mocked the entire WriteTDRMetricsFilesTask 🤷
sample_qc_dict = defaultdict(dict)
for row in ht.flatten().collect():
    r = dict(row)
    sample_id = r.pop('s')
might be cleaner as
for field, value in r.items():
    sample_qc_dict[r.pop('s')][field] = value
I don't think this is possible; it raises RuntimeError: dictionary changed size during iteration
👍
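The RuntimeError mentioned above can be reproduced outside the pipeline. This is a minimal sketch, assuming plain dicts stand in for the flattened rows (the sample IDs and flag values are made up for illustration). It contrasts popping 's' inside the loop with popping it before iterating, as the diff does.

```python
from collections import defaultdict

# Hypothetical flattened rows; in the PR these come from ht.flatten().collect().
rows = [
    {'s': 'sample_1', 'filter_flags': ['coverage']},
    {'s': 'sample_2', 'filter_flags': []},
]


def pop_while_iterating(row: dict) -> dict:
    """The suggested form: pops from r while iterating over r.items()."""
    out = defaultdict(dict)
    r = dict(row)
    for field, value in r.items():
        # Removing a key here changes the dict's size mid-iteration.
        out[r.pop('s')][field] = value
    return dict(out)


def pop_before_iterating(all_rows: list) -> dict:
    """The form in the diff: pop the sample id first, then iterate."""
    out = defaultdict(dict)
    for row in all_rows:
        r = dict(row)
        sample_id = r.pop('s')
        for field, value in r.items():
            out[sample_id][field] = value
    return dict(out)


try:
    pop_while_iterating(rows[0])
    error = None
except RuntimeError as e:
    error = str(e)

sample_qc = pop_before_iterating(rows)
```

Popping the key once, before the inner loop, also avoids looking up 's' again on every field.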
v03_pipeline/lib/misc/io.py
Outdated
@@ -244,6 +244,17 @@ def import_imputed_sex(imputed_sex_path: str) -> hl.Table:
    return ht.key_by(ht.s)


def import_tdr_qc_metrics(file_path: str) -> hl.Table:
    ht = hl.import_table(file_path)
I think there’s a way to define the types for non-string fields at import time; we should try that!
v03_pipeline/lib/paths.py
Outdated
dataset_type,
),
'sample_qc',
f'{hashlib.sha256(callset_path.encode("utf8")).hexdigest()}.json',
There’s the new “callset_path_hash” function that snuck in after this was started. We can use it here!
lgtm! thank you!
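For reference, the hashing pattern in the diff above can be factored into a helper along these lines. This is a sketch only: path_hash is a hypothetical stand-in, not the repo's actual callset_path_hash (whose signature is not shown in this thread), and the example path is made up.

```python
import hashlib


def path_hash(callset_path: str) -> str:
    """Hypothetical stand-in for the repo's callset_path_hash helper."""
    return hashlib.sha256(callset_path.encode('utf8')).hexdigest()


def sample_qc_json_name(callset_path: str) -> str:
    """Build the JSON file name from the hashed callset path."""
    return f'{path_hash(callset_path)}.json'


# Made-up path, for illustration only.
name = sample_qc_json_name('gs://example-bucket/callset.vcf.gz')
```

Keeping the hash in one function means every path builder derives the same file name from a given callset path.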
Adds a sample_qc Luigi task with a single metric, filter_flags, that outputs a JSON file; makes it a dependency of the write callset task; and adds the sample QC JSON to the metadata JSON.