Support unknowns in pedigree #795

Merged
merged 3 commits into from
May 28, 2024
5 changes: 4 additions & 1 deletion v03_pipeline/lib/misc/family_loading_failures.py
@@ -175,7 +175,10 @@ def get_families_failed_sex_check(
     failed_families = defaultdict(list)
     for family in families:
         for sample_id in family.samples:
-            if family.samples[sample_id].sex != sex_check_lookup[sample_id]:
+            if family.samples[sample_id].sex not in {
+                sex_check_lookup[sample_id],
+                Sex.UNKNOWN,
+            }:  # NB: Unknown samples in pedigree are excluded from sex check.
Contributor

Do we know what Unknowns look like in our DSP/GP friends' returns?

Contributor

Remind me: sex_check_lookup is what our DSP/GP friends are delivering, right? So if a given sample is 'Unknown' in Seqr, it doesn't matter what it's returned as in our DSP/GP friends' metrics.tsv file.

Would we want to try to ensure that it's also returned as 'U' by them? Then again, I don't know what that would look like when returned.

This does still fail if family.samples[sample_id].sex is M/F and sex_check_lookup[sample_id] is F/M, or vice versa.

Collaborator Author

My thinking on this was that a U in the pedigree is quite distinct from a U in the imputed sex file, and that the two aren't actually comparable. The former is a project management issue, whereas the latter is likely a statistical issue within DRAGEN.

I don't actually know how a U would be delivered in the imputed sex file, so we were deliberately very strict with the import of that column.

Contributor

Heard! In that case, lgtm

                 failed_families[family].append(
                     f'Sample {sample_id} has pedigree sex {family.samples[sample_id].sex.value} but imputed sex {sex_check_lookup[sample_id].value}',
                 )
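To make the behaviour settled in the thread above concrete, here is a minimal, self-contained sketch of the updated check. The `Sample` and `Family` dataclasses are simplified stand-ins and are assumptions, not the pipeline's real models; failures are keyed by a hypothetical `family_guid` string rather than by the family object. Only the set-membership test mirrors the diff.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Sex(Enum):
    FEMALE = 'F'
    MALE = 'M'
    UNKNOWN = 'U'


# Simplified stand-ins for the pipeline's pedigree models (assumption).
@dataclass
class Sample:
    sample_id: str
    sex: Sex


@dataclass
class Family:
    family_guid: str
    samples: dict = field(default_factory=dict)


def get_families_failed_sex_check(families, sex_check_lookup):
    """Return {family_guid: [reasons]} for families with a pedigree/imputed sex mismatch."""
    failed_families = defaultdict(list)
    for family in families:
        for sample_id, sample in family.samples.items():
            imputed = sex_check_lookup[sample_id]
            # A pedigree sex of UNKNOWN is never compared against the imputed sex.
            if sample.sex not in {imputed, Sex.UNKNOWN}:
                failed_families[family.family_guid].append(
                    f'Sample {sample_id} has pedigree sex {sample.sex.value} '
                    f'but imputed sex {imputed.value}',
                )
    return dict(failed_families)
```

With this shape, an M/F versus F/M disagreement still fails the family, while a sample marked U in the pedigree passes regardless of what the imputed sex file reports for it.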
1 change: 0 additions & 1 deletion v03_pipeline/lib/misc/family_loading_failures_test.py
@@ -202,7 +202,6 @@ def test_get_families_failed_sex_check(self):
 [
     [
         'Sample ROS_006_18Y03226_D1 has pedigree sex F but imputed sex M',
-        'Sample ROS_006_18Y03227_D1 has pedigree sex M but imputed sex F',
Collaborator Author

The logic behind this change: ROS_006_18Y03227_D1 is now denoted as U in the test pedigree, so the family fails only for the 'Sample ROS_006_18Y03226_D1 has pedigree sex F but imputed sex M' reason.

     ],
 ],
 )
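The effect on the expected test output can be illustrated with a small before/after comparison. This is only a sketch using plain strings instead of the pipeline's types; the imputed value for ROS_006_18Y03228_D1 is an assumption (it is not shown in this diff), and `failure_reasons` is a hypothetical helper.

```python
# Pedigree sexes for family_1 before and after the test pedigree change.
PEDIGREE_BEFORE = {
    'ROS_006_18Y03226_D1': 'F',
    'ROS_006_18Y03227_D1': 'M',
    'ROS_006_18Y03228_D1': 'F',
}
PEDIGREE_AFTER = {**PEDIGREE_BEFORE, 'ROS_006_18Y03227_D1': 'U'}

# Imputed sexes implied by the failure messages; the last entry is assumed.
IMPUTED = {
    'ROS_006_18Y03226_D1': 'M',
    'ROS_006_18Y03227_D1': 'F',
    'ROS_006_18Y03228_D1': 'F',
}


def failure_reasons(pedigree, imputed):
    """List mismatch messages, skipping samples whose pedigree sex is 'U'."""
    return [
        f'Sample {sample_id} has pedigree sex {sex} but imputed sex {imputed[sample_id]}'
        for sample_id, sex in pedigree.items()
        if sex not in {imputed[sample_id], 'U'}
    ]
```

Calling `failure_reasons(PEDIGREE_BEFORE, IMPUTED)` yields both messages from the old expectation, while `failure_reasons(PEDIGREE_AFTER, IMPUTED)` yields only the ROS_006_18Y03226_D1 message.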
2 changes: 1 addition & 1 deletion v03_pipeline/lib/misc/pedigree_test.py
@@ -316,7 +316,7 @@ def test_parse_project(self) -> None:
 samples={
     'BBL_BC1-000345_01_D1': Sample(
         sample_id='BBL_BC1-000345_01_D1',
-        sex=Sex.FEMALE,
+        sex=Sex.UNKNOWN,
         mother='BBL_BC1-000345_03_D1',
         father='BBL_BC1-000345_02_D1',
         maternal_grandmother=None,
1 change: 1 addition & 0 deletions v03_pipeline/lib/model/definitions.py
@@ -11,6 +11,7 @@ class AccessControl(Enum):
 class Sex(Enum):
     FEMALE = 'F'
     MALE = 'M'
+    UNKNOWN = 'U'


 class PipelineVersion(Enum):
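Because `Sex` is a standard `Enum`, adding the `UNKNOWN = 'U'` member is enough for by-value lookup to accept `'U'`, while any other string still raises `ValueError`. The sketch below shows that behaviour; `parse_sex` is a hypothetical helper for illustration and is not claimed to exist in the pipeline.

```python
from enum import Enum


class Sex(Enum):
    FEMALE = 'F'
    MALE = 'M'
    UNKNOWN = 'U'


def parse_sex(raw: str) -> Sex:
    """Hypothetical helper: convert a raw column value into a Sex member.

    Enum lookup by value is strict, so anything other than 'F', 'M' or 'U'
    raises ValueError instead of being coerced silently.
    """
    try:
        return Sex(raw)
    except ValueError as err:
        allowed = [member.value for member in Sex]
        raise ValueError(f'Unexpected sex value {raw!r}; expected one of {allowed}') from err
```

Keeping the conversion strict means an unexpected encoding fails loudly at load time rather than slipping into the sex check.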
2 changes: 1 addition & 1 deletion v03_pipeline/var/test/pedigrees/test_pedigree_2.tsv
@@ -6,6 +6,6 @@ R0111_tgg_bblanken_wes BBL_HT-007-5195_1 BBL_HT-007-5195 BBL_HT-007-5195_03_D1
 R0111_tgg_bblanken_wes BBL_HT-007-5195_1 BBL_HT-007-5195 BBL_HT-007-5195_04_D1 BBL_HT-007-5195_02_D1 BBL_HT-007-5195_03_D1 M
 R0111_tgg_bblanken_wes BBL_HT-007-5195_1 BBL_HT-007-5195 BBL_HT-007-5195_05_D1 BBL_HT-007-5195_02_D1 BBL_HT-007-5195_03_D1 F
 R0111_tgg_bblanken_wes BBL_HT-007-5195_1 BBL_HT-007-5195 BBL_HT-007-5195_06_D1 BBL_HT-007-5195_02_D1 BBL_HT-007-5195_03_D1 M
-R0111_tgg_bblanken_wes BBL_BC1-000345_1 BBL_BC1-000345 BBL_BC1-000345_01_D1 BBL_BC1-000345_02_D1 BBL_BC1-000345_03_D1 F
+R0111_tgg_bblanken_wes BBL_BC1-000345_1 BBL_BC1-000345 BBL_BC1-000345_01_D1 BBL_BC1-000345_02_D1 BBL_BC1-000345_03_D1 U
 R0111_tgg_bblanken_wes BBL_BC1-000345_1 BBL_BC1-000345 BBL_BC1-000345_02_D1 M
 R0111_tgg_bblanken_wes BBL_BC1-000345_1 BBL_BC1-000345 BBL_BC1-000345_03_D1 F
2 changes: 1 addition & 1 deletion v03_pipeline/var/test/pedigrees/test_pedigree_6.tsv
@@ -1,6 +1,6 @@
 Project_GUID Family_GUID Family_ID Individual_ID Paternal_ID Maternal_ID Sex
 R0116_sex_check_project2 family_1 family_1 ROS_006_18Y03226_D1 F
-R0116_sex_check_project2 family_1 family_1 ROS_006_18Y03227_D1 M
+R0116_sex_check_project2 family_1 family_1 ROS_006_18Y03227_D1 U
 R0116_sex_check_project2 family_1 family_1 ROS_006_18Y03228_D1 ROS_006_18Y03226_D1 ROS_006_18Y03227_D1 F
 R0116_sex_check_project2 family_2 family_2 ROS_007_19Y05919_D1 F
 R0116_sex_check_project2 family_2 family_2 ROS_007_19Y05939_D1 F
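As an illustration of how a `U` in the `Sex` column of such a pedigree file could be read, here is a rough sketch using the standard `csv` module. It assumes the file is tab-delimited with the header shown above; `read_pedigree_sexes` is a hypothetical function and does not reproduce the pipeline's own pedigree parser.

```python
import csv
import io
from enum import Enum


class Sex(Enum):
    FEMALE = 'F'
    MALE = 'M'
    UNKNOWN = 'U'


# First rows of the updated test pedigree, with tabs restored (assumed delimiter).
PEDIGREE_TSV = (
    'Project_GUID\tFamily_GUID\tFamily_ID\tIndividual_ID\tPaternal_ID\tMaternal_ID\tSex\n'
    'R0116_sex_check_project2\tfamily_1\tfamily_1\tROS_006_18Y03226_D1\t\t\tF\n'
    'R0116_sex_check_project2\tfamily_1\tfamily_1\tROS_006_18Y03227_D1\t\t\tU\n'
    'R0116_sex_check_project2\tfamily_1\tfamily_1\tROS_006_18Y03228_D1'
    '\tROS_006_18Y03226_D1\tROS_006_18Y03227_D1\tF\n'
)


def read_pedigree_sexes(handle):
    """Map each Individual_ID to its Sex; unrecognised values raise ValueError."""
    reader = csv.DictReader(handle, delimiter='\t')
    return {row['Individual_ID']: Sex(row['Sex']) for row in reader}


sexes = read_pedigree_sexes(io.StringIO(PEDIGREE_TSV))
unknown_ids = [sample_id for sample_id, sex in sexes.items() if sex is Sex.UNKNOWN]
```

Here `unknown_ids` contains only ROS_006_18Y03227_D1, the sample that the sex check now skips.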