-
Notifications
You must be signed in to change notification settings - Fork 20
Parameter refactoring + cleanup. #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e/seqr-loading-pipelines into benb/single_callset_path
…e/seqr-loading-pipelines into benb/single_callset_path
…s into benb/split_out_sample_type
…e/seqr-loading-pipelines into benb/single_callset_path
default=False, | ||
parsing=luigi.BoolParameter.EXPLICIT_PARSING, | ||
) | ||
skip_check_sex_and_relatedness = luigi.BoolParameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some context on this:
- I wanted options that defaulted to
False
here, so that we didn't need to pass a flag for our default case in airflow (which we are doing now for s&r, which has led to some confusion). - I wanted to limit these to options that would either be manually specified while debugging a run or would be enabled dependent on sample source.
skip_
prefix for consistency.- There are also env vars for the two features that we don't want to enable for local users, so we wouldn't have to hardcode
skip_expect_filters=True
andskip_check_sex_and_relatedness=True
(though we could). - The "env var only" approach doesn't work well because env vars are set at the dataproc cluster level and I haven't been able to get job env vars to override cluster env vars correctly (one afternoon already wasted on this a few months ago).
if ( | ||
not Env.EXPECT_WES_FILTERS | ||
or not dataset_type.expect_filters(sample_type) | ||
or 'part_one_outputs' not in callset_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are 'part_one_outputs' and 'part_two_outputs'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matren395 can give some context here! I think it's just that internal exome callsets have always been delivered in two different folders, and we run an extra step to join the filters
part onto the first part!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'd believe this is too, yes!
WES callsets are delivered in two parts usually, with two folders for Part_One_Outputs and Part_Two_Outputs. Part_One_Outputs are the "normal" sharded VCFs, one set of sharded VCFs per chromosome, complete with tons of annotations and information. Part_Two_Outputs is I believe just one VCF per chromosome, containing the sites to filter the Part_One VCFs to .
) | ||
is_new_gcnv_joint_call = luigi.BoolParameter( | ||
default=False, | ||
description='Is this a fully joint-called callset.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also need parsing=luigi.BoolParameter.EXPLICIT_PARSING
here?
self.project_guids, | ||
self.project_remap_paths, | ||
self.project_pedigree_paths, | ||
self.imputed_sex_paths, | ||
strict=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strict=False
makes it so that if any of the 3 lists are of different length we'll never know - do we expect these to ever be of different lengths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah! I think this is worth changing. ruff
complains if you zip
without the param... and I do think it's impossible for the lists to not be the same length with the way it's been simplified now....but throwing an error is definitely clearer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting about ruff too because it looks like the parameter is defaulted to False
…seqr-loading-pipelines into benb/single_callset_path
List of changes:
callset_paths
in favor ofcallset_path
.liftover_ref_path
param in favor of an env var.imputed_sex_path
andfilters_path
are computed and enabled via env vars (and optionally disabled via a "skip" param).