-
Notifications
You must be signed in to change notification settings - Fork 20
lookup sample_type migration #867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -12,5 +12,9 @@ class BaseMigration(ABC): | |||
|
|||
@staticmethod | |||
@abstractmethod | |||
def migrate(ht: hl.Table) -> hl.Table: | |||
def migrate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😮
…s into lookup-sampletype-migration
remap_pedigree_hash: int32 | ||
}> | ||
New Global fields: | ||
'project_guids': array<tuple ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth renaming this field to something like project_sample_types
to make it clearer what it is
break | ||
|
||
# It is possible that there are projects in the lookup globals with no corresponding project ht. | ||
# For those projects, set sample_type to missing and log project_guid, so we can delete them later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a case that should be impossible. Would it be better for this to hard fail, and for us to first have to run any project deletions before we can run the migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, hard fail makes sense and if there's something out of wack we can debug outside of this migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I ran the migration without actually applying it with this check, here are projects that appear in project_guids but have no project tables:
MITO: R0566_tgg_ravenscroft_wes
SNV_INDEL 38: 52224_julia_test
SNV_INDEL 37: 52124_julia_test
They need to be removed from the lookup table.
- However, I don't want to run the delete project dag for 52224_julia_test since it has a family in project_families (['F051011_fam2']) and I don't want that family table to be deleted as well!
- R0566_tgg_ravenscroft_wes I'm not sure about. It also has families in project_families but the corresponding family tables don't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes. R0566_tgg_ravenscroft_wes
is my fault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should run the delete
families task... The family and project table didn't get copied over during the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R0566_tgg_ravenscroft_wes is gone from MITO
52124_julia_test is gone from SNV 37
still waiting on 52224_julia_test to be deleted, got a notebook running now in a cluster
@@ -44,7 +44,7 @@ def update_table(self, ht: hl.Table) -> hl.Table: | |||
if ( | |||
(self.reference_genome, self.dataset_type) | |||
) in migration.reference_genome_dataset_types: | |||
ht = migration.migrate(ht) | |||
ht = migration.migrate(ht, self.reference_genome, self.dataset_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can do **self.param_kwargs
here. It might be better to just go ahead and generalize in that way?
def migrate(ht: hl.Table) -> hl.Table: | ||
def migrate( | ||
ht: hl.Table, | ||
reference_genome: ReferenceGenome, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should also probably be **kwargs
since it's a base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small thing but otherwise good!
companion PR to handle new lookup table structure inside the pipeline: #868