Skip to content

lookup sample_type migration #867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Aug 20, 2024
Merged

lookup sample_type migration #867

merged 24 commits into from
Aug 20, 2024

Conversation

jklugherz
Copy link
Contributor

@jklugherz jklugherz commented Aug 8, 2024

companion PR to handle new lookup table structure inside the pipeline: #868

@jklugherz jklugherz requested a review from bpblanken August 8, 2024 20:59
@@ -12,5 +12,9 @@ class BaseMigration(ABC):

@staticmethod
@abstractmethod
def migrate(ht: hl.Table) -> hl.Table:
def migrate(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮

Base automatically changed from family-table-path-update to dev August 12, 2024 13:52
@jklugherz jklugherz marked this pull request as ready for review August 12, 2024 18:30
@jklugherz jklugherz requested a review from a team as a code owner August 12, 2024 18:30
@jklugherz jklugherz requested review from bpblanken and hanars August 14, 2024 18:41
remap_pedigree_hash: int32
}>
New Global fields:
'project_guids': array<tuple (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth renaming this field to something like project_sample_types to make it clearer what it is

break

# It is possible that there are projects in the lookup globals with no corresponding project ht.
# For those projects, set sample_type to missing and log project_guid, so we can delete them later.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a case that should be impossible. Would it be better for this to hard fail, and for us to first have to run any project deletions before we can run the migration?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, hard fail makes sense and if there's something out of wack we can debug outside of this migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I ran the migration without actually applying it with this check, here are projects that appear in project_guids but have no project tables:
MITO: R0566_tgg_ravenscroft_wes
SNV_INDEL 38: 52224_julia_test
SNV_INDEL 37: 52124_julia_test

They need to be removed from the lookup table.

  • However, I don't want to run the delete project dag for 52224_julia_test since it has a family in project_families (['F051011_fam2']) and I don't want that family table to be deleted as well!
  • R0566_tgg_ravenscroft_wes I'm not sure about. It also has families in project_families but the corresponding family tables don't exist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes. R0566_tgg_ravenscroft_wes is my fault.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should run the delete families task... The family and project table didn't get copied over during the migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R0566_tgg_ravenscroft_wes is gone from MITO
52124_julia_test is gone from SNV 37
still waiting on 52224_julia_test to be deleted, got a notebook running now in a cluster

@@ -44,7 +44,7 @@ def update_table(self, ht: hl.Table) -> hl.Table:
if (
(self.reference_genome, self.dataset_type)
) in migration.reference_genome_dataset_types:
ht = migration.migrate(ht)
ht = migration.migrate(ht, self.reference_genome, self.dataset_type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do **self.param_kwargs here. It might be better to just go ahead and generalize in that way?

def migrate(ht: hl.Table) -> hl.Table:
def migrate(
ht: hl.Table,
reference_genome: ReferenceGenome,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should also probably be **kwargs since it's a base.

Copy link
Collaborator

@bpblanken bpblanken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small thing but otherwise good!

@jklugherz jklugherz merged commit d68bdc4 into dev Aug 20, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants