Benb/migration task #834

Merged
bpblanken merged 47 commits into dev from benb/migration_task
Jul 25, 2024

Conversation

bpblanken
Collaborator

@bpblanken bpblanken commented Jul 9, 2024

I went back and forth on a few things here late last week, but it's ready now!

Base automatically changed from benb/split_import_and_validate to dev July 9, 2024 21:18
from v03_pipeline.lib.model import DatasetType, ReferenceGenome


class BaseMigration(ABC):
Collaborator Author

Template for a migration:

  • Define the applicable set of reference genome and dataset types.
  • A function that migrates.
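
As a rough sketch of that template (not the PR's actual code): the enums below are stand-ins for `ReferenceGenome` and `DatasetType` from `v03_pipeline.lib.model`, the `applies_to` attribute and the `AddExampleField` class are hypothetical names, and a plain dict stands in for `hl.Table` so the snippet runs without Hail.

```python
from abc import ABC, abstractmethod
from enum import Enum


# Stand-ins for the real model enums; member names are assumptions.
class ReferenceGenome(Enum):
    GRCh37 = 'GRCh37'
    GRCh38 = 'GRCh38'


class DatasetType(Enum):
    SNV_INDEL = 'SNV_INDEL'
    MITO = 'MITO'


class BaseMigration(ABC):
    # Hypothetical attribute: the (reference genome, dataset type)
    # pairs that this migration applies to.
    applies_to: frozenset = frozenset()

    @staticmethod
    @abstractmethod
    def migrate(table: dict) -> dict:
        """Return the migrated table (a dict stands in for hl.Table)."""


class AddExampleField(BaseMigration):
    applies_to = frozenset({(ReferenceGenome.GRCh38, DatasetType.SNV_INDEL)})

    @staticmethod
    def migrate(table: dict) -> dict:
        # Build a new table instead of mutating the input.
        return {**table, 'example_field': None}
```

A task runner could then skip any migration whose `applies_to` set does not contain the table's own (reference genome, dataset type) pair.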

def requires(self) -> luigi.Task | None:
    # Require the previous migration
    defined_migrations = [x[0] for x in list_migrations(self.migrations_path)]
    for i, migration in enumerate(defined_migrations):
Collaborator Author

This was a late addition: each migration (except the first) requires the previous migration to be complete.
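
A minimal sketch of the ordering rule, assuming migration names sort lexicographically by their numeric prefix (as `0000_migration` suggests). `previous_migration` is a hypothetical helper for illustration, not a function from this PR.

```python
from __future__ import annotations


def previous_migration(defined_migrations: list[str], current: str) -> str | None:
    """Return the migration that must complete before `current`.

    Returns None when `current` is the first migration.
    """
    ordered = sorted(defined_migrations)
    i = ordered.index(current)  # raises ValueError for an unknown name
    return ordered[i - 1] if i > 0 else None
```

Each task then depends on at most one other task, and the full ordering falls out of following that single link back to the first migration.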

Contributor

The MigrateAllVariantAnnotationsTablesTask and MigrateLookupTableTask tasks create implementations of BaseMigrateTask for each migration in the migrations directories, and each of those tasks then recursively creates its own migration task requirements here. Is that a problem, or does Luigi cache (memoize?) the result of identical tasks?

Collaborator Author

I think the hope was that each task would only create one additional task, the (i - 1)th one. The (i - 1)th one would then require the (i - 2)th one, leading to 2 copies of the whole set of migrations in memory (I think). Luigi does cache, though.
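
To illustrate why repeated construction need not multiply tasks, here is a toy memoizing metaclass. It is a stand-in written for this thread, not Luigi's implementation; Luigi's own instance cache behaves similarly in that constructing a task twice with the same parameters yields the same object. `CachedTaskMeta` and `MigrateTask` are hypothetical names.

```python
class CachedTaskMeta(type):
    """Toy metaclass: one instance per (class, parameters)."""

    _instances: dict = {}

    def __call__(cls, *args, **kwargs):
        key = (cls, args, tuple(sorted(kwargs.items())))
        if key not in CachedTaskMeta._instances:
            CachedTaskMeta._instances[key] = super().__call__(*args, **kwargs)
        return CachedTaskMeta._instances[key]


class MigrateTask(metaclass=CachedTaskMeta):
    def __init__(self, migration_index: int):
        self.migration_index = migration_index

    def requires(self):
        # Each task asks only for its immediate predecessor.
        if self.migration_index == 0:
            return None
        return MigrateTask(self.migration_index - 1)


# A wrapper task that builds every migration task up front...
all_tasks = [MigrateTask(i) for i in range(4)]
# ...gets the very same objects back when a task asks for its predecessor.
assert all_tasks[3].requires() is all_tasks[2]
```

Under this assumption the recursive `requires` calls resolve to already-built tasks rather than creating a second copy of the chain.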

)

@staticmethod
def migrate(ht: hl.Table) -> hl.Table:
Collaborator Author

A first migration that will add a migrations list to the globals.
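
A sketch of the bookkeeping idea in plain Python, with a dict standing in for the table's globals so it runs without Hail. The function names and the choice to store migration names as strings are assumptions; in Hail itself this would presumably be done with `annotate_globals`.

```python
def add_migrations_global(table_globals: dict) -> dict:
    """First migration: give the globals an empty `migrations` list."""
    if 'migrations' in table_globals:
        # Already initialised, so rerunning changes nothing.
        return dict(table_globals)
    return {**table_globals, 'migrations': []}


def record_migration(table_globals: dict, name: str) -> dict:
    """Later migrations: append their name once they have run."""
    return {
        **table_globals,
        'migrations': [*table_globals['migrations'], name],
    }
```

With the list in place, checking whether a migration has already been applied becomes a simple membership test on the globals.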

@bpblanken bpblanken marked this pull request as ready for review July 18, 2024 14:03
@bpblanken bpblanken requested a review from a team as a code owner July 18, 2024 14:03
self.assertEqual(
    list_migrations(self.tmpdir.name),
    [
        ('0000_migration', ANY),
Contributor

What's the second item in the tuple returned by list_migrations? Is it possible to assert on something more specific here?

Collaborator Author

Yeah, it's an implementation of the BaseMigration class. I can make it more specific for sure!

Collaborator Author

@jklugherz I have no idea if this was the best fix but it is fixed!
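
For reference, one way such an assertion could be tightened, as a sketch only (the actual fix in the PR may differ). `BaseMigration` and `ExampleMigration` are minimal stand-ins, `is_migration_impl` is a hypothetical helper, and the `listed` value imitates what `list_migrations` is described as returning.

```python
from abc import ABC, abstractmethod
from unittest.mock import ANY


class BaseMigration(ABC):
    @staticmethod
    @abstractmethod
    def migrate(table: dict) -> dict: ...


class ExampleMigration(BaseMigration):
    @staticmethod
    def migrate(table: dict) -> dict:
        return table


def is_migration_impl(obj: object) -> bool:
    """True for a concrete BaseMigration subclass or an instance of one."""
    cls = obj if isinstance(obj, type) else type(obj)
    return issubclass(cls, BaseMigration) and cls is not BaseMigration


listed = [('0000_migration', ExampleMigration)]

# Loose check: ANY compares equal to whatever is in the second slot.
assert listed == [('0000_migration', ANY)]

# Tighter check: names match exactly and every entry is a real migration.
assert [name for name, _ in listed] == ['0000_migration']
assert all(is_migration_impl(m) for _, m in listed)
```

The tighter form still avoids depending on object identity, which helps when the migration classes are loaded dynamically from a directory.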

@bpblanken bpblanken merged commit ffa4313 into dev Jul 25, 2024
3 checks passed
@bpblanken bpblanken deleted the benb/migration_task branch July 25, 2024 20:21
bpblanken added a commit that referenced this pull request Aug 2, 2024
* Move vep files (#844)

* Add mito local constraint (#845)

* Add mito local constraint

* Fix tests

* lint

* Benb/migration task (#834)

* split import and validate

* lint and share function

* ruff

* change dep

* tweak update

* lint

* wrong method

* correct method

* mocks

* change sample type annotation on test

* hack on migration

* sort return list

* move the migration

* still hacking

* better!

* getting there

* Cleaner

* ruff

* Finish it off

* migration

* rename var

* add migrations to annotations table

* fix test import

* actually fix the test

* add migrations

* not used here

* use globals

* missed one

* a hilarious typo

* Update migrate_variant_annotations_table.py

* correct sign

* add lookup migration

* Add lookup table migration

* adjust migration

* ruff

* Add to tasks

* ensure a migration cannot run before a previous migration!

* ruff

* fix bug

* lint

* add referencegenomedatasetype

* Annoying but fixed

* Add new SV annotations for VCF export. (#857)

* Add SV annotations

* ruff

* push

* ruff

* Update update_variant_annotations_table_with_new_samples_test.py

* Add a task to export the SV annotations table to VCF. (#858)

* Export VCF task

* Fix test

* lint

* Resolve the assumption in the pipeline that remap/pedigree files are immutable. (#856)

* add remap_pedigree hash

* add func

* all the imports

* ruff

* Fix it

* support missing remap

* ruff

* ruff

* ruff

* tweak the type

* tweak the type

* Fix test

* ruff

* add remap pedigree hash

* Explicit int32

* lint

* Update io.py

* ruff

* lint

* hash

* Flappy test

* wrong pedigree

* bad colon

* finish tests

* add a test

* add pedigree

* Fix test