Benb/migration task #834
…es into benb/split_import_and_validate
…stitute/seqr-loading-pipelines into benb/migration_task
…s into benb/migration_task
…ines into benb/migration_task
from v03_pipeline.lib.model import DatasetType, ReferenceGenome


class BaseMigration(ABC):
Template for a migration:
- Define the applicable set of reference genome and dataset types.
- A function that migrates.
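A rough sketch of what a migration following this template might look like. This is not the PR's code: the enums are stand-ins for `v03_pipeline.lib.model`, the attribute name `reference_genome_dataset_types` and the helper `applies` are hypothetical, and a plain dict stands in for `hl.Table` so the example runs without Hail.

```python
from abc import ABC, abstractmethod
from enum import Enum


# Stand-ins for v03_pipeline.lib.model.ReferenceGenome and DatasetType;
# the member names here are illustrative only.
class ReferenceGenome(Enum):
    GRCh37 = 'GRCh37'
    GRCh38 = 'GRCh38'


class DatasetType(Enum):
    SNV_INDEL = 'SNV_INDEL'
    MITO = 'MITO'


class BaseMigration(ABC):
    # Hypothetical attribute name: the (reference genome, dataset type)
    # pairs this migration applies to.
    reference_genome_dataset_types: frozenset = frozenset()

    @staticmethod
    @abstractmethod
    def migrate(table: dict) -> dict:
        """Return the migrated table (a plain dict stands in for hl.Table)."""


class AddNoteField(BaseMigration):
    """Hypothetical migration that adds a single field."""

    reference_genome_dataset_types = frozenset(
        {(ReferenceGenome.GRCh38, DatasetType.SNV_INDEL)},
    )

    @staticmethod
    def migrate(table: dict) -> dict:
        # Return a new dict rather than mutating the input.
        return {**table, 'note': 'migrated'}


def applies(migration, reference_genome, dataset_type) -> bool:
    """Hypothetical helper: does this migration target the given pair?"""
    return (reference_genome, dataset_type) in migration.reference_genome_dataset_types
```

Keeping the applicable pairs as data on the class means a task can decide whether to run a migration without calling `migrate` at all.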
def requires(self) -> luigi.Task | None:
    # Require the previous migration
    defined_migrations = [x[0] for x in list_migrations(self.migrations_path)]
    for i, migration in enumerate(defined_migrations):
This was a late addition... each migration (except the first) requires the previous migration to be complete.
The MigrateAllVariantAnnotationsTablesTask and MigrateLookupTableTask tasks create implementations of BaseMigrateTask for each migration in the migrations directories - and then each of those tasks recursively creates migration task requirements here. Is that a problem, or does luigi cache/(memoize?) the result of identical tasks?
I think the hope was that each task would only create one additional task, the (i - 1)th one. And then the (i - 1)th one would require the (i - 2)th one, leading to 2 copies of the whole set of migrations in memory (I think). Luigi does cache though.
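To make the caching point concrete, here is a stdlib-only sketch. Luigi's documentation describes tasks of the same class with equal parameter values as being the same instance within a worker; the `CachedTask` metaclass below is a simplified stand-in for that behaviour, not Luigi's actual code, and `MIGRATIONS`, `MigrateTask`, and `walk` are hypothetical names.

```python
class CachedTask(type):
    """Metaclass that returns one instance per (class, arguments) pair.

    A simplified stand-in for Luigi's per-worker instance cache.
    """

    _instances: dict = {}
    created = 0

    def __call__(cls, *args):
        key = (cls, args)
        if key not in CachedTask._instances:
            CachedTask._instances[key] = super().__call__(*args)
            CachedTask.created += 1
        return CachedTask._instances[key]


# Hypothetical migration names; sorted order defines which one is "previous".
MIGRATIONS = ['0000_migration', '0001_migration', '0002_migration']


class MigrateTask(metaclass=CachedTask):
    def __init__(self, name: str):
        self.name = name

    def requires(self):
        # Every migration except the first requires the one before it.
        i = MIGRATIONS.index(self.name)
        return MigrateTask(MIGRATIONS[i - 1]) if i > 0 else None


def walk(task):
    """Follow the requires() chain back to the first migration."""
    chain = []
    while task is not None:
        chain.append(task.name)
        task = task.requires()
    return chain


# A wrapper task builds one task per migration; each then walks its own chain.
tasks = [MigrateTask(name) for name in MIGRATIONS]
chains = [walk(t) for t in tasks]
```

With the cache in place, the wrapper's tasks and the recursively required tasks resolve to the same three objects, so the recursion costs repeated lookups rather than duplicate task instances.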
)

@staticmethod
def migrate(ht: hl.Table) -> hl.Table:
A first migration that will add a migrations list to the globals.
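For illustration only, a sketch of the idea using a plain dict in place of `hl.Table`. In Hail the first step would presumably go through `annotate_globals`, but the body of `migrate` is not shown here, so both function names below are hypothetical. `record_migration` additionally assumes the list holds the names of applied migrations, which the PR does not state outright.

```python
def add_migrations_global(table: dict) -> dict:
    """Ensure the table's globals carry a `migrations` list (initially empty)."""
    table_globals = dict(table.get('globals', {}))
    table_globals.setdefault('migrations', [])
    return {**table, 'globals': table_globals}


def record_migration(table: dict, name: str) -> dict:
    """Append a migration name to the globals, skipping names already recorded.

    Assumption: the list tracks which migrations have been applied.
    """
    table = add_migrations_global(table)
    migrations = list(table['globals']['migrations'])
    if name not in migrations:
        migrations.append(name)
    return {**table, 'globals': {**table['globals'], 'migrations': migrations}}
```

Both functions return new dicts and are safe to call twice, so re-running a migration that has already been recorded leaves the list unchanged.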
self.assertEqual(
    list_migrations(self.tmpdir.name),
    [
        ('0000_migration', ANY),
What's the second item in the tuple returned by list_migrations? Is it possible to assert on something more specific here?
Yeah, it's an implementation of the BaseMigration class. I can make it more specific for sure!
@jklugherz I have no idea if this was the best fix but it is fixed!
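One way an assertion like this could be tightened, not necessarily the fix that landed here: compare the names exactly and check the type of the second item instead of matching `ANY`. The sketch assumes the second tuple item is a class rather than an instance; `list_migrations` is stubbed and the two migration classes are hypothetical.

```python
import unittest
from abc import ABC


class BaseMigration(ABC):
    """Stand-in for the PR's BaseMigration."""


class Migration0000(BaseMigration):
    pass


class Migration0001(BaseMigration):
    pass


def list_migrations(path: str):
    # Hypothetical stub: the real function discovers migrations under `path`.
    return [('0000_migration', Migration0000), ('0001_migration', Migration0001)]


class ListMigrationsTest(unittest.TestCase):
    def test_names_and_types(self):
        migrations = list_migrations('unused')
        # Names are compared exactly, in order.
        self.assertEqual(
            [name for name, _ in migrations],
            ['0000_migration', '0001_migration'],
        )
        # Instead of ANY, require each second item to be a BaseMigration subclass.
        for _, migration in migrations:
            self.assertTrue(issubclass(migration, BaseMigration))
```

If the second item turns out to be an instance, `assertIsInstance(migration, BaseMigration)` would be the equivalent check.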
…s into benb/migration_task
* Move vep files (#844)
* Add mito local constraint (#845) * Add mito local constraint * Fix tests * lint
* Benb/migration task (#834) * split import and validate * lint and share function * ruff * change dep * tweak update * lint * wrong method * correct method * mocks * change sample type annotation on test * hack on migration * sort return list * move the migration * still hacking * better! * getting there * Cleaner * ruff * Finish it off * migration * rename var * add migrations to annotations table * fix test import * actually fix the test * add migrations * not used here * use globals * missed one * a hilarious typo * Update migrate_variant_annotations_table.py * correct sign * add lookup migration * Add lookup table migration * adjust migration * ruff * Add to tasks * ensure a migration cannot run before a previous migration! * ruff * fix bug * lint * add referencegenomedatasetype * Annoying but fixed
* Add new SV annotations for VCF export. (#857) * Add SV annotations * ruff * push * ruff * Update update_variant_annotations_table_with_new_samples_test.py
* Add a task to export the SV annotations table to VCF. (#858) * Export VCF task * Fix test * lint
* Resolve the assumption in the pipeline that remap/pedigree files are immutable. (#856) * add remap_pedigree hash * add func * all the imports * ruff * Fix it * support missing remap * ruff * ruff * ruff * tweak the type * tweak the type * Fix test * ruff * add remap pedigree hash * Explicit int32 * lint * Update io.py * ruff * lint * hash * Flappy test * wrong pedigree * bad colon * finish tests * add a test * add pedigree * Fix test
I went back and forth on a few things on this late last week, but it's ready now!