Commit f89a6d3

bpblanken, jklugherz, and hanars authored
Reference data refactor (#991)
* begin reference dataset refactor
* hgmd
* basewritetask
* PR commentes
* Reference data refactor feature branch
* remove utils for now
* cadd
* hgmd selects
* import
* minor things
* config enum attribute
* config out of enum, get_ht, for_reference_genome_dataset_type
* return table
* kwargs
* tiny changes
* frozenset
* cadd filtering
* changes to the cadd script that will be moot?
* add some gnomad datasets
* hacking on clinvar
* ruff
* add 38 dbnsfp config
* get cadd from dbnsfp
* get primate ai and mpc from dbnsfp
* Cleanup
* cleanup
* Update misc.py
* Update clinvar.py
* Update clinvar.py
* Update clinvar_test.py
* poach some files from bens pr
* Update definitions.py
* first pass enums
* use liftover for 37 data instead of old version
* remove cadd
* Add clinvar path (#961)
* Add clinvar path
* Fix missing requires bug
* remove dataset type from filter contigs
* Move filter_contigs to "get_ht" so its generalizable
* gnomad_exomes unit tests
* all enum selects helper
* gnomad_genomes tests
* clean up
* Generalize enum annotation
* fix tempdir usage
* add topmed
* Benb/clinvar refactor (#960)
* hacking on clinvar
* ruff
* Cleanup
* cleanup
* Update misc.py
* Update clinvar.py
* Update clinvar.py
* Update clinvar_test.py
* Update definitions.py
* Add clinvar path (#961)
* Add clinvar path
* Fix missing requires bug
* remove dataset type from filter contigs
* Move filter_contigs to "get_ht" so its generalizable
* Generalize enum annotation
* Add back enum select fields
* remove unnecessary line
* clean up
* ruff
* wip hgmd test
* ruff
* share enum transmute
* done
* notebook
* ruff
* linter for now
* first pass splice ai
* Mitimpact
* Add the enum 🤦
* bad typo
* gnomad_mito, gnomad_non_coding_constraint, local_constraint_mito, screen
* gnomad_qc typo
* module_file_name
* gnomad_genomes CONFIG deduplication
* zipfile helper
* MITIMPACT (#965)
* Mitimpact
* Add the enum 🤦
* bad typo
* use helper for zip download
* pr feedback
* ruff
* ruff
* ruff
* ruff
* unshare extracted filename
* clean up transmute
* ruff
* trailing comma
* maybe clearer gnomad
* fix property syntax
* gnomad_mito selects
* use hanas enum notation
* shared import vcf helper
* proper splice ai parsing
* valid paths
* ruff
* ruff
* mitomap
* add coment
* merge
* screenums
* explicit handling for already mapped enums
* add tests
* ruff
* ruff
* ruff
* min_partitions
* simplify mitomap
* jupyter
* hmtvar reference dataset (#971)
* hmtvar reference dataset
* ruff
* eigen reference dataset (#970)
* eigen reference dataset
* Fix typo
---------
Co-authored-by: Benjamin Blankenmeister <b.p.blankenmeister@gmail.com>
* Exac reference dataset (#969)
* add exac reference dataset
* use vcf
* remove comment
---------
Co-authored-by: Benjamin Blankenmeister <b.p.blankenmeister@gmail.com>
* helix mito (#972)
* split genomes and exomes again
* fix screen
* screen and gnomad non coding
* unzip local_constraint_mito
* Fix bugs related to nested fields/split_multi (#973)
* helix mito
* Fix split_multi and select bugs
* fixme
* ruff
* Add test for exac
* Add test for split multi check
* Add test for `UpdatedReferenceDataset` and `UpdatedReferenceDatasetQuery` (#974)
* helix mito
* Fix split_multi and select bugs
* fixme
* ruff
* get test working
* fix bugs
* bug fixes
* Bugfixes
* Refactor tests
* Add comment
* quixotic
* missed one
* Add test for exac
* Add test for split multi check
* fix zip write
* Benb/add missing queries (#977)
* Add missing datasets
* Fix reference
* Add test
* lint
* remove complete() (#979)
* remove complete()
* ruff
* Fix mock
* Benb/update gnomad qc crdq with updated format (#980)
* remove complete()
* ruff
* Fix mock
* Replace the gnomad_qc crdq
* Fix test
* format
* Remove ht and tests (#981)
* remove complete()
* ruff
* Fix mock
* Replace the gnomad_qc crdq
* Fix test
* format
* Remove ht and tests
* Updated `gnomad_coding_and_noncoding` test table. (#982)
* remove complete()
* ruff
* Fix mock
* Replace the gnomad_qc crdq
* Fix test
* format
* Remove ht and tests
* Change validation table reference
* Update README.txt
* remove crdq reference
* Update mock
* ruff
* Fix imports
* remove mock
* fixme
* Change rsync to new path (#983)
* Remove `version` from reference dataset query path (#984)
* Change rsync to new path
* Remove version from reference dataset query path
* Make rdq dataset type specific (#985)
* Make rdq dataset type specific
* Add test for mito
* Add pathogenicities to clinvar
* tweak
* update annotations with updated reference datasets refactor (#978)
* first pass update vat
* merge feature
* fix the diff for now
* include_queries
* interval ht
* tests
* exclude
* nicer
* fix inteval test
* split fn
* eigen test
* clinvar wip
* hgmd
* clinvar
* gnomad genomes and exomes
* delete
* 38 snv_indel done
* mito tests
* done with tests?
* custom_select
* fields test
* disable write new samples tests for now
* working on tests
* update update vat with new samples tests
* extra file
* other skipped test
* make select and filter similar
* tweak
* rename path and locus/interval filtering
* make select and filter similar (#988)
* make select and filter similar
* tweak
* Cleanest set diff
* Finish off
* Tests passing!
* ruff
* ruff
* Change the params
* Fix params
* params
* More clinvar mocking
* hardcode these
---------
Co-authored-by: Benjamin Blankenmeister <bblanken@broadinstitute.org>
Co-authored-by: Benjamin Blankenmeister <b.p.blankenmeister@gmail.com>
* delete old reference data code 😝 (#990)
* first pass update vat
* merge feature
* fix the diff for now
* include_queries
* interval ht
* tests
* exclude
* nicer
* fix inteval test
* split fn
* eigen test
* clinvar wip
* hgmd
* clinvar
* gnomad genomes and exomes
* delete
* 38 snv_indel done
* mito tests
* done with tests?
* custom_select
* fields test
* disable write new samples tests for now
* working on tests
* update update vat with new samples tests
* extra file
* other skipped test
* make select and filter similar
* tweak
* rename path and locus/interval filtering
* make select and filter similar (#988)
* make select and filter similar
* tweak
* Cleanest set diff
* Finish off
* Tests passing!
* ruff
* ruff
* Change the params
* Fix params
* params
* More clinvar mocking
* hardcode these
* delete a bunch of stuff
* ruff
* remove rdc and crdq
* delete v02
* remove comment references to deleted file
* last test
---------
Co-authored-by: Benjamin Blankenmeister <bblanken@broadinstitute.org>
Co-authored-by: Benjamin Blankenmeister <b.p.blankenmeister@gmail.com>
---------
Co-authored-by: Julia Klugherz <juliaklugherz@gmail.com>
Co-authored-by: Hana Snow <hsnow@broadinstitute.org>
1 parent b3e996a, commit f89a6d3

895 files changed (+4008 additions, -6890 deletions)

download_and_create_reference_datasets/v02/create_ht__cadd.py

Lines changed: 0 additions & 8 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__clinvar.py

Lines changed: 0 additions & 8 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__combined_reference_data.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__eigen.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__mpc.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__primate_ai.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

download_and_create_reference_datasets/v02/create_ht__topmed.py

Lines changed: 0 additions & 14 deletions
This file was deleted.

download_and_create_reference_datasets/v02/hail_scripts/write_1kg_ht.py

Lines changed: 0 additions & 71 deletions
This file was deleted.

download_and_create_reference_datasets/v02/hail_scripts/write_cadd_ht.py

Lines changed: 0 additions & 49 deletions
This file was deleted.

download_and_create_reference_datasets/v02/hail_scripts/write_ccREs_ht.py

Lines changed: 0 additions & 59 deletions
This file was deleted.
