Commit c375840

Gnomad v4 (#647)
* gnomad v4 * handle nested exome schema * lint

* Bump aiohttp from 3.9.2 to 3.9.4 (#766)

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.2 to 3.9.4.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.2...v3.9.4)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Dev (#773)

* Delete "family" tasks (#761) * Delete project tasks * cleanup * ruff format * well * rename * hacking away * almost there! * ruff * Fix missing updates change * ruff * Remove debug code * remove bad merge * more precision in test * project table * allow for missing project * remove some unnecessary checks * test already deleted family * add comment

* Delete Project & Family Table Tasks. (#767) * Delete project tasks * cleanup * ruff format * well * rename * hacking away * almost there! * ruff * Fix missing updates change * ruff * Remove debug code * remove bad merge * more precision in test * project table * allow for missing project * remove some unnecessary checks * test already deleted family * Lots of renames * More updates * Sketch * Flesh out test * fix paths * Rename base hail table * a bunch more renames * delete project table * Add delete project families * add comment * test it! * Fix * add dep * some missing tasks

* [optimization] read family tables directly from project table. (#769) * Delete project tasks * cleanup * ruff format * well * rename * hacking away * almost there! * ruff * Fix missing updates change * ruff * Remove debug code * remove bad merge * more precision in test * project table * allow for missing project * remove some unnecessary checks * test already deleted family * Lots of renames * More updates * Sketch * Flesh out test * fix paths * Rename base hail table * a bunch more renames * delete project table * Add delete project families * is it that simple? * add comment * test it! * Fix * add dep

* Ensure rows are deleted after deleting samples! (#770) * Delete project tasks * cleanup * ruff format * well * rename * hacking away * almost there! * ruff * Fix missing updates change * ruff * Remove debug code * remove bad merge * more precision in test * project table * allow for missing project * remove some unnecessary checks * test already deleted family * Lots of renames * More updates * Sketch * Flesh out test * fix paths * Rename base hail table * a bunch more renames * delete project table * Add delete project families * add comment * test it! * Fix * add dep * Lookup table filtering * Ensure rows with no projects/families defined are removed * ruff * remove mock * Remove mocks from args * tweak tests

* VEP 110 docker image and dataproc init script (#758) * Add VEP docker image * simplify * bump version * Add cloudbuild * first pass * a bit of cleanup * ws * ws * A few tweaks * twiddle options * Bunch of config * working! * Update vep-GRCh38.json * Update vep-110-GRCh38.sh * missing slash * more VEP * some vep cleanup * Remove genesplicer * Bump to 4.1 * mock * bump versions * version * underscore * grpmax * I think this fixes? * index dict * update globals * a few missing ones * no tweaks to 37 * Fix table * lint * fixmeg * update me * Update update_variant_annotations_table_with_updated_reference_dataset_test.py * ruff * Update base_update_variant_annotations_table.py

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 parent 8501255 commit c375840

18 files changed: +95 -62 lines changed

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 #
 aiodns==2.0.0
     # via hail
-aiohttp==3.9.2
+aiohttp==3.9.4
     # via
     #   hail
     #   slackclient

v03_pipeline/lib/reference_data/config.py

Lines changed: 11 additions & 10 deletions
@@ -127,9 +127,9 @@ def custom_gnomad_select_v2(ht):
     return selects
 
 
-def custom_gnomad_select_v3(ht):
+def custom_gnomad_select_v4(ht):
     """
-    Custom select for public gnomad v3 dataset (which we did not generate). Extracts fields like
+    Custom select for public gnomad v4 dataset (which we did not generate). Extracts fields like
     'AF', 'AN', and generates 'hemi'.
     :param ht: hail table
     :return: select expression dict
@@ -141,14 +141,15 @@ def custom_gnomad_select_v3(ht):
     selects['AC'] = ht.freq[global_idx].AC
     selects['Hom'] = ht.freq[global_idx].homozygote_count
 
+    grpmax_af = ht.grpmax['gnomad'].AF if hasattr(ht.grpmax, 'gnomad') else ht.grpmax.AF
     selects['AF_POPMAX_OR_GLOBAL'] = hl.float32(
-        hl.or_else(ht.popmax.AF, ht.freq[global_idx].AF),
+        hl.or_else(grpmax_af, ht.freq[global_idx].AF),
     )
     selects['FAF_AF'] = hl.float32(ht.faf[ht.globals.faf_index_dict['adj']].faf95)
     selects['Hemi'] = hl.if_else(
         ht.locus.in_autosome_or_par(),
         0,
-        ht.freq[ht.globals.freq_index_dict['XY-adj']].AC,
+        ht.freq[ht.globals.freq_index_dict['XY_adj']].AC,
     )
     return selects
 
@@ -360,9 +361,9 @@ def custom_mpc_select(ht):
             'custom_select': custom_gnomad_select_v2,
         },
         '38': {
-            'version': 'r2.1.1',
-            'path': 'gs://gcp-public-data--gnomad/release/2.1.1/liftover_grch38/ht/exomes/gnomad.exomes.r2.1.1.sites.liftover_grch38.ht',
-            'custom_select': custom_gnomad_select_v2,
+            'version': 'v4.1',
+            'path': 'gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht',
+            'custom_select': custom_gnomad_select_v4,
         },
     },
     'gnomad_genomes': {
@@ -372,9 +373,9 @@ def custom_mpc_select(ht):
             'custom_select': custom_gnomad_select_v2,
         },
         '38': {
-            'version': 'v3.1.2',
-            'path': 'gs://gcp-public-data--gnomad/release/3.1.2/ht/genomes/gnomad.genomes.v3.1.2.sites.ht',
-            'custom_select': custom_gnomad_select_v3,
+            'version': 'v4.1',
+            'path': 'gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht',
+            'custom_select': custom_gnomad_select_v4,
         },
     },
     'gnomad_qc': {
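
The `grpmax_af` line in `custom_gnomad_select_v4` handles two schema shapes: a nested `grpmax.gnomad` struct (the shape the exomes mock uses in the tests) and a flat `grpmax` struct (the genomes mock). The sketch below illustrates that branching and the `hl.or_else` fallback using plain Python stand-ins rather than Hail expressions. The helper names `pick_grpmax_af` and `af_grpmax_or_global` and all numeric values are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def pick_grpmax_af(grpmax):
    """Return the grpmax AF from either a nested or a flat struct.

    Plain-Python stand-in for the hasattr check in the diff; not Hail code.
    """
    # Nested shape: grpmax.gnomad.AF. Flat shape: grpmax.AF.
    if hasattr(grpmax, 'gnomad'):
        return grpmax.gnomad.AF
    return grpmax.AF


def af_grpmax_or_global(grpmax, global_af):
    """Mimic hl.or_else: use the global AF when the grpmax AF is missing."""
    af = pick_grpmax_af(grpmax)
    return global_af if af is None else af


# Illustrative values only.
nested = SimpleNamespace(gnomad=SimpleNamespace(AF=0.25))
flat = SimpleNamespace(AF=0.125)
missing = SimpleNamespace(AF=None)

print(af_grpmax_or_global(nested, 0.01))   # 0.25
print(af_grpmax_or_global(flat, 0.01))     # 0.125
print(af_grpmax_or_global(missing, 0.01))  # 0.01
```

In the real select the check runs once in Python when the expression is built, so a single function can serve both the exomes and genomes tables.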

v03_pipeline/lib/tasks/reference_data/update_variant_annotations_table_with_updated_reference_dataset_test.py

Lines changed: 76 additions & 44 deletions
@@ -115,40 +115,6 @@
         globals=hl.Struct(),
     ),
 }
-MOCK_GNOMAD_EXOMES_CONFIG = {
-    **CONFIG['gnomad_exomes']['38'],
-    'source_path': 'gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht',
-    'custom_import': lambda *_: hl.Table.parallelize(
-        [],
-        hl.tstruct(
-            locus=hl.tlocus('GRCh38'),
-            alleles=hl.tarray(hl.tstr),
-            freq=hl.tarray(
-                hl.tstruct(
-                    AF=hl.tfloat64,
-                    AN=hl.tint32,
-                    AC=hl.tint32,
-                    homozygote_count=hl.tint32,
-                ),
-            ),
-            popmax=hl.tarray(
-                hl.tstruct(
-                    AF=hl.tfloat64,
-                    AN=hl.tint32,
-                    AC=hl.tint32,
-                    homozygote_count=hl.tint32,
-                    pop=hl.tstr,
-                ),
-            ),
-            faf=hl.tarray(hl.tstruct(faf95=hl.tfloat64)),
-        ),
-        key=['locus', 'alleles'],
-        globals=hl.Struct(
-            freq_index_dict={'gnomad': 0, 'gnomad_male': 1},
-            popmax_index_dict={'gnomad': 0},
-        ),
-    ),
-}
 
 MOCK_MPC_CONFIG = {
     **CONFIG['mpc']['38'],
@@ -286,8 +252,74 @@
         '38': MOCK_EXAC_CONFIG,
     },
     'gnomad_exomes': {
-        '37': MOCK_GNOMAD_EXOMES_CONFIG,
-        '38': MOCK_GNOMAD_EXOMES_CONFIG,
+        '37': {
+            **CONFIG['gnomad_exomes']['37'],
+            'source_path': 'gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht',
+            'custom_import': lambda *_: hl.Table.parallelize(
+                [],
+                hl.tstruct(
+                    locus=hl.tlocus('GRCh38'),
+                    alleles=hl.tarray(hl.tstr),
+                    freq=hl.tarray(
+                        hl.tstruct(
+                            AF=hl.tfloat64,
+                            AN=hl.tint32,
+                            AC=hl.tint32,
+                            homozygote_count=hl.tint32,
+                        ),
+                    ),
+                    popmax=hl.tarray(
+                        hl.tstruct(
+                            AF=hl.tfloat64,
+                            AN=hl.tint32,
+                            AC=hl.tint32,
+                            homozygote_count=hl.tint32,
+                            pop=hl.tstr,
+                        ),
+                    ),
+                    faf=hl.tarray(hl.tstruct(faf95=hl.tfloat64)),
+                ),
+                key=['locus', 'alleles'],
+                globals=hl.Struct(
+                    freq_index_dict={'gnomad': 0, 'gnomad_male': 1},
+                    popmax_index_dict={'gnomad': 0},
+                ),
+            ),
+        },
+        '38': {
+            **CONFIG['gnomad_exomes']['38'],
+            'source_path': 'gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht',
+            'custom_import': lambda *_: hl.Table.parallelize(
+                [],
+                hl.tstruct(
+                    locus=hl.tlocus('GRCh38'),
+                    alleles=hl.tarray(hl.tstr),
+                    freq=hl.tarray(
+                        hl.tstruct(
+                            AF=hl.tfloat64,
+                            AN=hl.tint32,
+                            AC=hl.tint32,
+                            homozygote_count=hl.tint32,
+                        ),
+                    ),
+                    grpmax=hl.tstruct(
+                        gnomad=hl.tstruct(
+                            AF=hl.tfloat64,
+                            AN=hl.tint32,
+                            AC=hl.tint32,
+                            homozygote_count=hl.tint32,
+                            pop=hl.tstr,
+                        ),
+                    ),
+                    faf=hl.tarray(hl.tstruct(faf95=hl.tfloat64)),
+                ),
+                key=['locus', 'alleles'],
+                globals=hl.Struct(
+                    freq_index_dict={'adj': 0, 'XY_adj': 1},
+                    faf_index_dict={'adj': 0},
+                ),
+            ),
+        },
     },
     'gnomad_genomes': {
         '37': {
@@ -326,8 +358,8 @@
         },
         '38': {
             **CONFIG['gnomad_genomes']['38'],
-            'version': 'r2.1.1',
-            'source_path': 'gs://gcp-public-data--gnomad/release/2.1.1/ht/genomes/gnomad.genomes.r2.1.1.sites.ht',
+            'version': 'v4.1',
+            'source_path': 'gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht',
             'custom_import': lambda *_: hl.Table.parallelize(
                 [],
                 hl.tstruct(
@@ -341,7 +373,7 @@
                             homozygote_count=hl.tint32,
                         ),
                     ),
-                    popmax=hl.tstruct(
+                    grpmax=hl.tstruct(
                         AF=hl.tfloat64,
                         AN=hl.tint32,
                         AC=hl.tint32,
@@ -352,7 +384,7 @@
                 ),
                 key=['locus', 'alleles'],
                 globals=hl.Struct(
-                    freq_index_dict={'adj': 0, 'XY-adj': 1},
+                    freq_index_dict={'adj': 0, 'XY_adj': 1},
                     faf_index_dict={'adj': 0},
                 ),
             ),
@@ -778,8 +810,8 @@ def test_update_vat_with_updated_rdc_snv_indel_38(
                         dbnsfp='gs://seqr-reference-data/GRCh37/dbNSFP/v2.9.3/dbNSFP2.9.3_variant.ht',
                         eigen='gs://seqr-reference-data/GRCh37/eigen/EIGEN_coding_noncoding.grch37.ht',
                         exac='gs://seqr-reference-data/GRCh37/gnomad/ExAC.r1.sites.vep.ht',
-                        gnomad_exomes='gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht',
-                        gnomad_genomes='gs://gcp-public-data--gnomad/release/2.1.1/ht/genomes/gnomad.genomes.r2.1.1.sites.ht',
+                        gnomad_exomes='gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht',
+                        gnomad_genomes='gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht',
                         mpc='gs://seqr-reference-data/GRCh37/MPC/fordist_constraint_official_mpc_values.ht',
                         primate_ai='gs://seqr-reference-data/GRCh37/primate_ai/PrimateAI_scores_v0.2.ht',
                         splice_ai='gs://seqr-reference-data/GRCh37/spliceai/spliceai_scores.ht',
@@ -794,8 +826,8 @@ def test_update_vat_with_updated_rdc_snv_indel_38(
                         dbnsfp='2.9.3',
                         eigen=None,
                         exac=None,
-                        gnomad_exomes='r2.1.1',
-                        gnomad_genomes='r2.1.1',
+                        gnomad_exomes='v4.1',
+                        gnomad_genomes='v4.1',
                         mpc=None,
                         primate_ai='v0.2',
                         splice_ai=None,
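
The mocked v4 globals above key the frequency array with `'XY_adj'` where the previous genomes mock used `'XY-adj'`, and the `Hemi` select in `config.py` looks up that key. A minimal plain-Python sketch of that lookup follows, assuming a two-entry `freq` list with made-up allele counts. The function name `hemi` and the counts are hypothetical, and the real code builds a Hail expression with `hl.if_else` rather than evaluating eagerly.

```python
from types import SimpleNamespace

# Stand-ins mirroring the mocked v4 globals; the AC values are illustrative.
freq_index_dict = {'adj': 0, 'XY_adj': 1}
freq = [SimpleNamespace(AC=40), SimpleNamespace(AC=7)]


def hemi(in_autosome_or_par, freq, freq_index_dict):
    """Return 0 on autosomes/PAR, otherwise the XY_adj allele count."""
    if in_autosome_or_par:
        return 0
    return freq[freq_index_dict['XY_adj']].AC


print(hemi(True, freq, freq_index_dict))   # 0
print(hemi(False, freq, freq_index_dict))  # 7

# With these globals the previous key spelling is absent.
print('XY-adj' in freq_index_dict)  # False
```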

v03_pipeline/lib/tasks/update_variant_annotations_table_with_new_samples_test.py

Lines changed: 6 additions & 6 deletions
@@ -470,8 +470,8 @@ def test_multiple_update_vat(
                         dbnsfp='gs://seqr-reference-data/GRCh37/dbNSFP/v2.9.3/dbNSFP2.9.3_variant.ht',
                         eigen='gs://seqr-reference-data/GRCh37/eigen/EIGEN_coding_noncoding.grch37.ht',
                         exac='gs://seqr-reference-data/GRCh37/gnomad/ExAC.r1.sites.vep.ht',
-                        gnomad_exomes='gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht',
-                        gnomad_genomes='gs://gcp-public-data--gnomad/release/2.1.1/ht/genomes/gnomad.genomes.r2.1.1.sites.ht',
+                        gnomad_exomes='gs://gcp-public-data--gnomad/release/4.1/ht/exomes/gnomad.exomes.v4.1.sites.ht',
+                        gnomad_genomes='gs://gcp-public-data--gnomad/release/4.1/ht/genomes/gnomad.genomes.v4.1.sites.ht',
                         mpc='gs://seqr-reference-data/GRCh37/MPC/fordist_constraint_official_mpc_values.ht',
                         primate_ai='gs://seqr-reference-data/GRCh37/primate_ai/PrimateAI_scores_v0.2.ht',
                         splice_ai='gs://seqr-reference-data/GRCh37/spliceai/spliceai_scores.ht',
@@ -486,8 +486,8 @@ def test_multiple_update_vat(
                         dbnsfp='2.9.3',
                         eigen=None,
                         exac=None,
-                        gnomad_exomes='r2.1.1',
-                        gnomad_genomes='r2.1.1',
+                        gnomad_exomes='v4.1',
+                        gnomad_genomes='v4.1',
                         mpc=None,
                         primate_ai='v0.2',
                         splice_ai=None,
@@ -667,8 +667,8 @@ def test_update_vat_without_accessing_private_datasets(
                         dbnsfp='2.9.3',
                         eigen=None,
                         exac=None,
-                        gnomad_exomes='r2.1.1',
-                        gnomad_genomes='r2.1.1',
+                        gnomad_exomes='v4.1',
+                        gnomad_genomes='v4.1',
                         mpc=None,
                         primate_ai='v0.2',
                         splice_ai=None,
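
The expected paths and versions in these tests follow one pattern for both exomes and genomes. As a rough sketch of how the `'v4.1'` version string relates to the bucket path, the hypothetical helper below rebuilds the two paths that appear in the diff. The pipeline itself stores them as literal strings in `CONFIG`; `gnomad_v4_sites_path` is not part of the repository.

```python
def gnomad_v4_sites_path(data_type: str, release: str = '4.1') -> str:
    """Build the public gnomAD sites table path for 'exomes' or 'genomes'.

    Hypothetical helper; the pattern is inferred from the two paths in the diff.
    """
    return (
        f'gs://gcp-public-data--gnomad/release/{release}/ht/{data_type}/'
        f'gnomad.{data_type}.v{release}.sites.ht'
    )


exomes_path = gnomad_v4_sites_path('exomes')
genomes_path = gnomad_v4_sites_path('genomes')
print(exomes_path)
print(genomes_path)
```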
Binary file not shown.
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 This folder comprises a Hail (www.hail.is) native Table or MatrixTable.
   Written with version 0.2.128-eead8100a1c1
-  Created at 2024/03/21 11:33:48
+  Created at 2024/04/30 18:16:28
Binary file not shown.
Binary file not shown.
Binary file not shown.
