
Commit 041cd32

change clinvar submitters table partitions (#749)
Make them equal to the ClinVar VCF min partitions to hopefully reduce flakiness of the join. This is mostly a guess: yesterday's test [run](https://console.cloud.google.com/dataproc/jobs/UpdateVariantAnnotationsTableWithUpdatedReferenceDataset_20240328_9a449403/monitoring?region=us-central1&project=seqr-project) succeeded, but only after multiple attempts.
1 parent 8f8b664 commit 041cd32

1 file changed, +3 -2 lines changed


v03_pipeline/lib/reference_data/clinvar.py

Lines changed: 3 additions & 2 deletions
@@ -41,6 +41,7 @@
 CLINVAR_SUBMISSION_SUMMARY_URL = (
     'ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/submission_summary.txt.gz'
 )
+MIN_HT_PARTITIONS = 2000
 logger = get_logger(__name__)
 
 
@@ -127,7 +128,7 @@ def download_and_import_latest_clinvar_vcf(
             drop_samples=True,
             skip_invalid_loci=True,
             contig_recoding=reference_genome.contig_recoding(include_mt=True),
-            min_partitions=2000,
+            min_partitions=MIN_HT_PARTITIONS,
             force_bgz=True,
         )
         mt = mt.annotate_globals(version=_parse_clinvar_release_date(tmp_file.name))
@@ -192,5 +193,5 @@ def download_and_import_clinvar_submission_summary() -> hl.Table:
                 'ReportedPhenotypeInfo': hl.tstr,
             },
             missing='-',
-            min_partitions=3,  # recommended 2-4 partitions per core
+            min_partitions=MIN_HT_PARTITIONS,
         )
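
The change assumes both inputs end up with comparable partitioning, but `min_partitions` is only a lower bound requested at import time, so the resulting counts may still differ. A minimal sketch of how that assumption could be checked before the join is shown below. The helper name `partition_counts_match` is hypothetical and not part of this commit; it relies only on the `n_partitions()` method that Hail tables and matrix tables expose.

```python
import logging

logger = logging.getLogger(__name__)

# Mirrors the constant introduced in this commit.
MIN_HT_PARTITIONS = 2000


def partition_counts_match(left, right) -> bool:
    """Hypothetical helper: compare the partition counts of two tables.

    Both arguments only need an `n_partitions()` method, which Hail's
    Table and MatrixTable provide.
    """
    n_left = left.n_partitions()
    n_right = right.n_partitions()
    if n_left != n_right:
        # Log the mismatch so a flaky join can be correlated with it later.
        logger.warning(
            'Partition mismatch before join: %d vs %d', n_left, n_right,
        )
    return n_left == n_right
```

If the counts differ, `Table.repartition` or `Table.naive_coalesce` are the usual Hail tools for adjusting them, though whether that would help this particular join is untested here.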

0 commit comments
