Dev #1050

Merged: 26 commits merged on Mar 6, 2025

Changes from 14 commits

26 commits:
- f164733 gnomad v4 sv migration (bpblanken, Feb 19, 2025)
- e8d9616 ruff (bpblanken, Feb 19, 2025)
- abeb65e Update 0004_add_gnomad_svs.py (bpblanken, Feb 20, 2025)
- e88c303 Update 0004_add_gnomad_svs.py (bpblanken, Feb 20, 2025)
- 3e3658e Update 0004_add_gnomad_svs.py (bpblanken, Feb 20, 2025)
- 282bad2 Update 0004_add_gnomad_svs.py (bpblanken, Feb 20, 2025)
- cf5875c Update 0004_add_gnomad_svs.py (bpblanken, Feb 20, 2025)
- 026631f comment (bpblanken, Feb 20, 2025)
- 94c30fa Merge branch 'benb/sv_gnomad_v4_migration' of github.com:broadinstitu… (bpblanken, Feb 20, 2025)
- 4b3f678 ruff (bpblanken, Feb 20, 2025)
- 5c0cbef Merge branch 'dev' of github.com:broadinstitute/seqr-loading-pipeline… (bpblanken, Feb 23, 2025)
- 4278cc3 Merge branch 'dev' of github.com:broadinstitute/seqr-loading-pipeline… (bpblanken, Feb 23, 2025)
- c80c04b Merge pull request #1048 from broadinstitute/benb/sv_gnomad_v4_migration (jklugherz, Feb 26, 2025)
- 21d0527 Update 0004_add_gnomad_svs.py (jklugherz, Mar 3, 2025)
- bfde429 Update 0004_add_gnomad_svs.py (jklugherz, Mar 4, 2025)
- b87bd2e do alleles field validation only if it exists on ht (jklugherz, Mar 4, 2025)
- 19b81ca handle set of dataset types during allele type validation (bpblanken, Mar 5, 2025)
- a5330e5 this is a cleaner approach (bpblanken, Mar 5, 2025)
- e4f2b80 format (bpblanken, Mar 5, 2025)
- 0addb5b Merge remote-tracking branch 'origin/benb/remove_hardcoded_datasettyp… (jklugherz, Mar 5, 2025)
- fb0f6fe Update reference_dataset.py (bpblanken, Mar 5, 2025)
- 5412715 run validation on sv get_ht (jklugherz, Mar 5, 2025)
- 49e90e3 Merge pull request #1053 from broadinstitute/benb/remove_hardcoded_da… (jklugherz, Mar 5, 2025)
- 416937d Merge remote-tracking branch 'origin/dev' into sv-locus-alleles (jklugherz, Mar 5, 2025)
- 2511423 fix gnomad_svs ref data mock table (jklugherz, Mar 5, 2025)
- aa0027b Merge pull request #1052 from broadinstitute/sv-locus-alleles (jklugherz, Mar 6, 2025)

v03_pipeline/migrations/annotations/0004_add_gnomad_svs.py (48 changes: 48 additions, 0 deletions)
@@ -0,0 +1,48 @@
import hail as hl

from v03_pipeline.lib.annotations import sv
from v03_pipeline.lib.migration.base_migration import BaseMigration
from v03_pipeline.lib.model import DatasetType, ReferenceGenome
from v03_pipeline.lib.reference_datasets.reference_dataset import ReferenceDataset

# This vcf was generated with the gatk command:
#
# gatk SVConcordance --verbosity DEBUG --evaluation /var/seqr/phase4.seqr.gnomad_v4_tmp.vcf.gz
# --truth /var/seqr/gnomad.v4.1.sv.sites.modified.vcf.bgz
# --sequence-dictionary gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict
#
# Followed by:
# bcftools annotate --rename-annots /var/seqr/remap /var/seqr/phase4.seqr.gnomad_v4_tmp.vcf.gz | bgzip > /var/seqr/phase4.seqr.gnomad_v4.vcf.gz
#
# where remap contains "INFO/TRUTH_VID GNOMAD_V4.1_TRUTH_VID"
PHASE_4_CALLSET_WITH_GNOMAD_V4 = 'gs://seqr-loading-temp/phase4.seqr.gnomad_v4.vcf.gz'


class AddGnomadSVs(BaseMigration):
    reference_genome_dataset_types: frozenset[
        tuple[ReferenceGenome, DatasetType]
    ] = frozenset(
        ((ReferenceGenome.GRCh38, DatasetType.SV),),
    )

    @staticmethod
    def migrate(ht: hl.Table, **_) -> hl.Table:
        mapping_ht = (
            hl.import_vcf(
                PHASE_4_CALLSET_WITH_GNOMAD_V4,
                reference_genome=ReferenceGenome.GRCh38.value,
                force_bgz=True,
            )
            .key_rows_by('rsid')
            .rows()
        )
        ht = ht.annotate(
            **{
                'info.GNOMAD_V4.1_TRUTH_VID': mapping_ht[ht.key].info[
                    'GNOMAD_V4.1_TRUTH_VID'
                ],
            },
        )
        gnomad_svs_ht = ReferenceDataset.gnomad_svs.get_ht(ReferenceGenome.GRCh38)
        ht = ht.annotate(gnomad_svs=sv.gnomad_svs(ht, gnomad_svs_ht))
        return ht.drop('info.GNOMAD_V4.1_TRUTH_VID')
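The `migrate` method in this file reduces to two keyed lookups followed by cleanup: it keys the concordance VCF by its ID column (Hail's `import_vcf` exposes the VCF ID as `rsid`), copies `GNOMAD_V4.1_TRUTH_VID` onto each SV row as a temporary field, annotates `gnomad_svs` from the gnomAD SV reference table, and then drops the temporary field. The plain-Python sketch below mimics that flow with dicts standing in for Hail tables so the join semantics are easy to see. It is not the pipeline's code: `migrate_sketch` and `TRUTH_VID_FIELD` are hypothetical names, and it assumes that `sv.gnomad_svs` looks up the reference table by the temporary truth-ID field (the annotate-then-drop pattern suggests this, but its implementation is not shown in this PR).

```python
# Hypothetical constant mirroring the temporary field name used in the migration.
TRUTH_VID_FIELD = 'info.GNOMAD_V4.1_TRUTH_VID'


def migrate_sketch(
    callset: dict[str, dict],
    concordance_rows: dict[str, dict],
    gnomad_svs: dict[str, dict],
) -> dict[str, dict]:
    """Dict-based sketch of AddGnomadSVs.migrate (assumption: not the real code).

    callset:          variant_id -> SV row (stands in for the seqr SV table)
    concordance_rows: VCF ID -> row with an 'info' dict (stands in for mapping_ht)
    gnomad_svs:       gnomAD truth ID -> reference row (stands in for gnomad_svs_ht)
    """
    migrated: dict[str, dict] = {}
    for variant_id, row in callset.items():
        # Step 1: left join on the callset key. Like indexing mapping_ht[ht.key]
        # in Hail, a missing match yields a missing (None) truth ID.
        mapping_row = concordance_rows.get(variant_id)
        truth_vid = (
            mapping_row['info'].get('GNOMAD_V4.1_TRUTH_VID')
            if mapping_row is not None
            else None
        )
        # Copy the row instead of mutating the input, then add the temporary field.
        new_row = {**row, TRUTH_VID_FIELD: truth_vid}
        # Step 2 (assumed behavior of sv.gnomad_svs): look up the gnomAD SV row
        # by the truth ID; unmatched variants get None.
        new_row['gnomad_svs'] = (
            gnomad_svs.get(truth_vid) if truth_vid is not None else None
        )
        # Step 3: drop the temporary join field, as the migration does.
        del new_row[TRUTH_VID_FIELD]
        migrated[variant_id] = new_row
    return migrated


if __name__ == '__main__':
    # Toy data; all IDs and values are made up for illustration.
    callset = {'sv1': {'sv_type': 'DEL'}, 'sv2': {'sv_type': 'DUP'}}
    concordance = {'sv1': {'info': {'GNOMAD_V4.1_TRUTH_VID': 'gnomad_vid_A'}}}
    gnomad = {'gnomad_vid_A': {'AF': 0.01}}
    print(migrate_sketch(callset, concordance, gnomad))
    # prints {'sv1': {'sv_type': 'DEL', 'gnomad_svs': {'AF': 0.01}}, 'sv2': {'sv_type': 'DUP', 'gnomad_svs': None}}
```

Building a new row dict rather than editing the input loosely mirrors Hail, where `Table.annotate` and `Table.drop` return new tables, so the only net change to each row is the added `gnomad_svs` field.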