Half-bad hetvars: one bad allele fails the genotype [VS-1615] [VS-1649] #9149
Merged
Changes from 20 commits
57 commits
f75c299 Half-bad hetvars, VCF edition: one bad allele fails the genotype [VS-… (mcovarr)
1d30881 dockstore (mcovarr)
e21f0ac VQSR version (mcovarr)
3ee277a VDS attempt (mcovarr)
756d363 Merge remote-tracking branch 'origin/ah_var_store' into vs_1615_hetva… (mcovarr)
4e18503 dockstore (mcovarr)
0171302 make VDS from half-bad failed hetvar VCFs (mcovarr)
d5b88b6 update merge and rescore too (mcovarr)
d428e13 turn on filtering for VQSR manual tieout (mcovarr)
182019c Merge remote-tracking branch 'origin/ah_var_store' into vs_1615_hetva… (mcovarr)
815c1a3 variants docker (mcovarr)
bda44b7 dockstore (mcovarr)
45b65b2 holy moly tabs (mcovarr)
ed09ccc docker (mcovarr)
ace768f dockstore (mcovarr)
852e96f dockstore (mcovarr)
7b47b30 remove stuff prior to review (mcovarr)
7c9e753 Merge remote-tracking branch 'origin/ah_var_store' into vs_1615_hetva… (mcovarr)
eaf3b08 docker (mcovarr)
f919d24 include Hail optimizations, update Docker (mcovarr)
1a21244 simplify more (mcovarr)
23d45a6 Docker (mcovarr)
f444f30 oops (mcovarr)
84eaa02 docker (mcovarr)
a040dc3 hash bump (mcovarr)
124d917 another hash bump (mcovarr)
c533916 yet another hash bump (mcovarr)
eb82f9f update truth path (mcovarr)
2c00049 hash bump (mcovarr)
a545738 another hash bump (mcovarr)
fdb1988 Merge remote-tracking branch 'origin/ah_var_store' into vs_1615_hetva… (mcovarr)
2311f27 how did that get messed up (mcovarr)
7dc7c85 spanning deletion . quality scores become NaN doubles (mcovarr)
ab94c99 fresh baked GATK Docker with . quality fixes (mcovarr)
b3bb025 hash bump for all chr run (mcovarr)
218dae4 hash bump (mcovarr)
41f3844 hash bump (mcovarr)
27ac2bb hash bump (mcovarr)
8e52897 hash bump (mcovarr)
0f21f6f vat hash bump (mcovarr)
6f831cb vat hash bump (mcovarr)
64a5bcc vat hash bump (mcovarr)
3a9a0ae vat hash bump (mcovarr)
6f606f0 hash bump (mcovarr)
347eb2f hash bump (mcovarr)
6d92d22 hash bump (mcovarr)
521a606 Merge remote-tracking branch 'origin/ah_var_store' into vs_1615_hetva… (mcovarr)
5fbc434 cleanup (mcovarr)
a8e5796 hash bump (mcovarr)
33a67a8 hash bump (mcovarr)
2a8a8b5 hash bump (mcovarr)
e2bc338 hash bump (mcovarr)
09a0d5f cleanup, VAT from VDS (mcovarr)
6c6fa2b hash bump (mcovarr)
9ade1a4 hash bump (mcovarr)
c09d1a3 hash bump (mcovarr)
7aa2ce1 hash bump (mcovarr)
@@ -17,34 +17,34 @@

Review comment: this is all stowaway changes fixing some bonkers whitespace that was driving IntelliJ nuts

def check_samples_match(vds):
    print('checking sample equivalence between reference and variant MTs')
    assert vds.reference_data.cols().select().collect() == vds.variant_data.cols().select().collect()

def check_ref_blocks(vds):
    print('checking that:\n * no reference blocks have GQ=0\n * all ref blocks have END after start\n * all ref blocks are max 1000 bases long')
    rd = vds.reference_data
    rd = rd.annotate_rows(locus_start = rd.locus.position)

    LEN = rd.END - rd.locus_start + 1

    print('checking that: no reference blocks have GQ=0')
    assert rd.aggregate_entries(hl.agg.all(hl.all(rd.GQ > 0)))

    print('checking that: all ref blocks have END after start')
    assert rd.aggregate_entries(hl.agg.all(hl.all(LEN >= 0)))

    print('checking that: all ref blocks are max 1000 bases long')
    assert rd.aggregate_entries(hl.agg.all(hl.all(LEN <= rd.ref_block_max_length)))

def check_densify_small_region(vds):
    print('running densify on 200kb region')
    from time import time
    t1 = time()

    filt = hl.vds.filter_intervals(vds, [hl.parse_locus_interval('chr16:29.5M-29.7M', reference_genome='GRCh38')])
    n=hl.vds.to_dense_mt(filt).select_entries('LGT')._force_count_rows()

    print(f'took {time() - t1:.1f}s to densify {n} rows after interval query')

I am confused as to why this is not:
Maybe I'm just confounded by the logic
Taking the SNP line as an example, I interpreted this as "nothing wrong SNP-wise if it's not a SNP at all, OR (implying it is a SNP) if the allele is OK".
If the allele was bad and it was an INDEL, the following line would catch it.
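
The reading described above is a logical implication ("if it is a SNP, then the allele must be OK"), applied once per variant type. A minimal Python sketch of that reading follows; the function and parameter names are hypothetical and are not taken from the PR's code.

```python
def type_ok(is_type: bool, allele_ok: bool) -> bool:
    """Implication: an allele can only fail a type-specific check if it is of that type.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    return (not is_type) or allele_ok


def allele_passes(is_snp: bool, is_indel: bool, allele_ok: bool) -> bool:
    # SNP line: vacuously fine when the allele is not a SNP at all.
    snp_ok = type_ok(is_snp, allele_ok)
    # INDEL line: catches a bad allele that the SNP line let through.
    indel_ok = type_ok(is_indel, allele_ok)
    return snp_ok and indel_ok
```

Under this sketch a bad INDEL passes the SNP check vacuously and is then rejected by the INDEL check, which matches the explanation given in the comment.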
Actually now that I look at this again I think I can/should clean it up a bit... I don't think we care about "any" SNPs or INDELs any more now that the default for the "ok" accumulators is True.
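
To illustrate why a True default makes separate "any SNP" / "any INDEL" tracking unnecessary, here is a hedged Python sketch. The names and the (kind, ok) tuple shape are assumptions made for this example only; the PR's actual accumulators may look different.

```python
from typing import Iterable, Tuple


def genotype_ok(alleles: Iterable[Tuple[str, bool]]) -> bool:
    """Return False as soon as any allele of either type is bad.

    Hypothetical sketch: each allele is a (kind, allele_ok) pair where
    kind is "snp" or "indel".
    """
    # Accumulators start as True, so a type that never appears stays OK
    # without a separate "did we see any of this type" flag.
    snp_ok = True
    indel_ok = True
    for kind, allele_ok in alleles:
        if kind == "snp":
            snp_ok = snp_ok and allele_ok
        elif kind == "indel":
            indel_ok = indel_ok and allele_ok
    # One bad allele of either type fails the whole genotype.
    return snp_ok and indel_ok
```

With this shape, a genotype containing only good SNPs passes because `indel_ok` keeps its default, and a single bad allele of either kind fails the genotype.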