After running successfully through the example dataset, I ran kevlar
on my own data, but I am getting an error at the count_control
step saying that the FPR (false positive rate) is too high.
I'm trying to figure out:
- why does this happen?
- what does it mean?
- how can I solve it?
Error log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
count jobs
16 assemble
16 call
1 calls
1 count_case
2 count_control
1 count_reference
1 create_mask
1 filter_novel
1 like_scores
1 link_input_seqs
1 link_mask
1 link_reference
1 localize
1 novel
1 partition
1 split
47
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
[Thu Oct 22 10:46:50 2020]
Job 24: Create internal links for sample sequence data.
[Thu Oct 22 10:46:50 2020]
Job 42: Create internal links for mask sequence data.
[Thu Oct 22 10:46:50 2020]
Job 22: Create internal links for reference genome, and index if needed.
Job counts:
count jobs
1 link_input_seqs
1
Job counts:
count jobs
1 link_mask
1
Job counts:
count jobs
1 link_reference
1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
[Thu Oct 22 10:46:51 2020]
Finished job 42.
1 of 47 steps (2%) done
[Thu Oct 22 10:46:51 2020]
Finished job 24.
2 of 47 steps (4%) done
[Thu Oct 22 10:46:51 2020]
Finished job 22.
3 of 47 steps (6%) done
[Thu Oct 22 10:46:51 2020]
Job 2: Count k-mers in the reference genome.
kevlar --tee --logfile Logs/refrcount.log count --ksize 31 --counter-size 4 --memory 12G --max-fpr 0.025 --threads 8 Reference/refr-counts.smallcounttable Reference/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a small count table, a CountMin sketch with a counter size of 4 bits, for k-mer abundance queries (max abundance 15)
[kevlar::count] - processing "Reference/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
297 reads processed, 2486108683 distinct k-mers stored;
estimated false positive rate is 0.013;
saved to "Reference/refr-counts.smallcounttable"
[kevlar::count] Total time: 17203.38 seconds
[Thu Oct 22 15:33:50 2020]
Finished job 2.
4 of 47 steps (9%) done
[Thu Oct 22 15:33:50 2020]
Job 23: Generate a mask of sequences to ignore while k-mer counting.
kevlar --tee --logfile Logs/mask.log count --ksize 31 --counter-size 1 --memory 6G --max-fpr 0.005 --threads 8 Mask/mask.nodetable Mask/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a node table (Bloom filter) for k-mer presence/absence queries
[kevlar::count] - processing "Mask/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
297 reads processed, 2493095857 distinct k-mers stored;
estimated false positive rate is 0.001;
saved to "Mask/mask.nodetable"
[kevlar::count] Total time: 5911.40 seconds
[Thu Oct 22 17:12:24 2020]
Finished job 23.
5 of 47 steps (11%) done
[Thu Oct 22 17:12:24 2020]
Job 5: Count k-mers in a control sample
Job counts:
count jobs
1 count_control
1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
warnings.warn("Spaces are not permitted in the name. Converted to '_'")
kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "Reads/ctrl1.inseq.0.fastq.gz"
[kevlar::count] - processing "Reads/ctrl1.inseq.1.fastq.gz"
[kevlar::count] Done loading k-mers;
851849754 reads processed, 5429363124 distinct k-mers stored;
estimated false positive rate is 0.385 (FPR too high, bailing out!!!)
[Thu Oct 22 19:34:15 2020]
Error in rule count_control:
    jobid: 0
    output: Sketches/ctrl1-counts.counttable, Logs/ctrl1count.log
RuleException:
CalledProcessError in line 243 of /gpfs/home/moldach/projects/CG00018/Snakefile:
Command 'set -euo pipefail; kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz' returned non-zero exit status 1.
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
File "/gpfs/home/moldach/projects/CG00018/Snakefile", line 243, in __rule_count_control
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
File "/home/moldach/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/CG00018/.snakemake/log/2020-10-22T104648.700597.snakemake.log
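The FPR values reported in the log above can be roughly approximated with the standard Bloom filter occupancy formula. The sketch below is a back-of-the-envelope estimate, not kevlar's own calculation: it assumes the memory is split evenly among 4 tables, that "G" means 10^9 bytes, and that k-mers hash uniformly. The function name `estimate_fpr` is hypothetical.

```python
import math


def estimate_fpr(distinct_kmers, memory_bytes, counter_bits, num_tables=4):
    """Approximate the false positive rate of a Bloom filter / CountMin sketch.

    Assumptions (not taken from kevlar's source): memory is divided evenly
    among `num_tables` tables and k-mers hash uniformly.
    """
    counters_per_table = memory_bytes * 8 / counter_bits / num_tables
    # Expected fraction of occupied slots in a single table.
    occupancy = 1.0 - math.exp(-distinct_kmers / counters_per_table)
    # A false positive requires a collision in every table.
    return occupancy ** num_tables


# Distinct k-mer counts, memory, and counter sizes copied from the log.
runs = {
    "reference (12G, 4-bit)": (2486108683, 12e9, 4),
    "mask (6G, 1-bit)": (2493095857, 6e9, 1),
    "ctrl1 (16G, 8-bit)": (5429363124, 16e9, 8),
}
for label, (n, mem, bits) in runs.items():
    print(f"{label}: estimated FPR ~ {estimate_fpr(n, mem, bits):.3f}")
```

Under these assumptions the estimate lands close to the logged 0.013 and 0.001 for the reference and mask, and far above the 0.05 threshold for ctrl1: about 5.4 billion distinct k-mers overfill a 16G table with 8-bit counters. The approximation does not reproduce the logged 0.385 exactly, presumably because kevlar derives its figure from the measured table occupancy rather than from this formula.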
config.json
{
"ksize": 31,
"recountmem": "1G",
"numsplit": 16,
"samples": {
"casemin": 6,
"ctrlmax": 1,
"case": {
"fastx": [
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R1.fastq.gz",
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R2.fastq.gz"
],
"memory": "16G",
"label": "Proband",
"max_fpr": 0.3
},
"controls": [
{
"fastx": [
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R1.fastq.gz",
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R2.fastq.gz"
],
"memory": "16G",
"label": "Mother",
"max_fpr": 0.05
},
{
"fastx": [
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R1.fastq.gz",
"/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R2.fastq.gz"
],
"memory": "16G",
"label": "Father",
"max_fpr": 0.05
}
],
"coverage": {
"mean": 30.0,
"stdev": 10.0
}
},
"mask": {
"fastx": [
"/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa"
],
"memory": "6G",
"max_fpr": 0.005
},
"reference": {
"fasta": "/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa",
"memory": "12G",
"max_fpr": 0.025
},
"localize": {
"seedsize": 51,
"delta": 50,
"seqpattern": ".",
"maxdiff": 10000
},
"varfilter": null
}
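Given the `memory` and `max_fpr` settings in this config, one way to pick a workable memory value is to invert the standard Bloom filter approximation. This is a rough sketch, not kevlar's own logic: it assumes 4 equally sized tables, uniform hashing, and 8-bit counters for the control count tables. `memory_for_fpr` is a hypothetical helper, and the k-mer count is the one reported for ctrl1 in the log.

```python
import math


def memory_for_fpr(distinct_kmers, target_fpr, counter_bits=8, num_tables=4):
    """Approximate bytes needed to keep the estimated FPR at target_fpr.

    Assumptions (not taken from kevlar's source): memory is divided evenly
    among `num_tables` tables and k-mers hash uniformly.
    """
    # Per-table occupancy that yields the target FPR across all tables.
    occupancy = target_fpr ** (1.0 / num_tables)
    # Invert occupancy = 1 - exp(-n / m) to get the table size m.
    counters_per_table = -distinct_kmers / math.log(1.0 - occupancy)
    return counters_per_table * num_tables * counter_bits / 8


# 5429363124 distinct k-mers were reported for ctrl1; 0.05 is its max_fpr.
needed = memory_for_fpr(5429363124, 0.05)
print(f"approx. memory for max_fpr 0.05: {needed / 1e9:.1f} GB")
```

Under these assumptions the control samples would need roughly twice the configured 16G, and this is likely an optimistic lower bound because the logged FPR was higher than the approximation predicts. Any increase to `memory` would also need a matching increase in the memory requested from the scheduler in the submission script. Raising `max_fpr` for the controls is the other lever, at the cost of less accurate abundance counts.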
Submission
#!/bin/bash
#BSUB -q normal
#BSUB -J kevlar
#BSUB -R "rusage[mem=16G]"
#BSUB -n 8
#BSUB -M 16000
#BSUB -W 600:00
#BSUB -u moldach@ucalgary.ca
#BSUB -R "select[hname!=node013]"
#BSUB -B
#BSUB -N
#BSUB -o kevlar_CG00018.out
#BSUB -e kevlar_CG00018.err
source ~/kavlar-test/kevlar-env/bin/activate
snakemake --snakefile Snakefile --configfile config.json --cores 8 --directory ./ -p calls