count_control error: estimated false positive rate is 0.385 (FPR too high, bailing out!!!) #385

@moldach

Description

After successfully running the example dataset, I ran kevlar on my own data, but the count_control step fails with an error saying the FPR is too high.

I'm trying to figure out

  • why?
  • what does this mean?
  • how can I solve this issue?

Error log

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	16	assemble
	16	call
	1	calls
	1	count_case
	2	count_control
	1	count_reference
	1	create_mask
	1	filter_novel
	1	like_scores
	1	link_input_seqs
	1	link_mask
	1	link_reference
	1	localize
	1	novel
	1	partition
	1	split
	47
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")

[Thu Oct 22 10:46:50 2020]
Job 24: Create internal links for sample sequence data.

[Thu Oct 22 10:46:50 2020]
Job 42: Create internal links for mask sequence data.

[Thu Oct 22 10:46:50 2020]
Job 22: Create internal links for reference genome, and index if needed.

Job counts:
	count	jobs
	1	link_input_seqs
	1
Job counts:
	count	jobs
	1	link_mask
	1
Job counts:
	count	jobs
	1	link_reference
	1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
[Thu Oct 22 10:46:51 2020]
Finished job 42.
1 of 47 steps (2%) done
[Thu Oct 22 10:46:51 2020]
Finished job 24.
2 of 47 steps (4%) done
[Thu Oct 22 10:46:51 2020]
Finished job 22.
3 of 47 steps (6%) done

[Thu Oct 22 10:46:51 2020]
Job 2: Count k-mers in the reference genome.

kevlar --tee --logfile Logs/refrcount.log count --ksize 31 --counter-size 4 --memory 12G --max-fpr 0.025 --threads 8 Reference/refr-counts.smallcounttable Reference/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a small count table, a CountMin sketch with a counter size of 4 bits, for k-mer abundance queries (max abundance 15)
[kevlar::count] - processing "Reference/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
    297 reads processed, 2486108683 distinct k-mers stored;
    estimated false positive rate is 0.013;
    saved to "Reference/refr-counts.smallcounttable"
[kevlar::count] Total time: 17203.38 seconds
[Thu Oct 22 15:33:50 2020]
Finished job 2.
4 of 47 steps (9%) done

[Thu Oct 22 15:33:50 2020]
Job 23: Generate a mask of sequences to ignore while k-mer counting.

kevlar --tee --logfile Logs/mask.log count --ksize 31 --counter-size 1 --memory 6G --max-fpr 0.005 --threads 8 Mask/mask.nodetable Mask/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a node table (Bloom filter) for k-mer presence/absence queries
[kevlar::count] - processing "Mask/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
    297 reads processed, 2493095857 distinct k-mers stored;
    estimated false positive rate is 0.001;
    saved to "Mask/mask.nodetable"
[kevlar::count] Total time: 5911.40 seconds
[Thu Oct 22 17:12:24 2020]
Finished job 23.
5 of 47 steps (11%) done

[Thu Oct 22 17:12:24 2020]
Job 5: Count k-mers in a control sample

Job counts:
	count	jobs
	1	count_control
	1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "Reads/ctrl1.inseq.0.fastq.gz"
[kevlar::count] - processing "Reads/ctrl1.inseq.1.fastq.gz"
[kevlar::count] Done loading k-mers;
    851849754 reads processed, 5429363124 distinct k-mers stored;
    estimated false positive rate is 0.385 (FPR too high, bailing out!!!)
[Thu Oct 22 19:34:15 2020]
Error in rule count_control:
    jobid: 0
    output: Sketches/ctrl1-counts.counttable, Logs/ctrl1count.log

RuleException:
CalledProcessError in line 243 of /gpfs/home/moldach/projects/CG00018/Snakefile:
Command 'set -euo pipefail;  kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz' returned non-zero exit status 1.
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/gpfs/home/moldach/projects/CG00018/Snakefile", line 243, in __rule_count_control
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/moldach/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/CG00018/.snakemake/log/2020-10-22T104648.700597.snakemake.log
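
For reference, the FPR figures in the log above can be roughly reproduced with the standard occupancy estimate for a Bloom filter / CountMin sketch. This is only a minimal sketch under assumptions that the log does not confirm: that the sketch memory is split into 4 equally sized tables, and that "16G" means 16e9 bytes. The function names `estimate_fpr` and `memory_for_fpr` are made up for illustration and are not part of kevlar.

```python
import math


def estimate_fpr(distinct_kmers, memory_bytes, counter_bits, num_tables=4):
    """Approximate FPR as (fraction of occupied bins per table) ** num_tables.

    num_tables=4 is an assumption, not something reported in the log.
    """
    bins_per_table = memory_bytes * 8 / counter_bits / num_tables
    occupancy = 1.0 - math.exp(-distinct_kmers / bins_per_table)
    return occupancy ** num_tables


def memory_for_fpr(distinct_kmers, target_fpr, counter_bits, num_tables=4):
    """Invert the estimate above: bytes needed to stay at target_fpr."""
    occupancy = target_fpr ** (1.0 / num_tables)
    bins_per_table = -distinct_kmers / math.log(1.0 - occupancy)
    return bins_per_table * num_tables * counter_bits / 8


GB = 1e9  # assumption: "G" is treated as 10**9 bytes

# Distinct k-mer counts, memory, and counter sizes are taken from the log.
ref = estimate_fpr(2486108683, 12 * GB, counter_bits=4)   # count_reference
mask = estimate_fpr(2493095857, 6 * GB, counter_bits=1)   # create_mask
ctrl = estimate_fpr(5429363124, 16 * GB, counter_bits=8)  # count_control
needed = memory_for_fpr(5429363124, 0.05, counter_bits=8)

print(f"reference FPR ~ {ref:.3f}")
print(f"mask FPR      ~ {mask:.3f}")
print(f"control FPR   ~ {ctrl:.3f}")
print(f"memory for control FPR 0.05 ~ {needed / GB:.0f} GB")
```

Under these assumptions the estimate matches the logged values for the reference (0.013) and the mask (0.001). For the control sample it gives about 0.30 rather than the logged 0.385, so the tool evidently computes its figure somewhat differently, but either way 16G is far too small for about 5.4 billion distinct k-mers with 8-bit counters at `--max-fpr 0.05`. The same estimate suggests on the order of 34 GB would be needed for this sample, and that should be read as a lower bound.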

config.json

{
    "ksize": 31,
    "recountmem": "1G",
    "numsplit": 16,
    "samples": {
        "casemin": 6,
        "ctrlmax": 1,
        "case": {
            "fastx": [
                "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R1.fastq.gz",
                "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R2.fastq.gz"
            ],
            "memory": "16G",
            "label": "Proband",
            "max_fpr": 0.3
        },
        "controls": [
            {
                "fastx": [
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R1.fastq.gz",
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R2.fastq.gz"
                ],
                "memory": "16G",
                "label": "Mother",
                "max_fpr": 0.05
            },
            {
                "fastx": [
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R1.fastq.gz",
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R2.fastq.gz"
                ],
                "memory": "16G",
                "label": "Father",
                "max_fpr": 0.05
            }
        ],
        "coverage": {
            "mean": 30.0,
            "stdev": 10.0
        }
    },
    "mask": {
        "fastx": [
            "/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa"
        ],
        "memory": "6G",
        "max_fpr": 0.005
    },
    "reference": {
        "fasta": "/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa",
        "memory": "12G",
        "max_fpr": 0.025
    },
    "localize": {
        "seedsize": 51,
        "delta": 50,
        "seqpattern": ".",
        "maxdiff": 10000
    },
    "varfilter": null
}

Submission

#!/bin/bash
#BSUB -q normal
#BSUB -J kevlar
#BSUB -R "rusage[mem=16G]"
#BSUB -n 8
#BSUB -M 16000
#BSUB -W 600:00
#BSUB -u moldach@ucalgary.ca
#BSUB -R "select[hname!=node013]"
#BSUB -B
#BSUB -N
#BSUB -o kevlar_CG00018.out
#BSUB -e kevlar_CG00018.err

source ~/kavlar-test/kevlar-env/bin/activate
snakemake --snakefile Snakefile --configfile config.json --cores 8 --directory ./ -p calls
