Skip to content

Clarification - how to avoid high FPR #377

@exeter-matthew-wakeling

Description

I'm trying to run Kevlar on a human trio, sequenced on a BGI-Seq 500, so the data has a fairly low error rate, but the reads are not error-corrected. For some inexplicable reason we have been given about 50X-worth of data when we expected 30X. I am getting some errors for too-high FPR. I ran:

kevlar count --memory 60000M b37_mask.ct Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.toplevel.fa.gz UniVec.fasta human_g1k_v37.fasta
kevlar count --memory 60000M proband.ct proband_R1.fastq.gz proband_R2.fastq.gz
kevlar count --memory 60000M father.ct father_R1.fastq.gz father_R2.fastq.gz
kevlar count --memory 60000M mother.ct mother_R1.fastq.gz mother_R2.fastq.gz
kevlar novel --case proband_R1.fastq.gz proband_R2.fastq.gz --case-counts proband.ct --control-counts father.ct mother.ct -o proband_novel.augfastq
kevlar filter --mask b37_mask.ct -o proband_filtered.augfastq proband_novel.augfastq

The kevlar filter stage fails with a high FPR. The documentation states that a FPR of <0.5 for control samples and <0.05 for case samples are required, and that this can be achieved for uncorrected reads with 36-72GB of space with a suitable mask. I have tried the above commands for three trios so far, and the FPR rates are as follows:

  1. proband: 0.004, father: 0.053, mother: 0.005, filter: 0.123
  2. proband: 0.046, father: 0.007, mother: 0.047, filter: 0.971
  3. proband: 0.006, father: 0.019, mother: 0.005, filter: 0.629

So, the FPR rates for the samples are under the thresholds stated in the documentation, but the FPR rate is reported much higher by the filter command, and the filter command produces an error message.

I have applied the --mask option to the filter command. I just noticed that the count command also has a --mask option - should I be giving the same mask to the count command as well? The documentation doesn't really make this clear. Should I allocate even more memory to the count process?

I would greatly appreciate being set in the right direction on this one, but I think the documentation could also do with clarification. Many thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions