-
Notifications
You must be signed in to change notification settings - Fork 9
Description
I'm trying to run Kevlar on a human trio, sequenced on a BGI-Seq 500, so the data has a fairly low error rate, but the reads are not error-corrected. For some inexplicable reason we have been given about 50X-worth of data when we expected 30X. I am getting some errors for too-high FPR. I ran:
kevlar count --memory 60000M b37_mask.ct Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna.toplevel.fa.gz UniVec.fasta human_g1k_v37.fasta
kevlar count --memory 60000M proband.ct proband_R1.fastq.gz proband_R2.fastq.gz
kevlar count --memory 60000M father.ct father_R1.fastq.gz father_R2.fastq.gz
kevlar count --memory 60000M mother.ct mother_R1.fastq.gz mother_R2.fastq.gz
kevlar novel --case proband_R1.fastq.gz proband_R2.fastq.gz --case-counts proband.ct --control-counts father.ct mother.ct -o proband_novel.augfastq
kevlar filter --mask b37_mask.ct -o proband_filtered.augfastq proband_novel.augfastq
The kevlar filter stage fails with a high FPR. The documentation states that a FPR of <0.5 for control samples and <0.05 for case samples are required, and that this can be achieved for uncorrected reads with 36-72GB of space with a suitable mask. I have tried the above commands for three trios so far, and the FPR rates are as follows:
- proband: 0.004, father: 0.053, mother: 0.005, filter: 0.123
- proband: 0.046, father: 0.007, mother: 0.047, filter: 0.971
- proband: 0.006, father: 0.019, mother: 0.005, filter: 0.629
So, the FPR rates for the samples are under the thresholds stated in the documentation, but the FPR rate is reported much higher by the filter command, and the filter command produces an error message.
I have applied the --mask option to the filter command. I just noticed that the count command also has a --mask option - should I be giving the same mask to the count command as well? The documentation doesn't really make this clear. Should I allocate even more memory to the count process?
I would greatly appreciate being set in the right direction on this one, but I think the documentation could also do with clarification. Many thanks.