Test cases
Usually, a single look at a plot is enough: fastqc and multiqc provide green/red lights that tell us whether the quality is good or not. Each plot is judged individually to perform this task.
The following per base sequence quality plot shows a drop in quality from position 0 to 40, which is pronounced in one of the samples. At first glance, that sample looks totally wrong, since its quality is below 20 at the beginning of the reads.
A complementary plot is the per base N content, shown below:
Here we see the same samples. The red curve corresponds to the sample that was red in the previous plot. This sample actually has 40% of Ns at the beginning of its reads and is therefore tagged in red (error), suggesting that the sample should be dropped.
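To make the per base N content concrete, here is a minimal sketch of how such a profile can be computed from raw read sequences. It is not the fastqc implementation; the function name `per_base_n_content` and the toy reads are hypothetical and only mimic the situation described above, where 40% of the reads start with Ns.

```python
from collections import Counter


def per_base_n_content(reads):
    """Return the percentage of N calls at each read position."""
    n_counts = Counter()  # number of Ns seen at each position
    totals = Counter()    # number of reads covering each position
    for seq in reads:
        for pos, base in enumerate(seq):
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return [100.0 * n_counts[pos] / totals[pos] for pos in range(len(totals))]


# Hypothetical toy data: 2 reads out of 5 start with a run of Ns.
reads = ["NNNNACGT", "NNNNTTGA", "ACGTACGT", "TTGACCGA", "GGCATTAC"]
profile = per_base_n_content(reads)
print(profile)
```

With this toy input, the first four positions carry 40% of Ns and the remaining positions carry none, which is the shape of the red curve discussed above.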
In fact, what happened here is that the library quality was such that many adapter dimers were created. As a result, 40% of the reads contain no data: the sequencer produced reads made only of Ns, with no genomic content. Yet the other 60% of the reads were totally correct and of high quality. Moreover, the reads made of Ns have a length of 35 bp. Coming back to the first plot, if we ignore the reads made of Ns (which have poor quality), the rest of the data has the expected high quality.
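The effect of the N-only reads on the quality plot can be illustrated with a small sketch: compute the mean quality per position, then drop the reads dominated by Ns and compute it again. The function names, the `max_n_fraction` threshold, and the toy records are assumptions made for illustration (Phred+33 encoding, Ns reported with quality `#`, i.e. Phred 2); this is not how fastqc or the Sequana pipelines do it.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def mean_quality_per_position(records):
    """Return the mean Phred score at each position over (seq, qual) records."""
    sums, counts = [], []
    for _seq, qual in records:
        for pos, score in enumerate(phred_scores(qual)):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += score
            counts[pos] += 1
    return [total / count for total, count in zip(sums, counts)]


def drop_n_reads(records, max_n_fraction=0.5):
    """Keep only reads whose fraction of Ns does not exceed max_n_fraction."""
    return [
        (seq, qual)
        for seq, qual in records
        if seq and seq.upper().count("N") / len(seq) <= max_n_fraction
    ]


# Hypothetical toy data: 2 short N-only reads (Phred 2) and 3 good reads (Phred 30).
records = [
    ("NNNN", "####"),
    ("NNNN", "####"),
    ("ACGTAC", "??????"),
    ("TTGACC", "??????"),
    ("GGCATT", "??????"),
]
before = mean_quality_per_position(records)
after = mean_quality_per_position(drop_n_reads(records))
print(before)
print(after)
```

In this toy case the mean quality over the positions covered by the N-only reads is below 20 before filtering and returns to 30 once those reads are ignored, while the later positions are unaffected.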
Subsequent RNA-seq analysis, which ignored the reads made of Ns, showed no difference between this sample and the other 5 samples.
Conclusion: even though the plots indicated a very poor quality for one sample, once the Ns are ignored, and assuming the remaining yield of reads is enough for the bioinformatics analysis, the reads were usable and the experiment was validated.
This pipeline is part of the Sequana project. If you use sequana_demultiplex, please consider citing us. Visit the How to cite ? section. You may also visit the pipeline page and star us.