
Test cases

Thomas Cokelaer edited this page Jul 31, 2020 · 20 revisions

Virome B2888

Coming soon. This is a DNA case where the ACGT content fluctuates a lot. There is also a lot of adapter content.

RNA-seq: Ns present in large proportions (B3353)

Once a FastQC (and MultiQC) report is available, we usually look at the quality plot. These tools provide a green/orange/red light indicating a no warning/warning/error status. In this RNA-seq experiment with 6 samples, the per base sequence quality plot shows a drop in quality from position 0 to 40, which is pronounced in one of the samples. The impression is that this sample is totally wrong, since its quality is below 20 at the beginning of all reads.

A complementary plot is the per base N content, shown below:

Here we see the same samples. The red curve corresponds to the same sample that was red in the previous plot. This sample actually has 40% of Ns at the beginning of the reads and is therefore tagged in red (error), suggesting that it should be dropped.
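To make the per base N content concrete, here is a minimal sketch of how such a curve can be computed from read sequences. The function name `per_base_n_content` and the toy reads are illustrative assumptions; this is not FastQC's own code, only the same idea in a few lines.

```python
def per_base_n_content(reads):
    """Return, for each read position, the percentage of reads showing an N.

    Reads may have different lengths: a position is only counted for the
    reads that are long enough to reach it.
    """
    max_len = max((len(read) for read in reads), default=0)
    n_counts = [0] * max_len   # number of N calls seen at each position
    totals = [0] * max_len     # number of reads covering each position
    for read in reads:
        for pos, base in enumerate(read):
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]


# Toy data (made up): 2 reads out of 5 are short and made of Ns only.
toy_reads = ["NNNN", "ACGTAC", "ACGTAC", "NNNN", "ACGTAC"]
print(per_base_n_content(toy_reads))
```

With this toy input, the first four positions carry 40% of Ns and the later positions none, which is the shape of curve described above: a high N content restricted to the start of the reads because the N-only reads are short.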

In fact, what happened here is that the library quality was such that many adapter dimers were created. 40% of the reads actually contain no data: the sequencer produced reads made of just Ns, with no genomic content. Yet the other 60% of the reads were totally correct and of high quality. Moreover, the reads made of Ns have a length of 35 bp. Coming back to the first plot, if we ignore the reads with Ns (which have poor quality), the rest of the data has the expected high quality.

Subsequent RNA-seq analysis, which ignored the reads with Ns, showed no difference between this sample and the other 5 samples.
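The idea of ignoring the reads with Ns can be sketched as follows. This is only an illustration and not the pipeline used for this project: the function names, the 0.5 threshold on the fraction of Ns, and the toy records are all assumptions, and qualities are assumed to be Phred+33 encoded in a plain 4-lines-per-record FASTQ.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def n_fraction(seq):
    """Fraction of N calls in a sequence (an empty read counts as all N)."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def drop_n_reads(records, max_n_fraction=0.5):
    """Keep only records whose N fraction is at most max_n_fraction (assumed threshold)."""
    for name, seq, qual in records:
        if n_fraction(seq) <= max_n_fraction:
            yield name, seq, qual


def mean_quality(records):
    """Mean Phred score over all bases, assuming Phred+33 encoding."""
    scores = [ord(char) - 33 for _, _, qual in records for char in qual]
    return sum(scores) / len(scores) if scores else 0.0


# Toy FASTQ (made up): two good reads and one read made of Ns with low quality.
toy = "@r1\nACGT\n+\nIIII\n@r2\nNNNN\n+\n####\n@r3\nACGA\n+\nIIII\n"
records = list(parse_fastq(io.StringIO(toy)))
kept = list(drop_n_reads(records))
print(len(records), len(kept))
print(mean_quality(records), mean_quality(kept))
```

On the toy input the mean quality rises once the N-only read is dropped, mirroring the observation that the remaining reads were of high quality. In practice a dedicated trimming or filtering tool would normally be used for this step.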

Conclusion: even though the plots indicated a very poor quality for one sample, once the Ns are ignored, and assuming the yield of remaining reads is sufficient for the bioinformatics analysis, the reads were usable and the experiment was validated.
