Oh, also, ZymoResearch couldn't draw any conclusions about rRNA contamination because, as they noted, the C. virginica genome's rRNA annotations are incomplete, so mapping the data to rRNA was unreliable for this project.
I used the Zymo Quick DNA/RNA Microprep Plus Kit on frozen female gonad (mixed cell types) and frozen sperm. Some relevant lab notebook links (you can also go to my lab notebook >> tags >> "labwork"; it's the series "Virginica Gonad DNA Extractions")
Before I dive into this, @yaaminiv could you please remind me how the female/male gonads were processed for RNA isolation for Zymo project zr4059?
Here's the background story:
We have some C. virginica RNAseq data from female and male gonad samples. I aligned this data to the NCBI C. virginica genome and noticed that the overall alignment rate (aggregated across all samples) was low, around 65% (normally it should be >80%). Additionally, alignment rates in the male samples were drastically lower than in the females.
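For reference, a pooled (aggregate) alignment rate is total aligned reads divided by total input reads, so deeper libraries weigh more than they would in a simple average of per-sample rates. Below is a minimal Python sketch using hypothetical read counts, assuming (from the naming convention mentioned in this thread) that male sample names end in "M" and all others are female. How each aligner counts mates in a pair when it reports its own "overall alignment rate" varies, so treat this as illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    name: str
    total_reads: int    # reads given to the aligner
    aligned_reads: int  # reads the aligner reported as aligned


def pooled_rate(samples: list[SampleStats]) -> float:
    """Aggregate alignment rate: total aligned / total input across samples."""
    total = sum(s.total_reads for s in samples)
    aligned = sum(s.aligned_reads for s in samples)
    return aligned / total if total else 0.0


def rates_by_sex(samples: list[SampleStats]) -> dict[str, float]:
    """Split on the assumed naming rule: names ending in 'M' are male."""
    males = [s for s in samples if s.name.endswith("M")]
    females = [s for s in samples if not s.name.endswith("M")]
    return {"male": pooled_rate(males), "female": pooled_rate(females)}


# Hypothetical numbers, for illustration only.
samples = [
    SampleStats("S1F", total_reads=1_000_000, aligned_reads=900_000),
    SampleStats("S2M", total_reads=3_000_000, aligned_reads=1_500_000),
]
print(f"pooled: {pooled_rate(samples):.2f}")  # pooled: 0.60
print(rates_by_sex(samples))  # {'male': 0.5, 'female': 0.9}
```

Here the mean of the two per-sample rates would be 0.70, but the pooled rate is 0.60 because the deeper (hypothetical) male library aligns poorly.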
I reviewed the sequencing quality data and things looked fine. I suspected that rRNA contamination could be a culprit: rRNA is often difficult to align because of its low complexity, so reads map to multiple locations and are then discarded because the mapping software can't decide where those reads should actually be placed. Additionally, after looking at the documentation provided by ZymoResearch (which performed the library prep/sequencing), I discovered that they used an rRNA depletion system instead of an mRNA enrichment method. In our experience, rRNA depletion has generally been ineffective in marine molluscs.
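One way to gauge how much multi-mapping contributes is to tally primary alignments by their NH tag (the number of reported alignments for that read). This is a rough Python sketch, assuming the aligner writes the standard SAM `NH:i` tag and reports multi-mapped reads rather than silently dropping them (that behavior depends on the aligner and its settings). In practice the lines could come from something like `samtools view alignments.bam`, where the file name is hypothetical. Counts are per SAM record, so for paired-end data each mate is counted separately.

```python
def classify_alignments(sam_lines):
    """Count primary SAM records as unmapped, unique (NH == 1) or multi (NH > 1)."""
    counts = {"unmapped": 0, "unique": 0, "multi": 0, "no_nh_tag": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary: count each read once via its primary record
        if flag & 0x4:
            counts["unmapped"] += 1
            continue
        # Optional tags start at column 12 (index 11); NH:i:N = number of reported alignments.
        nh = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NH:i:")), None)
        if nh is None:
            counts["no_nh_tag"] += 1
        elif nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    return counts


# Toy records: one unique read, one read reported at 3 loci, one unmapped read.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(classify_alignments(toy_sam))
```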
I contacted ZymoResearch to see if they could provide data (specifically, a Bioanalyzer/TapeStation electropherogram) confirming that the rRNA depletion was successful. As it turns out, they don't perform this check as part of their workflow.
During my exchanges with ZymoResearch, I also discovered that the library prep kit they use recommends trimming an additional 10 bp from the 5' ends of R2 reads after sequencing; simple adapter removal is insufficient.
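To illustrate what that extra step does, here is a minimal Python sketch that removes a fixed number of bases (10, per the recommendation above) from the 5' end of every record in an R2 FASTQ, keeping the sequence and quality strings in sync. The function names are hypothetical, and it assumes simple 4-line FASTQ records. In practice a dedicated trimmer would do this; for example, cutadapt's `-U 10` removes 10 bases from the start of the second read in a pair.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, TextIO

FastqRecord = tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = lines
        yield header, seq, plus, qual


def trim_5prime(records: Iterable[FastqRecord], n: int = 10) -> Iterator[FastqRecord]:
    """Drop the first n bases (and their quality scores) from each read."""
    for header, seq, plus, qual in records:
        yield header, seq[n:], plus, qual[n:]


def write_fastq(records: Iterable[FastqRecord], handle: TextIO) -> None:
    for record in records:
        handle.write("\n".join(record) + "\n")


# Toy R2 record; real use would open the R2 FASTQ file(s) instead.
r2_in = io.StringIO("@read1/2\nACGTACGTACGTTT\n+\nIIIIIIIIIIJJJJ\n")
r2_out = io.StringIO()
write_fastq(trim_5prime(read_fastq(r2_in), n=10), r2_out)
print(r2_out.getvalue(), end="")
```

Reads of 10 bp or shorter come out empty here, which some downstream tools reject, so any length filtering afterwards would need to drop R1/R2 mates together to keep the pairs intact.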
So, I suspected that these two factors (rRNA contamination and insufficient trimming) could explain the poor alignment rates. ZymoResearch were doubtful that rRNA contamination would be present and/or that it would drastically impact alignment rates. They offered to run some of the data through their pipeline to see what they could find, and they've shared the following MultiQC report (note: it may take a minute to load in your browser):
https://gannet.fish.washington.edu/Atumefaciens/20220302-cvir-RNAseq-gonad-zymo_multiqc/zr4059_multiqc_report_with_alignment.html
The ZymoResearch explanation of their reports is here:
https://github.com/Zymo-Research/service-pipeline-documentation/blob/master/docs/how_to_use_RNAseq_report.md
The big takeaway here is that all of the male samples (sample names ending with "M") have the following issues:
- significant amounts of gDNA
- possible contaminating sequence
So, with all of that in mind, does anyone have any thoughts on how gDNA contamination would impact downstream analyses?
Keep in mind that we have an annotated genome, which was used for aligning the RNAseq data. Will differential expression analysis take this into account, using only reads that fall in regions annotated as RNA/CDS/exon/etc. and ignoring reads that fall in intronic/intergenic regions? The same question applies to genome-guided transcriptome assembly (I'll actually hit up the Trinity developer(s) to get their thoughts).
Or, do we have to filter the data ourselves to ensure that downstream analyses only use reads aligning to RNA/CDS/exon/etc.?
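If we do end up checking (or filtering) this ourselves, one option is to label each alignment by whether it overlaps an annotated exon, falls inside a gene but outside its exons (intronic), or lies outside any gene (intergenic); a high intronic/intergenic fraction is one commonly used indicator of gDNA. This is a minimal Python sketch with hypothetical coordinates. It simplifies each alignment to a single contiguous span (real spliced alignments have several blocks), and real exon/gene intervals would be parsed from the genome's annotation file. Note that a filter like this cannot remove gDNA-derived reads that happen to land in exons.

```python
from collections import defaultdict

Feature = tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive like GTF/GFF


def build_index(features: list[Feature]) -> dict[str, list[tuple[int, int]]]:
    """Group feature intervals by chromosome for lookup."""
    index = defaultdict(list)
    for chrom, start, end in features:
        index[chrom].append((start, end))
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end] overlaps at least one interval on chrom (linear scan)."""
    return any(s <= end and start <= e for s, e in index.get(chrom, ()))


def classify_read(exons, genes, chrom, start, end):
    """Label an alignment span as exonic, intronic (in a gene, not an exon) or intergenic."""
    if overlaps_any(exons, chrom, start, end):
        return "exonic"
    if overlaps_any(genes, chrom, start, end):
        return "intronic"
    return "intergenic"


# Hypothetical annotation: one gene with two exons.
exons = build_index([("chr1", 100, 200), ("chr1", 400, 500)])
genes = build_index([("chr1", 100, 500)])

# Hypothetical alignment spans.
reads = {
    "r1": ("chr1", 150, 199),
    "r2": ("chr1", 250, 300),
    "r3": ("chr1", 800, 850),
}
labels = {name: classify_read(exons, genes, *span) for name, span in reads.items()}
kept = [name for name, label in labels.items() if label == "exonic"]
print(labels)  # {'r1': 'exonic', 'r2': 'intronic', 'r3': 'intergenic'}
print(kept)    # ['r1']
```

The linear scan keeps the sketch short; with a full annotation an interval tree or sorted intervals would be needed for reasonable speed.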
I'd like to assume that downstream analysis will only use data aligning to the parts of the genome expected to generate transcripts, but we know what happens when we assume: we break the Golden Rule of Bioinformatics!
On a side note, that MultiQC report is pretty boss! I always forget about all of the modules available! Also, it looks like they used an RNAseq Nextflow pipeline to handle all of that data processing (including some differential gene expression), which is definitely pretty slick!