Description
Would it be possible to include a raw-read error-correction algorithm on par with the one implemented in DADA2? It would take the raw R1 and R2 FASTQ files (which, for computational efficiency, could already be dereplicated), learn an error model from those same reads, use it to correct them, and return the corrected FASTQ files. Any downstream clustering step (e.g. UNOISE in vsearch, or swarm) would then cluster only biological variation, rather than a mixture of biological variation and "noise" introduced by PCR and sequencing errors.
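To make the request more concrete, here is a rough sketch of what such a correction step could look like. It is **not** DADA2's algorithm, only a simplified stand-in under several assumptions: a single substitution error rate re-estimated from the reads themselves (DADA2 learns rates per nucleotide transition and quality score), Hamming distance only (equal-length reads, no indels, no quality scores, no paired-end merging), and a plain Poisson abundance test that decides whether a less abundant unique sequence is an error copy of a more abundant "center". All function names and the parameters `error_rate`, `alpha` and `rounds` are hypothetical and are not part of vsearch or DADA2.

```python
from collections import Counter
from math import exp, lgamma, log


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def poisson_upper_tail(k: int, lam: float) -> float:
    """Approximate P(X >= k) for X ~ Poisson(lam), starting from the pmf in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Tail mass is roughly one half or more here; treat as not significant.
        return 1.0
    term = exp(-lam + k * log(lam) - lgamma(k + 1))  # pmf at k
    total, i = 0.0, k
    while term > 0 and (total == 0 or term > total * 1e-15):
        total += term
        i += 1
        term *= lam / i  # pmf at i, derived from pmf at i - 1
    return total


def denoise(reads, error_rate, alpha):
    """Greedy, abundance-sorted correction: map each read to a 'true' center."""
    counts = Counter(reads)  # dereplication
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    centers, assignment = [], {}
    for seq in uniques:
        best = None
        for c in centers:
            if len(c) != len(seq):
                continue
            d = hamming(c, seq)
            # Expected number of error copies of c that land exactly on seq.
            lam = counts[c] * (error_rate / 3) ** d * (1 - error_rate) ** (len(seq) - d)
            p = poisson_upper_tail(counts[seq], lam)
            if p >= alpha and (best is None or p > best[1]):
                best = (c, p)
        if best is None:
            centers.append(seq)  # too abundant to be an error: new true sequence
            assignment[seq] = seq
        else:
            assignment[seq] = best[0]  # explained as an error copy of a center
    return [assignment[r] for r in reads], centers


def estimate_error_rate(reads, corrected):
    """Per-base substitution rate of raw reads relative to their corrections."""
    bases = sum(len(r) for r in reads)
    mismatches = sum(hamming(r, c) for r, c in zip(reads, corrected))
    return mismatches / bases if bases else 0.0


def learn_and_correct(reads, rounds=3, error_rate=0.01, alpha=1e-6):
    """Alternate between correcting reads and re-learning the error rate."""
    for _ in range(rounds):
        corrected, _ = denoise(reads, error_rate, alpha)
        error_rate = max(estimate_error_rate(reads, corrected), 1e-6)
    corrected, centers = denoise(reads, error_rate, alpha)
    return corrected, centers, error_rate


reads = ["ACGTACGTAC"] * 1000 + ["ACGTACGTTT"] * 300 + ["TCGTACGTAC"] * 2
corrected, centers, rate = learn_and_correct(reads)
print(centers)  # ['ACGTACGTAC', 'ACGTACGTTT']
```

In this toy example the two copies carrying a single substitution are rewritten to the dominant sequence, while the 300-copy variant two substitutions away is kept as a separate true sequence, so every unique sequence left in `corrected` is a candidate biological variant. A real implementation would still need quality-aware error rates, indel handling and FASTQ output.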
I realize this might be a relatively large task, but I believe it would be a very useful feature. Once the raw data have been corrected, every unique sequence would represent "true" biological variation observed in the sample and could be used directly in downstream analyses without additional clustering.
Thank you,
Tomas