Hi,
I know that Mr. Woodcroft is enjoying his holiday, so this may be badly timed, but I would like to share some findings from SingleM.
Over the last several runs, every dataset I ran showed some unexpected taxa in these marine metagenomes: glaucophytes, which are commonly present in freshwater, and Paulinella, a rarely seen amoeba found in several environments. Red algae are also a rarely found taxon in the Tara Oceans dataset.
One potential reason, I reckon, is that DIAMOND assigns some hits from Cyanobacteria to these taxa. So I decided to add cyanobacterial markers to the metapackage and sdb so that they capture these hits instead. But then I found that SingleM seems to reject off-target taxa that are annotated too precisely: it raises errors in the coverage calculation when I annotate each cyanobacterial marker with its full 7-rank taxonomy.
Then I reduced the taxonomy to d__Bacteria;Cyanobacteriota;;;;;, but SingleM was not happy with that either. So I wonder: when annotating the sdb and writing the taxonomy table while creating SingleM packages, what was your approach? Did you format the taxonomy like this: d__Bacteria;p__Cyanobacteriota;c__;o__;f__;g__;s__?
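To make the question concrete, here is a minimal sketch of how the padded, prefix-tagged form could be generated. The helper `pad_taxonomy` and the `RANK_PREFIXES` list are hypothetical names of my own, not part of SingleM, and whether SingleM actually expects this layout is exactly what I am asking.

```python
# Hypothetical helper, not part of SingleM: builds a 7-rank taxonomy string
# with GTDB-style rank prefixes, leaving unknown ranks empty.
RANK_PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def pad_taxonomy(names):
    """Return a prefix-tagged 7-rank string from known rank names (domain first)."""
    if len(names) > len(RANK_PREFIXES):
        raise ValueError("more than 7 ranks supplied")
    cleaned = []
    for prefix, name in zip(RANK_PREFIXES, names):
        name = name.strip()
        # Accept names that already carry their rank prefix.
        if name.startswith(prefix):
            name = name[len(prefix):]
        cleaned.append(name)
    # Fill the remaining ranks with empty names so every rank keeps its prefix.
    padded = cleaned + [""] * (len(RANK_PREFIXES) - len(cleaned))
    return ";".join(prefix + name for prefix, name in zip(RANK_PREFIXES, padded))


print(pad_taxonomy(["Bacteria", "Cyanobacteriota"]))
# prints: d__Bacteria;p__Cyanobacteriota;c__;o__;f__;g__;s__
```

The only difference from what I tried is that the phylum gets its p__ prefix and every empty rank keeps its own prefix instead of being left blank.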
Cheers
Andy