Replies: 11 comments
-
Judging from the assembly length, it does not seem that you have a bacterial isolate dataset. At least, I'm unaware of any bacteria with a genome size of 40 Mbp. Anyway, could you please post the spades.log files from these runs?
-
@asl thank you for your reply. This is for a fungal genome; I will post the spades.log files ASAP.
-
spades.log for
-
So are we or are we not supposed to use
Here are quotes from https://github.com/ablab/spades :
"--isolate This flag is highly recommended for high-coverage isolate and multi-cell Illumina data; improves the assembly quality and running time. We also recommend to trim your reads prior to the assembly. More details can be found here. This option is not compatible with --only-error-correction or --careful options."
-
I can confirm the results of xonq. Removing the
-
I've run BUSCO on several assemblies of marine fishes, with and without the --isolate setting. The assemblies without --isolate score better.
-
Judging from @cbird808's datasets, the reason is low and uneven coverage, plus the additional coverage filtering that was enabled, which removes significant parts of the assembly. @xonq's case is similar: reads of 140 bp, a custom maximum k-mer length of 121, and coverage filtering. This could easily create issues during the assembly. The number of isolated reads that did not enter the assembly is enormous.
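To put a rough number on why 140 bp reads with a maximum k of 121 leave so little to work with, here is a minimal sketch. It uses the common textbook approximation C_k = C * (L - k + 1) / L for k-mer coverage; that relation, the function names, and the 50x base coverage are assumptions for illustration, not something taken from the logs in this thread or from SPAdes itself.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers a single read of length read_len contributes."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return read_len - k + 1


def kmer_coverage(base_coverage, read_len, k):
    """Approximate k-mer coverage: C_k = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


# Hypothetical 50x base coverage with 140 bp reads.
print(kmers_per_read(140, 121))               # 20 k-mers per read
print(f"{kmer_coverage(50, 140, 121):.1f}x")  # about 7.1x at k=121
print(f"{kmer_coverage(50, 140, 55):.1f}x")   # about 30.7x at k=55
```

Under these assumptions, only about one seventh of the base coverage survives at k=121, so any coverage-based filtering is working with much thinner data than the raw read depth suggests.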
-
Thank you @asl for following up. My understanding is that for the type of genomes I'm working with (eukaryotic, Illumina PE 150, no genomic resources, non-model species) I should be using neither the
-
It's not that the eukaryotic genome is the problem, but rather the properties of the input data: low and uneven coverage, etc. You may want to look into possible problems during sequencing / library preparation.
-
If low and uneven coverage is the problem, then shouldn't the thresholds for the error output below be adjusted?
-
Well, the problem is that there is no reliable way to assess post hoc whether the coverage is even. What's more, the decisions made during the assembly might effectively "hide" the issues (at the expense of assembly quality, of course).
-
I am assembling fungal genomes from 150 bp PE Illumina short reads. I've noted that it is recommended to use --isolate for "high-coverage multi-cell/isolate data"; however, when I specified it and compared the results, the assembly quality decreased based on standard measurements (N50, contig number, largest contig). Furthermore, I was unable to recover a known gene cluster on one contig using --isolate, but it was recovered on a single contig when I reran without it.

with --isolate (contigs > 1kb):

without --isolate (contigs > 1kb):

I therefore have evidence from a biological standpoint (the gene cluster recovery) and from the assembly statistics (which I understand could be falsely better) that --isolate was detrimental to my assembly quality. Why is it recommended then?
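For anyone wanting to reproduce this kind of comparison, the measurements mentioned above (N50, contig number, largest contig) can be computed from a list of contig lengths with a few lines of Python. This is only a minimal sketch: the function name, the inclusive 1000 bp cutoff, and the toy lengths are assumptions for illustration, not the numbers from these runs.

```python
def assembly_stats(contig_lengths, min_len=1000):
    """Summarise contigs of at least min_len bp: count, total, largest, N50."""
    # Keep only contigs that pass the length cutoff, longest first.
    kept = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    if not kept:
        return {"contigs": 0, "total_bp": 0, "largest": 0, "n50": 0}

    total = sum(kept)
    running = 0
    n50 = 0
    # N50: length of the contig at which the running sum of the
    # longest contigs first reaches half of the total assembly length.
    for n in kept:
        running += n
        if 2 * running >= total:
            n50 = n
            break

    return {"contigs": len(kept), "total_bp": total, "largest": kept[0], "n50": n50}


# Toy contig lengths (made up); the 500 bp contig is filtered out.
print(assembly_stats([500, 2000, 3000, 4000, 5000, 6000]))
# {'contigs': 5, 'total_bp': 20000, 'largest': 6000, 'n50': 5000}
```

Running this on the contig lengths from both assemblies gives directly comparable numbers, though, as noted above, these statistics alone can look falsely better and are worth checking against something biological such as the gene cluster.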