Bakta Prokka SNP Comparison

Hello, I tried the most recent version of the pipeline with Bakta and compared the results against a Prokka run on a set of CRAB sequences. As with a previous issue I raised a few months ago, when the default aligner was changed from Roary to Panaroo, I found that the choice of gene annotation tool had a significant impact on the resulting SNP matrix and its interpretation.

Please find attached an Excel file that compares the output matrices and core genome metrics.

[Matrix Comparison.xlsx](https://github.com/user-attachments/files/18773475/Matrix.Comparison.xlsx)

Up to this point I have been using Prokka and Roary, so the first matrix is essentially the status quo from my point of view. To focus on one part of the matrix: S19-S23 are all within two SNPs of each other, fewer than 10 SNPs from a few other sequences in the analysis, and no more than 51 SNPs from any other sequence.
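To make "within two SNPs" concrete, here is a minimal sketch of single-linkage clustering at a fixed SNP threshold. It is only an illustration of how I read the matrices, not the pipeline's own method. The function name `single_linkage_clusters`, the sample names, and the distances are all hypothetical placeholders, not values from the attached spreadsheet.

```python
from itertools import combinations


def single_linkage_clusters(samples, dist, threshold):
    """Group samples so that any two samples joined by a chain of
    pairwise distances <= threshold land in the same cluster."""
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in clusters.values())


# Placeholder distances for illustration only (NOT from the spreadsheet).
samples = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 2,
    frozenset(("B", "C")): 1,
    frozenset(("A", "D")): 40,
    frozenset(("B", "D")): 41,
    frozenset(("C", "D")): 39,
}

clusters = single_linkage_clusters(samples, dist, threshold=2)
print(clusters)
```

Running the same function on each matrix with the same threshold shows directly whether a group such as S19-S23 stays together or splits.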

In the second matrix (Bakta/Roary), S19-S23 now appear to be split into two subclusters and, more surprising to me, are now >1000 SNPs apart from all other sequences.

For the third matrix, I ran the same analysis with Bakta/Panaroo, since that is the default annotator/aligner. This gives another slightly different interpretation: S19-S23 are no longer drastically different from the others, as they were with Bakta/Roary, but there are other differences. For example, S22 no longer clusters with S19-S21 and S23.

The final matrix is generated by BugSeq’s refMLST method and appears to most closely resemble the prokka/roary matrix.
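For anyone who prefers a script to the spreadsheet, the pairwise comparison between any two of these matrices could be sketched roughly as below. This assumes each matrix has already been loaded into a dictionary keyed by sample pair. The function name `compare_matrices` and all values are hypothetical placeholders, not the numbers from my runs.

```python
def compare_matrices(m1, m2):
    """Return (pair, d1, d2, d2 - d1) for every sample pair present in
    both matrices, sorted by largest absolute change first."""
    shared = m1.keys() & m2.keys()
    rows = [
        (tuple(sorted(pair)), m1[pair], m2[pair], m2[pair] - m1[pair])
        for pair in shared
    ]
    # Largest shifts first; ties broken by pair name for stable output.
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))
    return rows


# Placeholder distances for illustration only (NOT from the spreadsheet).
matrix_a = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 30,
    frozenset(("B", "C")): 31,
}
matrix_b = {
    frozenset(("A", "B")): 2,
    frozenset(("A", "C")): 1200,
    frozenset(("B", "C")): 1201,
}

rows = compare_matrices(matrix_a, matrix_b)
for pair, d1, d2, delta in rows:
    print(pair, d1, d2, delta)
```

Sorting by absolute change puts the pairs whose distance shifts the most between annotator/aligner combinations at the top, which is where the interpretation is most likely to change.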

I can share the FASTQ files if you are interested.

Thanks,
Wes


Bakta Prokka SNP Comparison #257
