Bakta Prokka SNP Comparison #257

@whottel

Hello, I was trying out the most recent version of the pipeline using bakta and compared the results to a prokka run on the same set of CRAB sequences. As with a previous issue I raised a few months ago, when the default aligner was changed from roary to panaroo, I found that the choice of gene annotator had a significant impact on the resulting SNP matrix and its interpretation.

Please find attached an Excel file that compares the output matrices and core genome metrics.

Matrix Comparison.xlsx

Up to this point I have been using prokka and roary, so the first matrix is essentially the status quo from my point of view. To focus on one part of the matrix, S19-S23 are all within two SNPs of each other, fewer than 10 SNPs from a few other sequences in the analysis, and no more than 51 SNPs from any other sequence.

In the second matrix (bakta/roary), S19-S23 now appear to be split into two subclusters and, more surprising to me, are now >1000 SNPs apart from all other sequences.

Since the default annotator/aligner is bakta/panaroo, I ran the same analysis that way as well, which gives the third matrix and yet another slightly different interpretation. S19-S23 are no longer drastically different from the others, as they were with bakta/roary, but there are other differences. For example, S22 no longer clusters with S19-S21 and S23.

The final matrix is generated by BugSeq’s refMLST method and appears to most closely resemble the prokka/roary matrix.
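
For anyone who wants to quantify this kind of difference rather than compare the matrices by eye, here is a minimal Python sketch. It is not part of the pipeline. The function names (`snp_clusters`, `largest_shift`, `symmetric`), the 10-SNP threshold, and the sample names and distances are all hypothetical toy values, not taken from the attached spreadsheet. It assumes each matrix has been loaded as a nested dict of pairwise SNP distances, groups samples by single linkage at the threshold, and reports the pair whose distance changes most between two runs.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested-dict distance matrix from {(a, b): distance} pairs."""
    matrix = {}
    for (a, b), dist in pairs.items():
        matrix.setdefault(a, {})[b] = dist
        matrix.setdefault(b, {})[a] = dist
    return matrix


def snp_clusters(matrix, threshold):
    """Single-linkage clusters: two samples are linked if distance <= threshold."""
    samples = sorted(matrix)
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if matrix[a][b] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), set()).add(s)
    return {frozenset(g) for g in groups.values()}


def largest_shift(m1, m2):
    """Return the sample pair whose distance differs most between two matrices."""
    samples = sorted(m1)
    return max(
        combinations(samples, 2),
        key=lambda p: abs(m1[p[0]][p[1]] - m2[p[0]][p[1]]),
    )


# Toy distances only (hypothetical), standing in for two annotation runs.
run_a = symmetric({("X1", "X2"): 1, ("X1", "X3"): 8, ("X2", "X3"): 9})
run_b = symmetric({("X1", "X2"): 2, ("X1", "X3"): 1200, ("X2", "X3"): 1205})

clusters_a = snp_clusters(run_a, threshold=10)
clusters_b = snp_clusters(run_b, threshold=10)
print(clusters_a == clusters_b)      # False: the cluster membership differs
print(largest_shift(run_a, run_b))   # the pair with the biggest change
```

Single linkage is only one possible clustering rule, and the threshold should be whatever cutoff is actually used for interpretation. The point is that comparing cluster membership across runs shows directly whether a change of annotator would change the call.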

I can share the FASTQ files if you are interested.

Thanks,
Wes
