Antotate is a Python-based tool that automatically appends standardized metabolite annotations to Antimony-formatted biochemical models. By linking each species to curated identifiers from biochemical databases, Antotate improves model clarity, interoperability, and reuse in systems and synthetic biology workflows.
Metabolic models often include non-standardized species names that can cause ambiguity or hinder integration with visualization and simulation tools. Antotate resolves this by:
- Mapping species to display names and unique identifiers (e.g., KEGG compound IDs),
- Appending this information directly to the Antimony model file,
- Generating a confidence score for each annotation,
- Supporting multiple biochemical databases.
This tool is ideal for researchers preparing models for publication, exchange, or downstream analysis with tools like SBMLNetwork or Tellurium.
Antotate is written in Python and uses standard libraries. Requirements:
- Python version 3.9 or later
- Jupyter Notebook (if running interactively)
To install required Python packages:
pip install -r requirements.txt
You can run Antotate either within Python or directly from the command line.
Within Python:
from antotate import Annotate
annotator = Annotate()
annotator.annotate('cell_free.txt', databases='kegg')
From the command line:
python antotate.py 'cell_free.txt' --databases kegg
You can specify one or more of the following databases:
- kegg
- bigg.metabolite
- chebi
- hmdb
- metacyc.compound
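The syntax for passing several databases in a single call is not shown here, so the sketch below sidesteps it by building one command-line invocation per database, using only the form given above. The helper `build_commands` and the constant `SUPPORTED_DATABASES` are hypothetical names introduced for illustration; they are not part of Antotate.

```python
import shlex

# Database names as listed above.
SUPPORTED_DATABASES = ("kegg", "bigg.metabolite", "chebi", "hmdb", "metacyc.compound")


def build_commands(model_path, databases):
    """Return one argument list per requested database (hypothetical helper)."""
    commands = []
    for db in databases:
        if db not in SUPPORTED_DATABASES:
            raise ValueError(f"unsupported database: {db!r}")
        # Mirrors the documented form: python antotate.py <model> --databases <db>
        commands.append(["python", "antotate.py", model_path, "--databases", db])
    return commands


commands = build_commands("cell_free.txt", ["kegg", "chebi"])
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)`. Whether successive runs overwrite `confidence_metrics.csv` is not stated here, so it may be worth copying that file between runs.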
Antotate produces two outputs:
- Annotated Antimony file (e.g., cell_free_kegg.txt)
  - Appends display names and identifiers beneath the original reaction network.
  - Format example: Serine is "SER"; Serine identity "http://identifiers.org/kegg/C00716";
- Confidence metrics CSV (e.g., confidence_metrics.csv)
  - Summarizes the mapping for each species.
  - Includes the original name, assigned display name, matched identifier, and a confidence score from 0 (low) to 1 (high).
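As a rough sketch of how the confidence metrics might be reviewed, the snippet below flags rows whose score falls under a threshold. The column names, the 0.5 threshold, and the confidence values in the sample are all assumptions made for illustration; check the header row of your own confidence_metrics.csv before adapting it.

```python
import csv
import io

# Illustrative sample only: column names and confidence values are assumed,
# not taken from real Antotate output.
SAMPLE_CSV = """original_name,display_name,identifier,confidence
Serine,SER,C00716,0.95
Pyr,PYRUVATE,C00022,0.40
"""


def low_confidence_rows(csv_file, threshold=0.5, score_column="confidence"):
    """Return rows whose score is below the threshold (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if float(row[score_column]) < threshold]


flagged = low_confidence_rows(io.StringIO(SAMPLE_CSV))
for row in flagged:
    print(row["original_name"], row["confidence"])
```

To run it against a real file, replace the `io.StringIO(...)` argument with `open("confidence_metrics.csv", newline="")`.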
The repository includes:
- antotate.py – Main script for annotation.
- requirements.txt – Required dependencies.
- Example files:
  - cell_free.txt – Example Antimony model.
  - cell_free_kegg.txt – Output with appended annotations.
  - confidence_metrics.xlsx – Annotation summary. This file was originally created as a CSV and was then altered to include an additional tab showing the manual changes we made to correct the automated annotations.
Input Antimony line:
R406 : 1 Serine + hEC43117 -> hEC43117 + 1 NH3 + 1 Pyr;
Appended by Antotate:
Serine is "SER";
Serine identity "http://identifiers.org/kegg/C00716";
NH3 is "AMMONIA";
NH3 identity "http://identifiers.org/kegg/C01342";
Pyr is "PYRUVATE";
Pyr identity "http://identifiers.org/kegg/C00022";
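The relationship between the input line and the appended lines can be sketched as follows. This is not Antotate's actual implementation: `species_in_reaction` and `annotation_lines` are hypothetical helpers, the parser handles only simple reaction lines, and the database lookup that produces the display name and KEGG ID is left out.

```python
import re


def species_in_reaction(line):
    """Extract species names from a simple 'name : reactants -> products;' line."""
    _, _, equation = line.partition(":")
    # Keep only the reaction itself; anything after the first ';' (e.g. a rate law) is ignored.
    equation = equation.split(";")[0]
    species = []
    for side in re.split(r"->|=>", equation):
        for term in side.split("+"):
            # Drop an optional leading stoichiometric coefficient such as "1 ".
            name = re.sub(r"^\s*\d+(\.\d+)?\s+", "", term).strip()
            if name and name not in species:
                species.append(name)
    return species


def annotation_lines(species, display_name, kegg_id):
    """Format the two lines appended for one species, following the example above."""
    return [
        f'{species} is "{display_name}";',
        f'{species} identity "http://identifiers.org/kegg/{kegg_id}";',
    ]


reaction = "R406 : 1 Serine + hEC43117 -> hEC43117 + 1 NH3 + 1 Pyr;"
print(species_in_reaction(reaction))
print("\n".join(annotation_lines("Serine", "SER", "C00716")))
```

Note that the parser also returns hEC43117, which does not appear in the appended annotations above; how Antotate decides which species to annotate is not covered by this sketch.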
- Start with meaningful species names to improve annotation accuracy.
- Review the confidence_metrics.csv file to verify mappings.
- Use multiple databases for broader coverage.