➡️ View the Tabular Data Visualization
This repository provides a curated and standardized dataset for creating and displaying clinical antibiograms.
The primary purpose of this dataset is to offer a clean, machine-readable data foundation for the automated generation of antibiograms (resistance charts). The data is structured for direct use in software applications, such as for visualizing antimicrobial resistance statistics.
It serves as a foundational dataset for other projects and is designed to be easily extendable with additional data sources.
The data is aggregated from established public sources and has been standardized and manually curated for this purpose.
- Core Data (Antibiotics, Organisms, Classes, Groups): The core definitions for antibiotics, microorganisms, and their classifications are derived from the
AMR
R package, which reflects scientific standards (including EUCAST). This data was exported and transformed into the CSV format for ease of use. Clinical and taxonomic groups were generated using custom R scripts. - Resistance Data: The surveillance statistics are sourced from the public reports of the Antibiotic Resistance Surveillance (ARS) system by the Robert Koch Institute (RKI), Germany.
- Standardization: All entities (antibiotics, organisms) are linked via a uniform
amr_code
. This ensures data integrity and simplifies the process of extending the dataset with new sources (e.g., from other years, regions, or countries).
- Hierarchical Data Layering: The
data_sources.csv
file defines a hierarchical tree of data sources. This structure allows an application to layer datasets, using general data (e.g., national averages) to fill gaps in more specific datasets (e.g., regional data). A "child" node's data is considered more specific and overrides the data from its "parent". - Standardized Codes: The universal use of
amr_code
ensures data integrity across all files. - Multilingual Support: All reference data includes identifiers in both German (
_de
) and English (_en
), making it suitable for international applications. - UI-Ready Abbreviations: The dataset includes abbreviated short names (
short_name_de
,short_name_en
) for antibiotics, ideal for use in space-constrained user interfaces.
The dataset is split into several CSV files linked by IDs.
antibiotics.csv
: Defines all antibiotics.amr_code
: Unique identifier for the antibiotic.class
: The ID of the antibiotic class.full_name_de
/_en
: Full name in German/English.short_name_de
/_en
: Abbreviated name for display purposes.synonyms_de
/_en
: Common synonyms or trade names.
organisms.csv
: Defines all microorganisms.amr_code
: Unique identifier for the microorganism.class_id
: The ID of the organism class.full_name_de
/_en
: Full name in German/English.groups
: A semicolon-separated list of group IDs fromorganism_groups.csv
.
organism_groups.csv
: Defines clinical or taxonomic groups for organisms.id
: Unique identifier for the group.name_de
/_en
: Name of the group in German/English.synonyms_de
/_en
: Common synonyms for the group.
antibiotic_classes.csv
&organism_classes.csv
: Define the respective classes and their names.
data_sources.csv
: The central manifest file describing the available resistance datasets.id
: Unique ID for the dataset.parent_id
: Defines the hierarchy (e.g.,de-nw-2023
is more specific thande-ars-2023
), enabling the override logic.name_de
/_en
: Name of the dataset.source_file
: The filename of the corresponding CSV file containing the resistance data.
- Resistance Files: Contain the actual data points.
antibiotic_id
: Foreign key toamr_code
inantibiotics.csv
.organism_id
: Foreign key toamr_code
inorganisms.csv
.resistance_pct
: The percentage of resistant isolates.n_isolates
: The total number of tested isolates.
eucast_expected_resistance.csv
: Defines a baseline of expected (intrinsic) resistances according to EUCAST expert rules. This file follows the same structure as other resistance files, withresistance_pct
set to100
andn_isolates
to0
to signify intrinsic resistance.
The included Python script, generate_html_report.py
, demonstrates how to consume the data. It reads one of the resistance files and the core data files to generate a standalone HTML report in a classic antibiogram layout (Organisms vs. Antibiotics).
To run the script:
python3 generate_html_report.py
This will create the file ars_2023_germany_report.html
in the root directory.