Welcome to the repo of the ontology-based rare disease common data model (RD-CDM) harmonising international registry use, HL7® FHIR®, and the GA4GH Phenopacket Schema.
Latest docs: https://rd-cdm.readthedocs.io/en/latest/
The corresponding paper for RD-CDM v2.0.0 has been published in Nature Scientific Data:
https://www.nature.com/articles/s41597-025-04558-z
- Project Description
- What you get from PyPI
- Features
- Installation
- CLI tools
- Versioning & File Layout
- Validating with BioPortal
- Contributing & Contact
- Resources
- License
- Citing
- Acknowledgements
The ontology-based RD-CDM harmonizes rare disease data capture across registries. It integrates ERDRI-CDS, HL7 FHIR, and GA4GH Phenopacket Schema to support interoperable data for research and care. RD-CDM v2.0.x comprises 78 data elements covering formal criteria, personal information, patient status, disease, genetic findings, phenotypic findings, and family history.
Installing rd-cdm from PyPI provides:
- Schema: src/rd_cdm/schema/rd_cdm.yaml
- Versioned instances (data packs): src/rd_cdm/instances/v2_0_1/*.yaml (e.g., code_systems.yaml, data_elements.yaml, value_sets.yaml)
  - merged file: src/rd_cdm/instances/v2_0_1/rd_cdm_v2_0_1.yaml
  - exports (if present or generated locally): src/rd_cdm/instances/v2_0_1/jsons/*.json and src/rd_cdm/instances/v2_0_1/csvs/*.csv
- Generated Python & Pydantic classes (LinkML)
  - src/rd_cdm/python_classes/rd_cdm.py (LinkML runtime dataclasses)
  - src/rd_cdm/python_classes/rd_cdm_pydantic.py (generated from the schema via LinkML's Pydantic generator)
- Utilities / CLI entry points
  - rdcdm-merge – merge instance parts into rd_cdm_vX_Y_Z.yaml
  - rdcdm-json – per-file JSON export + combined rd_cdm_vX_Y_Z.json
  - rdcdm-csv – per-file CSV export + combined rd_cdm_vX_Y_Z.csv
  - rdcdm-validate – validate ontology codes via BioPortal
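The layout above follows a regular naming pattern, so the paths for a given release can be derived from its version string. The following is a minimal sketch of that mapping, not part of the package: the helper names `version_tag` and `instance_paths` are hypothetical, and the `src/rd_cdm` root assumes a source checkout rather than an installed distribution.

```python
from pathlib import PurePosixPath


def version_tag(version: str) -> str:
    """Turn a dotted version such as '2.0.1' into the tag 'v2_0_1'."""
    parts = version.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like '2.0.1', got {version!r}")
    return "v" + "_".join(parts)


def instance_paths(version: str, root: str = "src/rd_cdm") -> dict:
    """Build the documented file locations for one RD-CDM version.

    `root` is an assumption: it matches the repository layout listed above.
    """
    tag = version_tag(version)
    base = PurePosixPath(root) / "instances" / tag
    return {
        "schema": PurePosixPath(root) / "schema" / "rd_cdm.yaml",
        "instances_dir": base,
        "merged_yaml": base / f"rd_cdm_{tag}.yaml",
        "json_dir": base / "jsons",
        "csv_dir": base / "csvs",
    }


if __name__ == "__main__":
    # Print the expected locations for v2.0.1.
    for name, path in instance_paths("2.0.1").items():
        print(f"{name}: {path}")
```

The helper only builds paths; it does not check that the files exist, since exports may need to be generated locally first.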
- Interoperability: Aligns with HL7 FHIR v4.0.1 and GA4GH Phenopacket v2.0
- Ontology-driven: Uses SNOMED CT, LOINC, NCIT, MONDO, OMIM, HPO, and more
- Modular: Clear separation of schema, instances, and exports
- Versioned data: Instances shipped and resolved per version (e.g., v2_0_1)
- Tooling: Merge, export, and validation utilities with simple CLIs
- (Optional) Pydantic models: Strict runtime validation generated from LinkML
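Resolving instances per version implies picking the newest version directory when none is requested. How the package does this internally is not documented here, so the following is only a sketch of one plausible approach: parse directory names of the form vX_Y_Z and compare them numerically. The function names and the example directory `v2_0_10` are hypothetical.

```python
import re

# Matches directory names such as 'v2_0_1'.
_TAG = re.compile(r"^v(\d+)_(\d+)_(\d+)$")


def parse_tag(name):
    """Return (major, minor, patch) for a tag like 'v2_0_1', else None."""
    match = _TAG.match(name)
    return tuple(int(group) for group in match.groups()) if match else None


def latest_tag(names):
    """Pick the highest version tag, ignoring names that are not tags."""
    valid = [(parse_tag(n), n) for n in names if parse_tag(n) is not None]
    if not valid:
        raise ValueError("no version directories found")
    # Tuples compare numerically, so (2, 0, 10) ranks above (2, 0, 1).
    return max(valid)[1]
```

Comparing integer tuples rather than strings avoids ranking a patch release such as 10 below 2, which a plain lexical sort would do.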
From PyPI:
pip install rd-cdm
Optional extras for testing/docs:
pip install "rd-cdm[test]" # pytest, etc.
# or
pip install "rd-cdm[docs]"
From source:
git clone https://github.com/BIH-CEI/rd-cdm.git
cd rd-cdm
# (Recommended) create a venv
python -m venv .venv && source .venv/bin/activate
pip install -U pip
pip install -e ".[test]"
pytest -q
We use a src/ layout. If you run tools directly, ensure PYTHONPATH=src is set, or use the installed CLI entry points shown below.
After installation you should have these commands:
# Merge the versioned parts into rd_cdm_vX_Y_Z.yaml (auto-resolves latest if not given)
rdcdm-merge # or: rdcdm-merge --version 2.0.1
# Export JSON (per-file .json + combined rd_cdm_vX_Y_Z.json)
rdcdm-json # or: rdcdm-json -v 2.0.1
# Export CSV (per-file .csv + combined rd_cdm_vX_Y_Z.csv)
rdcdm-csv # or: rdcdm-csv -v 2.0.1
# Validate merged instance file against ontologies via BioPortal
rdcdm-validate # or: rdcdm-validate -v 2.0.1 (Note: set up BioPortal API key for this)
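The merge and export commands can also be chained from a script. The sketch below wraps them with Python's `subprocess` module, using only the flags shown above (`--version` for rdcdm-merge, `-v` for the exporters). The function names are hypothetical, and running the merge before the exports is an assumption about a sensible order, not a documented requirement. Validation is left out because it needs a BioPortal API key.

```python
import shutil
import subprocess


def build_commands(version=None):
    """Return the merge and export commands as argument lists."""
    commands = [["rdcdm-merge"], ["rdcdm-json"], ["rdcdm-csv"]]
    if version is not None:
        # Flags exactly as documented for each tool.
        commands[0] += ["--version", version]
        for cmd in commands[1:]:
            cmd += ["-v", version]
    return commands


def run_pipeline(version=None):
    """Run each command in turn and stop at the first failure."""
    for cmd in build_commands(version):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(
                f"{cmd[0]} not found on PATH; is rd-cdm installed?"
            )
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("2.0.1")` would run the three tools for that version, while `run_pipeline()` leaves version resolution to the tools themselves.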
The rdcdm-validate command uses the BioPortal API to check ontology term validity. This requires an API key to be set as an environment variable.
- Sign up (or log in) at https://bioportal.bioontology.org/accounts/new
- Go to your account settings and copy your API key.
- Set the API key in your environment:
# macOS / Linux
export BIOPORTAL_API_KEY="your-key-here"
# Windows (takes effect in newly opened terminals)
setx BIOPORTAL_API_KEY "your-key-here"
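Before running the validation, it can help to confirm that the key is visible to Python. The sketch below reads `BIOPORTAL_API_KEY` and builds (without sending) a request to the public BioPortal search endpoint. It illustrates the key handling only and is not how rdcdm-validate is implemented; the function names are hypothetical, and the choice of the `/search` endpoint with the `q` and `ontologies` parameters is an assumption made for this example.

```python
import os
import urllib.parse
import urllib.request

# Public BioPortal REST search endpoint (assumed here for illustration).
BIOPORTAL_SEARCH_URL = "https://data.bioontology.org/search"


def get_api_key(env=None):
    """Read the API key from the environment, failing with a clear message."""
    env = os.environ if env is None else env
    key = env.get("BIOPORTAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "BIOPORTAL_API_KEY is not set; export it before validating."
        )
    return key


def build_search_request(term, ontology, api_key):
    """Build a search request for `term` restricted to one ontology acronym."""
    query = urllib.parse.urlencode({"q": term, "ontologies": ontology})
    return urllib.request.Request(
        f"{BIOPORTAL_SEARCH_URL}?{query}",
        headers={"Authorization": f"apikey token={api_key}"},
    )
```

Passing the result of `build_search_request("HP:0001250", "HP", get_api_key())` to `urllib.request.urlopen` would send the query; that step is omitted here because it needs network access and a valid key.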
The RD-CDM is a community-driven effort and we invite open and international
collaboration. Please feel free to create issues, discuss features,
or submit pull requests to help enhance this project. For larger contributions,
consider reaching out to discuss collaboration opportunities.
Please find more information on how to contact us and contribute in the Contribution section of our documentation.
RareLink is a novel rare disease framework in REDCap linking international registries, FHIR, and Phenopackets based on the RD-CDM. It is designed to support the collection of harmonized data for rare disease research across any REDCap project worldwide and allows for the preconfigured export of the RD-CDM data in FHIR and Phenopackets formats.
For more information on RareLink, please see the RareLink documentation.
The RD-CDM draws on the following ontologies and terminologies:
- Human Phenotype Ontology
- Monarch Initiative Disease Ontology
- Online Mendelian Inheritance in Man
- Orphanet Rare Disease Ontology
- SNOMED CT
- ICD 11
- ICD10CM
- National Center for Biotechnology Information Taxonomy
- Logical Observation Identifiers Names and Codes
- HUGO Gene Nomenclature Committee
- Gene Ontology
- NCI Thesaurus OBO Edition
For the versions used in a specific RD-CDM version, please see the resources in our documentation.
This project is licensed under the terms of the MIT License.
If you use the model for your research, do not hesitate to reach out and please cite our article:
Graefe, A.S.L., Hübner, M.R., Rehburg, F. et al. An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets. Sci Data 12, 234 (2025). https://doi.org/10.1038/s41597-025-04558-z
We would like to extend our thanks to all the authors involved in the development of the RD-CDM.
- Authors:
  - Adam SL Graefe
  - Filip Rehburg
  - Samer Alkarkoukly
  - Daniel Danis
  - Peter N. Robinson
  - Oya Beyan
  - Sylvia Thun