Aariz: A Benchmark Dataset for Automatic Cephalometric Landmark Detection and CVM Stage Classification
Preprint (arXiv) · Journal Article · Dataset (Figshare)
Aariz is a benchmark dataset comprising 1000 lateral cephalometric radiographs (LCRs), acquired from 7 different radiographic imaging devices with varying resolutions, making it one of the most diverse and comprehensive cephalometric datasets available to date. Each radiograph has been meticulously annotated by clinical experts with 29 cephalometric landmarks, featuring a rich set of dental and soft-tissue points. In addition to landmark annotation, Cervical Vertebral Maturation (CVM) stages are labelled for each patient, positioning Aariz as the first standard resource for CVM classification. To ensure fair evaluation and reproducibility, the dataset is uniformly partitioned into train, validation, and test sets, stratified across all imaging devices.
- 31/07/2025: Aariz has officially made its debut in Nature Scientific Data.
- 21/05/2025: The dataset is now publicly available, offering the research community access to 1000 annotated cephalometric X-rays with 29 anatomical landmarks & CVM stages.
- 19/12/2022: Aariz joins the CEPHA29 Challenge on Automatic Cephalometric Landmark Detection, held at the IEEE International Symposium on Biomedical Imaging (ISBI).
To read images and their corresponding annotations, first import the `AarizDataset` class, then instantiate it with the appropriate parameters: `dataset_folder_path`, the folder containing the images and annotations, and `mode`, the split to read, i.e., `TRAIN`, `VALID`, or `TEST`.
```python
from dataset import AarizDataset

dataset = AarizDataset(dataset_folder_path="folder/to/dataset", mode="TRAIN")
images, landmarks, cvm_stages = dataset[0]
```
This section presents the average performance on both the validation and test sets of the Aariz dataset, comparing several published methods with our proposed baseline model. The evaluation metrics include Mean Radial Error (MRE) with its Standard Deviation (SD), and Success Detection Rate (SDR) at multiple clinically relevant thresholds (2.0 mm, 2.5 mm, 3.0 mm, and 4.0 mm). The best results for each metric are highlighted in bold.
| Method | MRE ± SD (mm) | SDR @ 2.0 mm (%) | SDR @ 2.5 mm (%) | SDR @ 3.0 mm (%) | SDR @ 4.0 mm (%) |
|---|---|---|---|---|---|
| Khan et al. (2024) | 1.92 ± 7.85 | 78.54 | 85.72 | 89.64 | 94.49 |
| Khalid et al. (2024) | 1.87 ± 4.01 | 75.17 | 82.43 | 88.78 | 93.01 |
| Khan et al. (2025)\* | **1.69 ± 3.36** | **81.18** | **87.28** | **90.82** | **94.82** |
\* Baseline model, currently under review.
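For readers implementing their own evaluation, the two metrics above can be sketched as follows. This is an illustrative NumPy implementation, not the official evaluation code: MRE is the mean Euclidean distance (in mm) between predicted and ground-truth landmarks, and SDR at a threshold is the percentage of landmarks whose radial error falls within that threshold.

```python
import numpy as np

def mean_radial_error(pred, gt):
    """Return (MRE, SD): mean and standard deviation of the Euclidean
    distances between predicted and ground-truth landmarks.

    pred, gt: arrays of shape (..., 2) holding landmark coordinates in mm.
    """
    distances = np.linalg.norm(pred - gt, axis=-1)
    return distances.mean(), distances.std()

def success_detection_rate(pred, gt, threshold_mm):
    """Return the SDR (%) at the given threshold: the percentage of
    landmarks detected within `threshold_mm` of the ground truth."""
    distances = np.linalg.norm(pred - gt, axis=-1)
    return 100.0 * (distances <= threshold_mm).mean()
```

Note that coordinates must first be converted from pixels to millimetres using each radiograph's pixel spacing, since the images come from devices with varying resolutions.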
We would like to express our heartfelt gratitude to all of the clinicians at Islamic International Dental College for their tireless efforts and contributions to the annotation process of this dataset. This research would not have been possible without the support of Riphah International University, Islamabad, Pakistan. We also extend our sincere thanks to the patients who provided consent for the use of their cephalometric images. Finally, we would like to acknowledge the valuable feedback and suggestions provided by the anonymous reviewers, which helped us improve the quality of this dataset.
If you use our dataset in your research or projects, please consider citing our papers:
@article{khalid2025benchmark,
title={A Benchmark Dataset for Automatic Cephalometric Landmark Detection and CVM Stage Classification},
author={Khalid, Muhammad Anwaar and Zulfiqar, Kanwal and Bashir, Ulfat and Shaheen, Areeba and Iqbal, Rida and Rizwan, Zarnab and Rizwan, Ghina and Fraz, Muhammad Moazam},
journal={Scientific Data},
volume={12},
number={1},
pages={1336},
year={2025},
publisher={Nature Publishing Group UK London}
}
@article{khalid2022cepha29,
title={Cepha29: Automatic cephalometric landmark detection challenge 2023},
author={Khalid, Muhammad Anwaar and Zulfiqar, Kanwal and Bashir, Ulfat and Shaheen, Areeba and Iqbal, Rida and Rizwan, Zarnab and Rizwan, Ghina and Fraz, Muhammad Moazam},
journal={arXiv preprint arXiv:2212.04808},
year={2022}
}