This repository contains the evaluation code for the Open Universal Arabic ASR Leaderboard, a continuous benchmark project for open-source multi-dialectal Arabic ASR models evaluated across several multi-dialectal datasets. The leaderboard is hosted at elmresearchcenter/open_universal_arabic_asr_leaderboard. For more detailed analyses, such as model robustness, speaker adaptation, model efficiency, and memory usage, please check our paper.
- [2025/01/11]: New model included: Nvidia Parakeet-CTC-XXL-1.1B-Universal
- [2025/01/11]: New model included: Nvidia Parakeet-CTC-XXL-1.1B-Concat
- [2025/01/11]: New dataset included: Casablanca
Please first download the following test sets:
| Test Set | Num. Dialects | Test Duration (h) |
|---|---|---|
| SADA | 10 | 10.7 |
| Common Voice 18.0 | 25 | 12.6 |
| MASC (Clean-Test) | 7 | 10.5 |
| MASC (Noisy-Test) | 8 | 14.9 |
| MGB-2 | Unspecified | 9.6 |
| Casablanca | 8 | 7.7 |
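As one example of fetching a test set, here is a minimal sketch that loads the Common Voice 18.0 Arabic test split with the Hugging Face `datasets` library. It assumes the `mozilla-foundation/common_voice_18_0` dataset ID, that you have accepted the dataset's terms and are logged in with `huggingface-cli login`, and that your `datasets` version still supports dataset loading scripts; the other corpora (SADA, MASC, MGB-2, Casablanca) are obtained from their respective distributors.

```python
# Minimal sketch: fetch the Common Voice 18.0 Arabic test split from the Hugging Face Hub.
# Assumes the dataset ID "mozilla-foundation/common_voice_18_0", accepted terms, and an
# authenticated session; `trust_remote_code` may be required (or unsupported) depending
# on your `datasets` version.
from datasets import load_dataset

cv18_ar_test = load_dataset(
    "mozilla-foundation/common_voice_18_0",
    "ar",                     # Arabic configuration
    split="test",
    trust_remote_code=True,
)
print(cv18_ar_test)           # inspect columns such as "audio" and "sentence"
```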
We collected models from different toolkits, including Hugging Face, SpeechBrain, and NVIDIA NeMo. The requirements for each toolkit can be installed individually to evaluate a desired model; to install all dependencies at once, run:
pip install -r requirements.txt
We provide easy-to-use inference functions. To run and evaluate an ASR model:
- Go under `models/` and run the corresponding model inference function to generate an output manifest file containing ground truths and predictions.
- Run the `calculate_wer` function in `eval.py` on the output manifest file (see the sketch after this list).
- Details can be found in the methods' docstrings.
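As a rough illustration of what the scoring step does, here is a minimal sketch that computes corpus-level WER and CER from an output manifest using the `jiwer` library. It assumes a JSONL manifest with `text` (ground truth) and `pred_text` (prediction) fields, which may differ from the exact format produced by the inference functions, so treat `calculate_wer` in `eval.py` as the reference implementation.

```python
# Illustrative only: the repo's own `calculate_wer` in eval.py is the function to use.
# This sketch assumes a JSONL manifest with "text" and "pred_text" fields per line;
# check the inference functions under models/ for the exact field names they emit.
import json
import jiwer

def score_manifest(manifest_path: str) -> tuple[float, float]:
    """Compute corpus-level WER and CER from an output manifest (assumed JSONL)."""
    refs, hyps = [], []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            refs.append(entry["text"])        # ground truth (assumed field name)
            hyps.append(entry["pred_text"])   # model prediction (assumed field name)
    wer = jiwer.wer(refs, hyps)
    cer = jiwer.cer(refs, hyps)
    return wer, cer

if __name__ == "__main__":
    wer, cer = score_manifest("outputs/sada_manifest.json")  # hypothetical path
    print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```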
Please run the above evaluation on all the test sets under `datasets/`, calculate the average WER/CER, and then open an issue or PR to let us know about your model, its training data, and its performance.
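The averaging step is a plain mean over the per-test-set scores. A minimal sketch is shown below; the numbers are placeholders for illustration only, not real results.

```python
# Minimal sketch: average per-test-set WER/CER into the numbers reported on the leaderboard.
# The (WER, CER) pairs below are hypothetical placeholders; use your own calculate_wer outputs.
per_dataset = {
    "SADA": (0.35, 0.14),
    "Common Voice 18.0": (0.28, 0.10),
    "MASC (Clean-Test)": (0.25, 0.09),
    "MASC (Noisy-Test)": (0.33, 0.13),
    "MGB-2": (0.22, 0.08),
    "Casablanca": (0.48, 0.20),
}

avg_wer = sum(w for w, _ in per_dataset.values()) / len(per_dataset)
avg_cer = sum(c for _, c in per_dataset.values()) / len(per_dataset)
print(f"Average WER: {avg_wer:.2%}  Average CER: {avg_cer:.2%}")
```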
We welcome models that:
- introduce a model architecture not yet present in the leaderboard;
- avoid training on data from the same datasets as the test sets, to prevent in-domain contamination.
@article{wang2024open,
title={Open Universal Arabic ASR Leaderboard},
author={Wang, Yingzhi and Alhmoud, Anas and Alqurishi, Muhammad},
journal={arXiv preprint arXiv:2412.13788},
year={2024}
}