This repository contains the evaluation code for the Open Universal Arabic ASR Leaderboard, a continuous benchmark project for open-source multi-dialectal Arabic ASR models evaluated across several multi-dialectal datasets. The leaderboard is hosted at elmresearchcenter/open_universal_arabic_asr_leaderboard. For more detailed analyses, such as model robustness, speaker adaptation, model efficiency, and memory usage, please check our paper.
- [2025/01/11]: New model included: Nvidia Parakeet-CTC-XXL-1.1B-Universal
- [2025/01/11]: New model included: Nvidia Parakeet-CTC-XXL-1.1B-Concat
- [2025/01/11]: New dataset included: Casablanca
Please first download the following test sets:
| Test Set | Num. Dialects | Test Duration (h) |
|---|---|---|
| SADA | 10 | 10.7 |
| Common Voice 18.0 | 25 | 12.6 |
| MASC (Clean-Test) | 7 | 10.5 |
| MASC (Noisy-Test) | 8 | 14.9 |
| MGB-2 | Unspecified | 9.6 |
| Casablanca | 8 | 7.7 |
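As one example of fetching a test set, here is a minimal sketch that loads the Common Voice 18.0 Arabic test split with the Hugging Face `datasets` library. It assumes the `mozilla-foundation/common_voice_18_0` dataset ID, that you have accepted the dataset's terms and are logged in with `huggingface-cli login`, and that your `datasets` version still supports dataset loading scripts; the other corpora (SADA, MASC, MGB-2, Casablanca) are obtained from their respective distributors.

```python
# Minimal sketch: fetch the Common Voice 18.0 Arabic test split from the Hugging Face Hub.
# Assumes the dataset ID "mozilla-foundation/common_voice_18_0", accepted terms, and an
# authenticated session; `trust_remote_code` may be required (or unsupported) depending
# on your `datasets` version.
from datasets import load_dataset

cv18_ar_test = load_dataset(
    "mozilla-foundation/common_voice_18_0",
    "ar",                     # Arabic configuration
    split="test",
    trust_remote_code=True,
)
print(cv18_ar_test)           # inspect columns such as "audio" and "sentence"
```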
We collected models from different toolkits, including Hugging Face, SpeechBrain, and NVIDIA NeMo. The requirements for each toolkit can be installed individually to evaluate a desired model; to install all dependencies at once, run:
pip install -r requirements.txt
We provide easy-to-use inference functions. To run and evaluate an ASR model:
- Go under `models/` and run the corresponding model inference function to generate an output manifest file containing ground truths and predictions.
- Run the `calculate_wer` function in `eval.py` on the output manifest file (see the sketch after this list).
- Details can be found in the methods' docstrings.
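As a rough illustration of what the scoring step does, here is a minimal sketch that computes corpus-level WER and CER from an output manifest using the `jiwer` library. It assumes a JSONL manifest with `text` (ground truth) and `pred_text` (prediction) fields, which may differ from the exact format produced by the inference functions, so treat `calculate_wer` in `eval.py` as the reference implementation.

```python
# Illustrative only: the repo's own `calculate_wer` in eval.py is the function to use.
# This sketch assumes a JSONL manifest with "text" and "pred_text" fields per line;
# check the inference functions under models/ for the exact field names they emit.
import json
import jiwer

def score_manifest(manifest_path: str) -> tuple[float, float]:
    """Compute corpus-level WER and CER from an output manifest (assumed JSONL)."""
    refs, hyps = [], []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            refs.append(entry["text"])        # ground truth (assumed field name)
            hyps.append(entry["pred_text"])   # model prediction (assumed field name)
    wer = jiwer.wer(refs, hyps)
    cer = jiwer.cer(refs, hyps)
    return wer, cer

if __name__ == "__main__":
    wer, cer = score_manifest("outputs/sada_manifest.json")  # hypothetical path
    print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```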
Please run the above evaluation on all the test sets under `datasets/`, calculate the average WER/CER, and then open an issue or PR to let us know about your model, its training data, and its performance.
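The averaging step is a plain mean over the per-test-set scores. A minimal sketch is shown below; the numbers are placeholders for illustration only, not real results.

```python
# Minimal sketch: average per-test-set WER/CER into the numbers reported on the leaderboard.
# The (WER, CER) pairs below are hypothetical placeholders; use your own calculate_wer outputs.
per_dataset = {
    "SADA": (0.35, 0.14),
    "Common Voice 18.0": (0.28, 0.10),
    "MASC (Clean-Test)": (0.25, 0.09),
    "MASC (Noisy-Test)": (0.33, 0.13),
    "MGB-2": (0.22, 0.08),
    "Casablanca": (0.48, 0.20),
}

avg_wer = sum(w for w, _ in per_dataset.values()) / len(per_dataset)
avg_cer = sum(c for _, c in per_dataset.values()) / len(per_dataset)
print(f"Average WER: {avg_wer:.2%}  Average CER: {avg_cer:.2%}")
```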
We welcome models that:
- introduce a model architecture not yet present in the leaderboard;
- avoid training on data from the same datasets as the test sets, to prevent in-domain contamination.
@article{wang2024open,
title={Open Universal Arabic ASR Leaderboard},
author={Wang, Yingzhi and Alhmoud, Anas and Alqurishi, Muhammad},
journal={arXiv preprint arXiv:2412.13788},
year={2024}
}