Natural-Language-Processing-Elm/open_universal_arabic_asr_leaderboard

Open Universal Arabic ASR Leaderboard

This repository contains the evaluation code for the Open Universal Arabic ASR Leaderboard, a continuous benchmark of open-source multi-dialectal Arabic ASR models across a range of multi-dialectal datasets. The leaderboard is hosted at elmresearchcenter/open_universal_arabic_asr_leaderboard. For a more detailed analysis, including model robustness, speaker adaptation, model efficiency, and memory usage, please see our paper.

Updates

Datasets

Please first download the following test sets:

| Test Set | Num. Dialects | Test Duration (h) |
|---|---|---|
| SADA | 10 | 10.7 |
| Common Voice 18.0 | 25 | 12.6 |
| MASC (Clean-Test) | 7 | 10.5 |
| MASC (Noisy-Test) | 8 | 14.9 |
| MGB-2 | Unspecified | 9.6 |
| Casablanca | 8 | 7.7 |

Requirements

We collected models from different toolkits, such as HuggingFace, SpeechBrain, and NVIDIA NeMo. To evaluate a single model, you only need to install the requirements of the library it depends on. To install all the dependencies at once, run:

pip install -r requirements.txt

Evaluate a model

We provide easy-to-use inference functions. To evaluate an ASR model:

  1. In models/, run the inference function for the corresponding model to generate an output manifest file containing the ground truths and predictions.
  2. Run the calculate_wer function in eval.py on the output manifest file.
  3. Details can be found in the functions' docstrings.
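
As a rough illustration of what the scoring step computes, the sketch below derives corpus-level WER and CER from reference/prediction pairs using edit distance. It is not the repository's calculate_wer implementation. The manifest layout (one JSON object per line) and the field names "text" and "pred_text" are assumptions made for this example, as are the helper names read_manifest and corpus_error_rate. No text normalization is applied here, and CER is computed over raw characters including spaces. Check the docstrings in eval.py for the actual format and normalization.

```python
import json
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(pairs: Iterable[Tuple[str, str]], unit: str = "word") -> float:
    """Return the error rate in percent: total edits / total reference tokens.

    unit="word" gives WER, unit="char" gives CER.
    """
    errors, total = 0, 0
    for ref, hyp in pairs:
        if unit == "word":
            ref_tokens, hyp_tokens = ref.split(), hyp.split()
        else:
            ref_tokens, hyp_tokens = list(ref), list(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("no reference tokens to score against")
    return 100.0 * errors / total


def read_manifest(path: str) -> List[Tuple[str, str]]:
    """Read (reference, prediction) pairs from a JSON-lines manifest.

    The keys "text" and "pred_text" are hypothetical; adapt them to the
    manifest produced by the inference functions.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                item = json.loads(line)
                pairs.append((item["text"], item["pred_text"]))
    return pairs


# One substituted word out of four reference words.
example = [("مرحبا بكم في الرياض", "مرحبا بك في الرياض")]
print(corpus_error_rate(example, unit="word"))  # 25.0
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.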

Add a new model

Please run the above evaluation on all the test sets under datasets/, calculate the average WER/CER, then open an issue or PR telling us about your model, its training data, and its performance.
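
The averaging step can be sketched as follows, assuming an unweighted mean over test sets (the leaderboard may aggregate differently, so treat this as an illustration only). The function name average_scores and the dictionary layout are hypothetical, and the numbers in the usage example are placeholders, not real results.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def average_scores(
    per_dataset: Dict[str, Dict[str, float]],
    expected: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Unweighted average of WER and CER over test sets.

    per_dataset maps a test-set name to {"wer": ..., "cer": ...} in percent.
    If expected is given, raise when any of those test sets is missing.
    """
    if expected is not None:
        missing = sorted(set(expected) - set(per_dataset))
        if missing:
            raise ValueError(f"missing results for: {', '.join(missing)}")
    return {
        "wer": mean(s["wer"] for s in per_dataset.values()),
        "cer": mean(s["cer"] for s in per_dataset.values()),
    }


# Placeholder values for two made-up test sets, for illustration only.
placeholder = {
    "set_a": {"wer": 20.0, "cer": 10.0},
    "set_b": {"wer": 40.0, "cer": 20.0},
}
print(average_scores(placeholder))  # {'wer': 30.0, 'cer': 15.0}
```

Passing the six test-set names as expected guards against reporting an average that silently omits a dataset.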

We welcome models that:

  1. have a model architecture that is not yet present in the leaderboard.
  2. were not trained on the training sets of the datasets used here as test sets, to avoid the in-domain issue.

Citation

@article{wang2024open,
  title={Open Universal Arabic ASR Leaderboard},
  author={Wang, Yingzhi and Alhmoud, Anas and Alqurishi, Muhammad},
  journal={arXiv preprint arXiv:2412.13788},
  year={2024}
}
