This repository provides the recipe for the paper KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus.
You can download the best-performing model HERE.
To convert your audio file to text, make sure it is in WAV format with a sample rate of 16 kHz. Unzip the pre-trained model in the current directory, and make sure that asr_model_path and lm_model_path refer to valid directories. Install the necessary packages by running:
pip install -r requirements.txt
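Because recognition expects 16 kHz WAV input, it can help to check a file before passing it on. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` is hypothetical and not part of this repository, and the `wave` module only reads uncompressed PCM WAV files (it raises `wave.Error` otherwise).

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return (matches, sample_rate, channels) for a PCM WAV file.

    `matches` is True when the file's sample rate equals `expected_rate`.
    Raises wave.Error if the file is not a PCM WAV the stdlib can parse.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate, rate, channels
```

For example, `check_wav("sample.wav")` (with `sample.wav` standing in for your own file) returns a tuple whose first element says whether the file is already at 16 kHz. If it is not, resample it with an audio tool of your choice first.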
To perform the evaluation, run:
python recognize.py -f <path_to_your_wav>
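To transcribe several files, the same command can be run once per file. The sketch below is an illustration, not part of the repository: `build_commands` and `transcribe_all` are hypothetical helper names, and it assumes that `recognize.py` is in the current directory and that the interpreter running this code has the packages from requirements.txt installed. Only the `-f` flag shown above is used.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(wav_dir, script="recognize.py"):
    """Build one `python recognize.py -f <wav>` command per .wav file in wav_dir."""
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    return [[sys.executable, script, "-f", str(wav)] for wav in wavs]


def transcribe_all(wav_dir, script="recognize.py"):
    """Run the recognition script on every .wav file, stopping on the first failure."""
    for cmd in build_commands(wav_dir, script):
        subprocess.run(cmd, check=True)
```

Calling `transcribe_all("my_recordings")` (a placeholder directory name) runs the files in alphabetical order. Whatever `recognize.py` prints goes straight to the terminal, since the sketch does not capture or parse its output.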
Download the ISSAI_KSC2 dataset through the form on HuggingFace, and extract it with tar -xvf ISSAI_KSC2.tar.gz in the directory of your choice. Specify the path to the dataset inside the espnet/egs2/Kazakh_ASR/asr1/run.sh file:
dataset_path=specify_path
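If you prefer to set the path from a script rather than editing the file by hand, something like the following could work. It is a sketch under one assumption: that run.sh contains a line starting with `dataset_path=` at the beginning of the line, as in the snippet above. The helper name `set_dataset_path` is hypothetical.

```python
import re
import shlex
from pathlib import Path


def set_dataset_path(run_sh, dataset_dir):
    """Rewrite every line starting with `dataset_path=` in run_sh; return the number of lines changed."""
    script = Path(run_sh)
    text = script.read_text(encoding="utf-8")
    # shlex.quote keeps the value safe for the shell if the path contains spaces.
    new_line = "dataset_path=" + shlex.quote(str(dataset_dir))
    # A function replacement avoids backslash-escape processing of the path.
    new_text, count = re.subn(
        r"^dataset_path=.*$", lambda _match: new_line, text, flags=re.MULTILINE
    )
    if count == 0:
        raise ValueError(f"no 'dataset_path=' line found in {run_sh}")
    script.write_text(new_text, encoding="utf-8")
    return count
```

An indented assignment, or one prefixed with `export`, would not be matched. In that case the function raises `ValueError` and leaves the file untouched.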
Our code builds upon ESPnet and requires prior installation of the framework for DNN training. Please follow the installation guide and put the TurkicASR folder inside the espnet/egs2/
directory. Run the training scripts with ./run.sh
@inproceedings{mussakhojayeva22,
author={Saida Mussakhojayeva and Yerbolat Khassanov and Huseyin Atakan Varol},
title={KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus},
year=2022,
booktitle={Interspeech},
}