Kazakh Speech Corpus 2

This repository provides the recipe for the paper KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus.

Pre-trained models

You can download the best performing model HERE.

Inference

To convert an audio file to text, make sure it is in WAV format with a 16 kHz sample rate. Unzip the pre-trained model in the current directory, and make sure that asr_model_path and lm_model_path point to valid directories. Install the necessary packages by running pip install -r requirements.txt. To transcribe the file, run:

python recognize.py -f <path_to_your_wav>
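Before calling recognize.py, it can help to confirm that the input meets the requirements stated above. The sketch below is a hypothetical helper (check_inputs is not part of this repository) that uses Python's standard wave module to read the sample rate and checks that the model directories exist. It assumes uncompressed PCM WAV, which is what the wave module can open, and it only reports problems; it does not resample.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16_000  # the Inference section asks for 16 kHz WAV input


def check_inputs(wav_path, model_dirs=()):
    """Hypothetical pre-flight check; returns a list of problems (empty means OK)."""
    problems = []
    # model_dirs stands in for whatever asr_model_path / lm_model_path are set to.
    for d in model_dirs:
        if not Path(d).is_dir():
            problems.append(f"model directory not found: {d}")
    try:
        # wave only reads PCM WAV; other formats raise wave.Error.
        with wave.open(str(wav_path), "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError, OSError) as exc:
        problems.append(f"not a readable WAV file: {wav_path} ({exc})")
        return problems
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    return problems
```

For example, check_inputs("speech.wav", [asr_dir, lm_dir]) returns an empty list when the file is a 16 kHz WAV and both directories exist; here "speech.wav", asr_dir and lm_dir are placeholders for your own file and model paths.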

Dataset

Download the ISSAI_KSC2 dataset through the HuggingFace form, and extract it with tar -xvf ISSAI_KSC2.tar.gz in a directory of your choice. Then specify the path to the dataset in the espnet/egs2/Kazakh_ASR/asr1/run.sh file:

dataset_path=specify_path
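If you prefer to script the dataset setup, the following sketch mirrors the two manual steps: it extracts the archive (the equivalent of tar -xvf) and rewrites the dataset_path= line in run.sh. Both extract_dataset and set_dataset_path are hypothetical helpers, not part of the recipe, and the regular expression assumes the assignment sits on its own line as shown above.

```python
import re
import shlex
import tarfile
from pathlib import Path


def extract_dataset(archive, dest):
    """Hypothetical helper: Python equivalent of running `tar -xvf <archive>` in dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects gzip compression
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # safer extraction on newer Pythons
        else:
            tar.extractall(dest)
    return dest


def set_dataset_path(run_sh, dataset_path):
    """Hypothetical helper: rewrite `dataset_path=...` in run.sh.

    Returns True if an assignment was found and rewritten, False otherwise.
    """
    run_sh = Path(run_sh)
    text = run_sh.read_text()
    pattern = re.compile(r"^([ \t]*)dataset_path=.*$", re.MULTILINE)
    value = shlex.quote(str(dataset_path))  # quote paths containing spaces for bash
    # A function replacement avoids re treating backslashes in the path specially.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}dataset_path={value}", text, count=1)
    if count == 0:
        return False
    run_sh.write_text(new_text)
    return True
```

For example, extract_dataset("ISSAI_KSC2.tar.gz", "/data") followed by set_dataset_path("espnet/egs2/Kazakh_ASR/asr1/run.sh", "/data/ISSAI_KSC2"). The /data location and the name of the extracted top-level folder are assumptions, so check what the archive actually contains before setting the path.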

Training

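Our code builds upon ESPnet and requires the framework to be installed before DNN training. Follow the ESPnet installation guide, then place the Kazakh_ASR folder inside the espnet/egs2/ directory so that the recipe sits at espnet/egs2/Kazakh_ASR/asr1, the path used in the Dataset section. Run the training scripts with ./run.sh from the asr1 directory.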
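ESPnet recipes normally source files such as path.sh and cmd.sh and call helper scripts through relative paths, so run.sh is launched from inside its asr1 directory. The sketch below is a hypothetical launcher (recipe_dir and train are illustrative names, not part of this repository) that assumes the recipe lives at espnet/egs2/Kazakh_ASR/asr1, matching the path given in the Dataset section; pass a different recipe name if you placed the folder elsewhere.

```python
import os
import subprocess
from pathlib import Path


def recipe_dir(espnet_root, recipe="Kazakh_ASR"):
    """Locate espnet/egs2/<recipe>/asr1 and check that run.sh is present and executable."""
    asr1 = Path(espnet_root) / "egs2" / recipe / "asr1"
    run_sh = asr1 / "run.sh"
    if not run_sh.is_file():
        raise FileNotFoundError(f"run.sh not found at {run_sh}")
    if not os.access(run_sh, os.X_OK):
        raise PermissionError(f"{run_sh} is not executable (try chmod +x run.sh)")
    return asr1


def train(espnet_root, recipe="Kazakh_ASR"):
    """Run ./run.sh with asr1 as the working directory; raises CalledProcessError on failure."""
    asr1 = recipe_dir(espnet_root, recipe)
    # On POSIX, a relative program path is resolved against cwd.
    return subprocess.run(["./run.sh"], cwd=asr1, check=True)
```

Using check=True means a failing training stage surfaces as a Python exception instead of a silently ignored exit code.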

Citation

@inproceedings{mussakhojayeva22,
  author={Saida Mussakhojayeva and Yerbolat Khassanov and Huseyin Atakan Varol},
  title={KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus},
  year=2022,
  booktitle={Interspeech},
}
