Replies: 1 comment
-
Impaired speech comes with many distinct challenges. In general, we suggest starting from a strong ASR model trained on large datasets so that it generalizes well, i.e. Conformer CTC/RNNT Large or XLarge. The next step is to fine-tune that model on your impaired-speech dataset. There are many ways to do this, but the simplest strategy is to take the pretrained model, normalize your text to the format expected by that model's tokenizer, and train it with a low learning rate.

However, the text in your dataset might not map onto the original tokenizer (for example, punctuation such as "-" might be used to represent pauses). In that case you will need to create a new tokenizer; you can follow the CTC transfer learning tutorial for those steps. Because you are changing the tokenizer, the model has to learn much more from the impaired-speech dataset, so you will need a variety of techniques to reach a good WER. The vocabulary size should always match the original model's vocabulary size so that you can load the weights of the original model.

For ASR personalization, if your text is already in the correct format, you can try adapters for per-speaker adaptation.
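In case it helps to see the steps end to end, here is a minimal sketch of that fine-tuning recipe using NeMo's Python API. All paths, manifest names, and hyperparameters below are placeholders, and the exact config keys can differ slightly between NeMo versions:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from a strong pretrained checkpoint so the encoder generalizes well.
model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_en_conformer_ctc_large")

# Only needed if your transcripts use symbols the original tokenizer lacks
# (e.g. "-" for pauses). The tokenizer dir is a placeholder built with
# scripts/tokenizers/process_asr_text_tokenizer.py, keeping the same vocab size.
model.change_vocabulary(new_tokenizer_dir="tokenizers/impaired_bpe", new_tokenizer_type="bpe")

# Placeholder manifests for the impaired-speech dataset.
model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
model.setup_validation_data(val_data_config={
    "manifest_filepath": "val_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

# Lower the learning rate so the pretrained weights are not washed out.
model.cfg.optim.lr = 1e-4

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
trainer.fit(model)
```

If your transcripts already match the original tokenizer, skip the change_vocabulary step and just fine-tune with the lowered learning rate.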
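And here is a rough sketch of the per-speaker adapter idea. The class and method names follow NeMo's ASR adapters tutorial, but treat the module paths and adapter dimension as assumptions to check against your NeMo version:

```python
import nemo.collections.asr as nemo_asr
from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig

model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

# One small bottleneck adapter per speaker; dim is the adapter's hidden size (placeholder).
adapter_cfg = LinearAdapterConfig(in_features=model.cfg.encoder.d_model, dim=32)
model.add_adapter(name="speaker_01", cfg=adapter_cfg)

# Enable only this speaker's adapter, freeze the base model, and train as usual.
model.set_enabled_adapters(enabled=False)
model.set_enabled_adapters(name="speaker_01", enabled=True)
model.freeze()
model.unfreeze_enabled_adapters()
```

Training then proceeds as in the fine-tuning sketch above, except only the adapter weights (a small fraction of the model) are updated for each speaker.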
-
Hello!
I'm a beginner in ASR (and AI in general), and I'm trying to train a model to recognize impaired speech. I'm currently at 30% WER (SOTA is around 25%), but since I don't have much background yet, I'm not able to understand or reproduce all the techniques used in those papers. Many of them are HMM systems that use adaptation techniques like MAP and MLLR. I decided to focus on neural networks (the ones available in NeMo), but I'm not sure exactly what I can play with. So far, I tried:
Are there any other aspects I could explore? Is there any way to adapt a neural network to an individual speaker other than simply fine-tuning on that speaker after pre-training on all speakers?
Thank you!