Replies: 1 comment
-
Impaired speech comes with many distinct challenges. In general, we suggest starting from a strong ASR model trained on large datasets so that it generalizes well, i.e. Conformer CTC/RNNT Large or XLarge. The next step is to fine-tune that model on your impaired-speech dataset. There are many ways to do this, but the simplest strategy is to take the pretrained model, normalize your text to the format expected by that model's tokenizer, and train it with a low learning rate.

However, the text in your dataset might not map onto the original tokenizer (for example, punctuation such as "-" might be used to represent pauses). In that case you will need to create a new tokenizer; you can follow the CTC transfer learning tutorial for those steps. Because you are changing the tokenizer, the model has to learn much more from the impaired-speech dataset, so you will need a variety of techniques to reach a good WER. The vocabulary size should always match the original model's vocabulary size so that you can load the weights of the original model.

For ASR personalization, if your text is already in the correct format, you can try adapters for per-speaker adaptation.
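In case it helps to see the steps end to end, here is a minimal sketch of that fine-tuning recipe using NeMo's Python API. All paths, manifest names, and hyperparameters below are placeholders, and the exact config keys can differ slightly between NeMo versions:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from a strong pretrained checkpoint so the encoder generalizes well.
model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("stt_en_conformer_ctc_large")

# Only needed if your transcripts use symbols the original tokenizer lacks
# (e.g. "-" for pauses). The tokenizer dir is a placeholder built with
# scripts/tokenizers/process_asr_text_tokenizer.py, keeping the same vocab size.
model.change_vocabulary(new_tokenizer_dir="tokenizers/impaired_bpe", new_tokenizer_type="bpe")

# Placeholder manifests for the impaired-speech dataset.
model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
model.setup_validation_data(val_data_config={
    "manifest_filepath": "val_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

# Lower the learning rate so the pretrained weights are not washed out.
model.cfg.optim.lr = 1e-4

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
trainer.fit(model)
```

If your transcripts already match the original tokenizer, skip the change_vocabulary step and just fine-tune with the lowered learning rate.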
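And here is a rough sketch of the per-speaker adapter idea. The class and method names follow NeMo's ASR adapters tutorial, but treat the module paths and adapter dimension as assumptions to check against your NeMo version:

```python
import nemo.collections.asr as nemo_asr
from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig

model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

# One small bottleneck adapter per speaker; dim is the adapter's hidden size (placeholder).
adapter_cfg = LinearAdapterConfig(in_features=model.cfg.encoder.d_model, dim=32)
model.add_adapter(name="speaker_01", cfg=adapter_cfg)

# Enable only this speaker's adapter, freeze the base model, and train as usual.
model.set_enabled_adapters(enabled=False)
model.set_enabled_adapters(name="speaker_01", enabled=True)
model.freeze()
model.unfreeze_enabled_adapters()
```

Training then proceeds as in the fine-tuning sketch above, except only the adapter weights (a small fraction of the model) are updated for each speaker.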
-
Hello!
I'm a beginner in ASR (and AI in general), and I'm trying to train a model to recognize impaired speech. I'm currently at 30% WER (SOTA is around 25%), but since I don't have much background yet, I'm not able to understand or reproduce all the techniques used in those papers. Many of them are HMM systems that use adaptation techniques like MAP and MLLR. I decided to focus on neural networks (the ones available in NeMo), but I'm not sure exactly what I can play with. So far, I tried:
Are there any other aspects I could explore? Is there any way to adapt a neural network to an individual speaker other than simply fine-tuning on that speaker after pre-training on all speakers?
Thank you!