Citrinet model with LM to reduce the WER for microphone recorded audio #2104
-
Hi, any suggestions on the following aspects of using a Citrinet model with an LM to reduce the WER for microphone-recorded audio? Looking for inputs.
-
You could fine-tune Citrinet with the same tokenizer on the specific domain (if there is sufficient data). If you have some noise files, noise-robust training, the same method as for QuartzNet, can be applied to Citrinet. For preprocessing, the inputs should be mono-channel 16 kHz wav files. We find that attempting signal denoising before inference will generally not do much better, and sometimes does worse because of the artifacts it introduces. For language modelling with Citrinet (and BPE models in general), we plan to release code snippets to build a custom KenLM model and run beam search through steps similar to the offline ASR notebook. However, there are some significant differences and we have not compiled a clean script for such a task yet. I will try to prioritize that in the coming weeks. There is also Transformer-based rescoring that can further reduce offline WER, though that pipeline is not ready yet. @AlexGrinch is there any ETA (within some months?) for the Transformer-based rescoring pipeline?
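For the mono-channel 16 kHz wav requirement mentioned above, here is a minimal preprocessing sketch using librosa and soundfile. The file names are placeholders, and any equivalent tool (sox, ffmpeg) works just as well.

```python
# Minimal sketch: convert a microphone recording to the mono 16 kHz wav format
# expected by Citrinet/QuartzNet preprocessing. File names are placeholders.
import librosa
import soundfile as sf

# Load the recording, downmixing to mono and resampling to 16 kHz.
audio, sample_rate = librosa.load("mic_recording.wav", sr=16000, mono=True)

# Write out a 16-bit PCM wav for training or inference.
sf.write("mic_recording_16k_mono.wav", audio, sample_rate, subtype="PCM_16")
```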
-
Thank you for the response.
-
@VahidooX If you have a rough draft, could you create a gist and share it here when it's ready? We can clean it up in the actual PR.
-
Can someone please share some details on this? Still waiting for a response.
-
Created a PR adding support for training and evaluating n-gram KenLM models on top of BPE-based ASR models. It still needs documentation. #2066
-
The PR to support N-gram LM for ASR models is merged: #2066. It can do a grid search over the beam search decoder's hyperparameters to fine-tune them, and the scripts support both character-level and BPE-level models. You may read more here: https://github.com/NVIDIA/NeMo/blob/main/docs/source/asr/asr_language_modelling.rst. You need to install the beam search decoders and KenLM to use this feature.
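As a rough illustration of the workflow added in #2066, here is a hedged sketch that shells out to the NeMo n-gram LM scripts. The script paths, flag names, checkpoint name (stt_en_citrinet_512.nemo), corpus, and manifest files are all assumptions and may differ between NeMo versions; check the linked documentation for the exact interface.

```python
# Hedged sketch (not an official recipe): build a KenLM n-gram model for a BPE ASR
# checkpoint and evaluate it with beam search by calling the NeMo scripts from #2066.
# All paths, flags, and file names below are assumptions -- verify them against the
# asr_language_modelling documentation before use.
import subprocess

NGRAM_DIR = "scripts/asr_language_modeling/ngram_lm"  # assumed location in the NeMo repo

# Train a 4-gram KenLM on a plain-text corpus, encoded with the model's BPE tokenizer.
subprocess.run(
    [
        "python", f"{NGRAM_DIR}/train_kenlm.py",
        "--nemo_model_file", "stt_en_citrinet_512.nemo",  # assumed checkpoint name
        "--train_file", "lm_corpus.txt",
        "--kenlm_model_file", "citrinet_4gram.bin",
        "--ngram_length", "4",
    ],
    check=True,
)

# Decode a test manifest with beam search + the n-gram LM; alpha/beta can be grid-searched.
subprocess.run(
    [
        "python", f"{NGRAM_DIR}/eval_beamsearch_ngram.py",
        "--nemo_model_file", "stt_en_citrinet_512.nemo",
        "--input_manifest", "test_manifest.json",
        "--kenlm_model_file", "citrinet_4gram.bin",
        "--beam_width", "128",
        "--beam_alpha", "1.0",
        "--beam_beta", "0.5",
    ],
    check=True,
)
```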
-
Thank you very much.
-
Hi, we were able to successfully use an LM with Citrinet. Now, to add custom words/vocabulary, we don't have sufficient training data. Can we add custom tokens through the KenLM process?
-
What other ways can we try for adding custom tokens to the ASR model? Also, will Transformer-based rescoring be better than KenLM?
-
Hi, after fine-tuning the Citrinet LM (on 40M lines of the LibriSpeech LM corpus) we observe more deletions in the output. For the same audio, plain Citrinet without the LM did not produce these deletions. What could be the steps to solve this? We are trying to achieve a lower WER on user audio samples; which is the way to proceed? How can we improve the model for user audio?
-
Hi all, how do I download Citrinet and train it on Chinese?
-
We have a config for QuartzNet 15x5 trained on Mandarin, as well as pretrained Citrinet models trained on Mandarin. You can use from_pretrained() to get those Citrinet models and fine-tune them (they have an inbuilt vocabulary of ~5200 Mandarin tokens).
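A minimal sketch of loading one of those checkpoints, assuming the Mandarin Citrinet is published as an EncDecCTCModelBPE. The model name "stt_zh_citrinet_512" used below is an assumption; list the available models first to confirm what is actually published on NGC.

```python
# Hedged sketch: load a pretrained Mandarin Citrinet with NeMo's from_pretrained().
# The model name below is an assumption; list_available_models() shows the real catalog.
import nemo.collections.asr as nemo_asr

# Print the pretrained checkpoints published for this model class.
for model_info in nemo_asr.models.EncDecCTCModelBPE.list_available_models():
    print(model_info.pretrained_model_name)

# Load a Mandarin checkpoint (name assumed) and transcribe mono 16 kHz wav files.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_zh_citrinet_512"
)
print(asr_model.transcribe(["sample_zh.wav"]))
```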