Citrinet model with LM to reduce the WER for microphone recorded audio #2104
-
Hi, any suggestions on the following aspects of using a Citrinet model with an LM to reduce the WER for microphone-recorded audio? Looking for inputs.
-
You could fine-tune Citrinet with the same tokenizer on the specific domain (if there is sufficient data). If you have some noise files, noise-robust training, the same method as for QuartzNet, can be applied to Citrinet. For preprocessing, the inputs should be mono-channel 16 kHz wav files. We find that attempting signal denoising before inference will generally not do much better, and sometimes does worse because of the artifacts it introduces. For language modelling with Citrinet (and BPE models in general), we plan to release code snippets to build a custom KenLM model and run beam search through steps similar to the offline ASR notebook. However, there are some significant differences and we have not compiled a clean script for such a task yet. I will try to prioritize that in the coming weeks. There is also Transformer-based rescoring that can further reduce offline WER, though that pipeline is not ready yet. @AlexGrinch is there any ETA (within some months?) for the Transformer-based rescoring pipeline?
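For the mono-channel 16 kHz wav requirement mentioned above, here is a minimal preprocessing sketch using librosa and soundfile. The file names are placeholders, and any equivalent tool (sox, ffmpeg) works just as well.

```python
# Minimal sketch: convert a microphone recording to the mono 16 kHz wav format
# expected by Citrinet/QuartzNet preprocessing. File names are placeholders.
import librosa
import soundfile as sf

# Load the recording, downmixing to mono and resampling to 16 kHz.
audio, sample_rate = librosa.load("mic_recording.wav", sr=16000, mono=True)

# Write out a 16-bit PCM wav for training or inference.
sf.write("mic_recording_16k_mono.wav", audio, sample_rate, subtype="PCM_16")
```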
-
Thank you for the response.
-
@VahidooX If you have a rough draft, could you create a gist and share it here when it's ready? We can clean it up in the actual PR.
-
Can someone please share some details on this? Still waiting for a response.
-
Created a PR adding support for training and evaluating n-gram KenLM models on top of BPE-based ASR models. It still needs documentation. #2066
-
The PR to support N-gram LM for ASR models is merged: #2066. It can do a grid search over the beam search decoder's hyperparameters to fine-tune them, and the scripts support both character-level and BPE-level models. You may read more here: https://github.com/NVIDIA/NeMo/blob/main/docs/source/asr/asr_language_modelling.rst. You need to install the beam search decoders and KenLM to use this feature.
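As a rough illustration of the workflow added in #2066, here is a hedged sketch that shells out to the NeMo n-gram LM scripts. The script paths, flag names, checkpoint name (stt_en_citrinet_512.nemo), corpus, and manifest files are all assumptions and may differ between NeMo versions; check the linked documentation for the exact interface.

```python
# Hedged sketch (not an official recipe): build a KenLM n-gram model for a BPE ASR
# checkpoint and evaluate it with beam search by calling the NeMo scripts from #2066.
# All paths, flags, and file names below are assumptions -- verify them against the
# asr_language_modelling documentation before use.
import subprocess

NGRAM_DIR = "scripts/asr_language_modeling/ngram_lm"  # assumed location in the NeMo repo

# Train a 4-gram KenLM on a plain-text corpus, encoded with the model's BPE tokenizer.
subprocess.run(
    [
        "python", f"{NGRAM_DIR}/train_kenlm.py",
        "--nemo_model_file", "stt_en_citrinet_512.nemo",  # assumed checkpoint name
        "--train_file", "lm_corpus.txt",
        "--kenlm_model_file", "citrinet_4gram.bin",
        "--ngram_length", "4",
    ],
    check=True,
)

# Decode a test manifest with beam search + the n-gram LM; alpha/beta can be grid-searched.
subprocess.run(
    [
        "python", f"{NGRAM_DIR}/eval_beamsearch_ngram.py",
        "--nemo_model_file", "stt_en_citrinet_512.nemo",
        "--input_manifest", "test_manifest.json",
        "--kenlm_model_file", "citrinet_4gram.bin",
        "--beam_width", "128",
        "--beam_alpha", "1.0",
        "--beam_beta", "0.5",
    ],
    check=True,
)
```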
-
Thank you very much.
-
Hi, we were able to successfully use an LM with Citrinet. Now, to add custom words/vocabulary, we don't have sufficient training data. Can we add custom tokens through the KenLM process?
-
What other ways can we try for adding custom tokens to the ASR model? Also, will Transformer-based rescoring be better than KenLM?
-
Hi, after fine-tuning the Citrinet LM (on 40M lines of the LibriSpeech LM corpus) we observe more deletions in the output. For the same audio, plain Citrinet without the LM did not produce these deletions. What could be the steps to solve this? We are trying to achieve a lower WER on user audio samples; which is the way to proceed? How can we improve the model for user audio?
-
Hi all, how do I download Citrinet and train it on Chinese?
-
We have a config for QuartzNet 15x5 trained on Mandarin, as well as pretrained Citrinet models trained on Mandarin. You can use from_pretrained() to get those Citrinet models and fine-tune them (they have an inbuilt vocabulary of ~5200 Mandarin tokens).
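A minimal sketch of loading one of those checkpoints, assuming the Mandarin Citrinet is published as an EncDecCTCModelBPE. The model name "stt_zh_citrinet_512" used below is an assumption; list the available models first to confirm what is actually published on NGC.

```python
# Hedged sketch: load a pretrained Mandarin Citrinet with NeMo's from_pretrained().
# The model name below is an assumption; list_available_models() shows the real catalog.
import nemo.collections.asr as nemo_asr

# Print the pretrained checkpoints published for this model class.
for model_info in nemo_asr.models.EncDecCTCModelBPE.list_available_models():
    print(model_info.pretrained_model_name)

# Load a Mandarin checkpoint (name assumed) and transcribe mono 16 kHz wav files.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_zh_citrinet_512"
)
print(asr_model.transcribe(["sample_zh.wav"]))
```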