Adding KenLM degrades performance #3384
errorfixrepeat asked this question in Q&A · Unanswered · 1 comment · 4 replies
Reply:
Seems like your LM is not performing well. Maybe the data distribution of the LM training text is very different from the data you are measuring accuracy on. You can check the perplexity of your LM on the test data.
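If it helps, here is a minimal sketch of that perplexity check using the kenlm Python bindings. It assumes a NeMo-style JSON-lines manifest with a "text" field (the format the question mentions), and lm.binary / test_manifest.json are placeholder paths. A perplexity far higher than the LM achieves on its own held-out text would confirm the distribution mismatch.

```python
import json

import kenlm  # Python bindings shipped with KenLM

LM_PATH = "lm.binary"                 # placeholder: ARPA or KenLM binary file
MANIFEST_PATH = "test_manifest.json"  # placeholder: NeMo JSON-lines manifest

model = kenlm.Model(LM_PATH)

total_log10 = 0.0
total_tokens = 0
with open(MANIFEST_PATH) as f:
    for line in f:
        text = json.loads(line)["text"]
        # kenlm scores in log10 and includes </s> when eos=True (the default).
        total_log10 += model.score(text, bos=True, eos=True)
        total_tokens += len(text.split()) + 1  # +1 for the implicit </s>

# Corpus perplexity: 10 ** (-average log10 probability per token).
print(f"perplexity on test transcripts: {10 ** (-total_log10 / total_tokens):.1f}")
```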
Original question:
I am trying to add a language model to a QuartzNet 15x5 model that I have fine-tuned on a new language. I am using the same manifest JSON that I used to train the acoustic model, so I expect only a small benefit. However, introducing the language model causes a significant drop in performance.
When I switch from 'greedy' to 'beamsearch' I see a small WER improvement of 0.32, but with 'beamsearch_ngram' I always see a drop of 6+ WER. I have tried tuning alpha and beta, but performance always decreases. Interestingly, this still happens when alpha and beta are both 0, where the language model should have no effect.
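For context on where alpha and beta enter: a CTC beam-search decoder with an n-gram LM typically ranks each hypothesis by shallow fusion, i.e. the acoustic log-probability plus an LM term weighted by alpha and a word-insertion bonus weighted by beta. This is the standard formula, not necessarily NeMo's exact code:

```python
def combined_score(log_p_acoustic: float, log_p_lm: float, num_words: int,
                   alpha: float, beta: float) -> float:
    """Shallow-fusion score used by typical CTC beam-search decoders.

    log_p_acoustic: summed CTC log-probability of the hypothesis
    log_p_lm:       n-gram LM log-probability of the hypothesis words
    num_words:      word count; beta rewards longer outputs to offset
                    the LM's bias toward short hypotheses
    """
    return log_p_acoustic + alpha * log_p_lm + beta * num_words
```

With alpha = beta = 0 this reduces to the plain beam-search score, so the two modes should in principle agree. Some decoder implementations still consult the LM at zero weights, though (e.g. to restrict or prune hypotheses against the LM's vocabulary), so a drop even at alpha = beta = 0 often points to a vocabulary or text-normalization mismatch between the LM and the acoustic model's transcripts rather than to the weights themselves.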
Does anything stand out from the training and inference runs below?
- Training command
- Training output (minus model load for brevity)
- Beamsearch command
- Beamsearch output
- Beamsearch_ngram command
- Beamsearch_ngram output
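For reference, a beamsearch_ngram run with NeMo's evaluation script usually looks roughly like the following. The flag names are from memory of the script's interface and may differ between NeMo versions, and all paths are placeholders, so check eval_beamsearch_ngram.py --help before copying:

```
python scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py \
    --nemo_model_file finetuned_quartznet.nemo \
    --input_manifest test_manifest.json \
    --kenlm_model_file lm.binary \
    --decoding_mode beamsearch_ngram \
    --beam_width 128 \
    --beam_alpha 1.0 \
    --beam_beta 0.5
```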