-
Hi, I have trained two NeMo ASR models. According to my understanding, the model with the lower validation loss should also have the lower WER, but I am seeing the opposite. Could anyone give me any kind of clue why the model behaves like this? Thanks
-
For ASR, since we use heavy SpecAugment, a lower loss does not always correspond to a lower WER: the random masks differ between runs, and the model is not trained to fill in text but to align sequences. If a sequence is masked, the loss is not fully correlated with the WER.
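A minimal sketch of the point above, in plain NumPy (this is an illustration, not NeMo's actual SpecAugment implementation): two passes over the same utterance see different random time masks, so different fractions of the input are hidden and the per-step loss varies even with identical model weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_mask(spec, max_width=20, n_masks=2, rng=rng):
    """Zero out random time spans, in the spirit of SpecAugment's time masking."""
    spec = spec.copy()
    n_frames = spec.shape[1]
    for _ in range(n_masks):
        w = int(rng.integers(1, max_width + 1))      # mask width in frames
        t0 = int(rng.integers(0, max(1, n_frames - w)))  # mask start
        spec[:, t0:t0 + w] = 0.0
    return spec

# Toy "log-mel spectrogram": 80 mel bins x 300 frames.
spec = rng.standard_normal((80, 300))

# Two training steps over the same utterance draw different masks,
# so the loss computed on the masked input differs step to step.
m1, m2 = time_mask(spec), time_mask(spec)
print((m1 == 0).mean(), (m2 == 0).mean())  # fraction of input hidden in each pass
```

Since the masked fraction changes from step to step, the training loss carries this extra noise, while the WER is measured on unmasked evaluation audio.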
-
@titu1994, if we turn off SpecAugment, will the loss be more correlated with the WER?
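For reference, in NeMo ASR training configs SpecAugment is typically configured under `model.spec_augment` in the YAML; a sketch of disabling it by zeroing the mask counts (the exact keys and values here are assumptions, check your model's config):

```yaml
# Hypothetical fragment of a NeMo ASR training config.
model:
  spec_augment:
    freq_masks: 0   # assumed key: number of frequency masks (0 disables)
    time_masks: 0   # assumed key: number of time masks (0 disables)
```

Removing the `spec_augment` section entirely should have the same effect in most NeMo example configs.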