WER metric converges to 1.0 when applying Conformer-Transducer Model #4324
nghiahuynh-ai started this conversation in General
Replies: 1 comment 1 reply
Your model is way too large for a toy dataset such as an4. Reduce it to 1M or so params and try. Or use a pretrained checkpoint as initialization.
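A minimal sketch of the second suggestion, assuming NeMo's public API: start from a pretrained Conformer-Transducer checkpoint and fine-tune it on the an4 manifests. The checkpoint name "stt_en_conformer_transducer_large" and the trainer settings are illustrative, not taken from this thread.

import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Load a pretrained Conformer-Transducer checkpoint from NGC instead of training
# a large, randomly initialized model on a toy dataset like an4.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_en_conformer_transducer_large"
)

# Point the pretrained model at the an4 manifests used in the config below.
train_ds = OmegaConf.create({
    "manifest_filepath": "datasets/an4/train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
val_ds = OmegaConf.create({
    "manifest_filepath": "datasets/an4/test_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})
asr_model.setup_training_data(train_data_config=train_ds)
asr_model.setup_validation_data(val_data_config=val_ds)

# Fine-tune with a PyTorch Lightning trainer (example settings only).
trainer = pl.Trainer(devices=1, accelerator="auto", max_epochs=50)
trainer.fit(asr_model)

For the first suggestion, the config-level equivalent is shrinking the encoder in the YAML below (far fewer layers and a much smaller d_model) so the parameter count is on the order of 1M.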
I'm having trouble training a Conformer-Transducer model. I've tried changing the network config little by little, but it doesn't help: the WER metric always converges to 1.0, and the prediction log, for instance, looks like this:
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:232] reference :p i t t s b u r g h
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:233] predicted :
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:231]
It predicts nothing!
I use the an4 dataset introduced in the Tutorials. Here is my config for the Conformer-Transducer model (sub-word):
name: Conformer-Transducer-BPE

model:
  sample_rate: 16000
  compute_eval_loss: false
  log_prediction: true
  skip_nan_grad: false

  model_defaults:
    enc_hidden: ${model.encoder.d_model}
    pred_hidden: 64
    joint_hidden: 64

  train_ds:
    manifest_filepath: datasets/an4/train_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: synced_randomized
    bucketing_batch_size: null

  validation_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  test_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  tokenizer:
    dir: tokenizers/tokenizer_spe_unigram_v32
    type: bpe

  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    normalize: per_feature
    window_size: 0.025
    window_stride: 0.01
    window: hann
    features: 80
    n_fft: 512
    frame_splicing: 1
    dither: 1.0e-05
    pad_to: 0

  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 0
    time_masks: 0
    freq_width: 27
    time_width: 0.05

  encoder:
    _target_: nemo.collections.asr.modules.ConformerEncoder
    feat_in: ${model.preprocessor.features}
    feat_out: -1
    n_layers: 17
    d_model: 512
    subsampling: striding
    subsampling_factor: 4
    subsampling_conv_channels: -1
    ff_expansion_factor: 4
    self_attention_model: rel_pos
    n_heads: 8
    att_context_size:
    - -1
    - -1
    xscaling: true
    untie_biases: true
    pos_emb_max_len: 5000
    conv_kernel_size: 31
    conv_norm_type: batch_norm
    dropout: 0.1
    dropout_emb: 0.0
    dropout_att: 0.1

  decoder:
    _target_: nemo.collections.asr.modules.RNNTDecoder
    normalization_mode: null
    random_state_sampling: false
    blank_as_pad: true
    prednet:
      pred_hidden: ${model.model_defaults.pred_hidden}
      pred_rnn_layers: 1
      t_max: null
      dropout: 0.1

  joint:
    _target_: nemo.collections.asr.modules.RNNTJoint
    log_softmax: null
    preserve_memory: false
    fuse_loss_wer: true
    fused_batch_size: 16
    jointnet:
      joint_hidden: ${model.model_defaults.joint_hidden}
      activation: relu
      dropout: 0.1

  decoding:
    strategy: greedy_batch
    greedy:
      max_symbols: 30
    beam:
      beam_size: 2
      return_best_hypothesis: false
      score_norm: true
      tsd_max_sym_exp: 50
      alsd_max_target_len: 2.0

  loss:
    loss_name: default
    warprnnt_numba_kwargs:
      fastemit_lambda: 0.0
      clamp: -1.0

  variational_noise:
    start_step: 0
    std: 0.0

  optim:
    name: adamw
    lr: 0.001
    betas:
    - 0.9
    - 0.98
    weight_decay: 0
    sched:
      name: NoamAnnealing
      d_model: ${model.encoder.d_model}
      warmup_steps: 10000
      warmup_ratio: null
      min_lr: 1.0e-06

trainer:
  devices: -1
  num_nodes: 1
  max_epochs: 500
  max_steps: null
  val_check_interval: 1.0
  accelerator: auto
  strategy: ddp
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  precision: 32
  log_every_n_steps: 10
  progress_bar_refresh_rate: 10
  resume_from_checkpoint: null
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  enable_checkpointing: false
  logger: false

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: val_wer
    mode: min
    save_top_k: 5
    always_save_nemo: true
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
  create_wandb_logger: false
  wandb_logger_kwargs:
    name: null
    project: null
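A minimal sketch of how a config like this is typically turned into a model with NeMo and PyTorch Lightning; the YAML file name is a placeholder for wherever the config above is saved.

import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr
from nemo.utils.exp_manager import exp_manager

# "conformer_transducer_bpe.yaml" is a placeholder path for the config above.
cfg = OmegaConf.load("conformer_transducer_bpe.yaml")

# Build the trainer and experiment manager from the trainer/exp_manager sections.
trainer = pl.Trainer(**cfg.trainer)
exp_manager(trainer, cfg.get("exp_manager", None))

# Build the Conformer-Transducer (BPE) model from the `model` section and train it.
asr_model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model, trainer=trainer)
trainer.fit(asr_model)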
Please show me how to solve this problem. Thanks.