WER metric converges to 1.0 when applying Conformer-Transducer Model #4324
nghiahuynh-ai started this conversation in General
Replies: 1 comment 1 reply
Your model is way too large for a toy dataset such as an4. Reduce it to 1M or so params and try. Or use a pretrained checkpoint as initialization.
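A minimal sketch of the second suggestion, assuming NeMo's public API: start from a pretrained Conformer-Transducer checkpoint and fine-tune it on the an4 manifests. The checkpoint name "stt_en_conformer_transducer_large" and the trainer settings are illustrative, not taken from this thread.

import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

# Load a pretrained Conformer-Transducer checkpoint from NGC instead of training
# a large, randomly initialized model on a toy dataset like an4.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_en_conformer_transducer_large"
)

# Point the pretrained model at the an4 manifests used in the config below.
train_ds = OmegaConf.create({
    "manifest_filepath": "datasets/an4/train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
val_ds = OmegaConf.create({
    "manifest_filepath": "datasets/an4/test_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})
asr_model.setup_training_data(train_data_config=train_ds)
asr_model.setup_validation_data(val_data_config=val_ds)

# Fine-tune with a PyTorch Lightning trainer (example settings only).
trainer = pl.Trainer(devices=1, accelerator="auto", max_epochs=50)
trainer.fit(asr_model)

For the first suggestion, the config-level equivalent is shrinking the encoder in the YAML below (far fewer layers and a much smaller d_model) so the parameter count is on the order of 1M.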
I'm having trouble training a Conformer-Transducer model. I've tried changing the network config little by little, but it doesn't help: the WER metric always converges to 1.0, and the prediction log, for instance, looks like this:
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:232] reference :p i t t s b u r g h
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:233] predicted :
[NeMo I 2022-06-03 12:19:00 rnnt_wer_bpe:231]
It predicts nothing!
I use the an4 dataset introduced in the Tutorials. Here is my config for the Conformer-Transducer model (sub-word):
name: Conformer-Transducer-BPE

model:
  sample_rate: 16000
  compute_eval_loss: false
  log_prediction: true
  skip_nan_grad: false

  model_defaults:
    enc_hidden: ${model.encoder.d_model}
    pred_hidden: 64
    joint_hidden: 64

  train_ds:
    manifest_filepath: datasets/an4/train_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: true
    num_workers: 8
    pin_memory: true
    use_start_end_token: false
    trim_silence: false
    max_duration: 16.7
    min_duration: 0.1
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    bucketing_strategy: synced_randomized
    bucketing_batch_size: null

  validation_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  test_ds:
    manifest_filepath: datasets/an4/test_manifest.json
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    num_workers: 8
    pin_memory: true
    use_start_end_token: false

  tokenizer:
    dir: tokenizers/tokenizer_spe_unigram_v32
    type: bpe

  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    normalize: per_feature
    window_size: 0.025
    window_stride: 0.01
    window: hann
    features: 80
    n_fft: 512
    frame_splicing: 1
    dither: 1.0e-05
    pad_to: 0

  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 0
    time_masks: 0
    freq_width: 27
    time_width: 0.05

  encoder:
    _target_: nemo.collections.asr.modules.ConformerEncoder
    feat_in: ${model.preprocessor.features}
    feat_out: -1
    n_layers: 17
    d_model: 512
    subsampling: striding
    subsampling_factor: 4
    subsampling_conv_channels: -1
    ff_expansion_factor: 4
    self_attention_model: rel_pos
    n_heads: 8
    att_context_size:
    - -1
    - -1
    xscaling: true
    untie_biases: true
    pos_emb_max_len: 5000
    conv_kernel_size: 31
    conv_norm_type: batch_norm
    dropout: 0.1
    dropout_emb: 0.0
    dropout_att: 0.1

  decoder:
    _target_: nemo.collections.asr.modules.RNNTDecoder
    normalization_mode: null
    random_state_sampling: false
    blank_as_pad: true
    prednet:
      pred_hidden: ${model.model_defaults.pred_hidden}
      pred_rnn_layers: 1
      t_max: null
      dropout: 0.1

  joint:
    _target_: nemo.collections.asr.modules.RNNTJoint
    log_softmax: null
    preserve_memory: false
    fuse_loss_wer: true
    fused_batch_size: 16
    jointnet:
      joint_hidden: ${model.model_defaults.joint_hidden}
      activation: relu
      dropout: 0.1

  decoding:
    strategy: greedy_batch
    greedy:
      max_symbols: 30
    beam:
      beam_size: 2
      return_best_hypothesis: false
      score_norm: true
      tsd_max_sym_exp: 50
      alsd_max_target_len: 2.0

  loss:
    loss_name: default
    warprnnt_numba_kwargs:
      fastemit_lambda: 0.0
      clamp: -1.0

  variational_noise:
    start_step: 0
    std: 0.0

  optim:
    name: adamw
    lr: 0.001
    betas:
    - 0.9
    - 0.98
    weight_decay: 0
    sched:
      name: NoamAnnealing
      d_model: ${model.encoder.d_model}
      warmup_steps: 10000
      warmup_ratio: null
      min_lr: 1.0e-06

trainer:
  devices: -1
  num_nodes: 1
  max_epochs: 500
  max_steps: null
  val_check_interval: 1.0
  accelerator: auto
  strategy: ddp
  accumulate_grad_batches: 1
  gradient_clip_val: 0.0
  precision: 32
  log_every_n_steps: 10
  progress_bar_refresh_rate: 10
  resume_from_checkpoint: null
  num_sanity_val_steps: 0
  check_val_every_n_epoch: 1
  sync_batchnorm: true
  enable_checkpointing: false
  logger: false

exp_manager:
  exp_dir: null
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: val_wer
    mode: min
    save_top_k: 5
    always_save_nemo: true
  resume_if_exists: false
  resume_ignore_no_checkpoint: false
  create_wandb_logger: false
  wandb_logger_kwargs:
    name: null
    project: null
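A minimal sketch of how a config like this is typically turned into a model with NeMo and PyTorch Lightning; the YAML file name is a placeholder for wherever the config above is saved.

import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr
from nemo.utils.exp_manager import exp_manager

# "conformer_transducer_bpe.yaml" is a placeholder path for the config above.
cfg = OmegaConf.load("conformer_transducer_bpe.yaml")

# Build the trainer and experiment manager from the trainer/exp_manager sections.
trainer = pl.Trainer(**cfg.trainer)
exp_manager(trainer, cfg.get("exp_manager", None))

# Build the Conformer-Transducer (BPE) model from the `model` section and train it.
asr_model = nemo_asr.models.EncDecRNNTBPEModel(cfg=cfg.model, trainer=trainer)
trainer.fit(asr_model)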
Please show me how to solve this problem. Thanks.