Finetuning a causal model from a non-causal checkpoint #14597
pointersbad started this conversation in General
Hello!
First off, thanks for the awesome work; NeMo is a great tool for research and for setting up experiments, and I enjoy using it very much.
I want to create a causal ASR model for Kyrgyz and, because of data scarcity, I thought it would be a good idea to start from stt_kk_ru_fastconformer_hybrid_large, as it was trained on ~1k hours of Kazakh, which is very close to my target language. I changed the config to make the model causal with limited lookahead, then loaded the weights where possible (the pre_encoder weights loaded only partially due to a size mismatch after the non-causal -> causal convolution conversion). After training the model on 32 hours of Kyrgyz data, I get a validation WER of around 35%, while the original model's WER on the MCV test set is ~15%.
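For reference, here is roughly the procedure as a minimal sketch (not my exact code). The encoder fields `att_context_style`, `att_context_size`, `conv_context_size`, and `causal_downsampling` follow NeMo's cache-aware streaming FastConformer configs; the context-size values below are placeholders, and rebuilding the model from the restored config may need tokenizer paths adjusted:

```python
# Sketch: turn a non-causal NeMo checkpoint into a causal/limited-lookahead
# model and copy over every weight whose shape still matches.
import copy

import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# 1) Load the offline (non-causal) pretrained hybrid model.
pretrained = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    model_name="stt_kk_ru_fastconformer_hybrid_large"
)

# 2) Build a causal config with limited right context (lookahead).
#    Field names follow NeMo's streaming FastConformer configs;
#    the [left, right] sizes here are placeholders, not tuned values.
cfg = copy.deepcopy(pretrained.cfg)
with open_dict(cfg):
    cfg.encoder.att_context_style = "chunked_limited"  # limited lookahead
    cfg.encoder.att_context_size = [70, 13]            # [left, right] context
    cfg.encoder.conv_context_size = "causal"           # causal depthwise convs
    cfg.encoder.causal_downsampling = True             # causal pre_encoder convs

causal_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel(cfg=cfg)

# 3) Partial load: skip tensors whose shapes changed (e.g. the pre_encoder
#    convolutions after the non-causal -> causal switch).
src = pretrained.state_dict()
dst = causal_model.state_dict()
matched = {k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape}
skipped = sorted(set(src) - set(matched))
causal_model.load_state_dict(matched, strict=False)
print(f"copied {len(matched)} tensors, skipped {len(skipped)}: {skipped[:5]} ...")
```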
Now I'm wondering: is this an expected outcome, given that the offline and streaming tasks differ and the attention layers can only adapt so much on 32 hours, or did I miss something during training? Has anyone run experiments like this, and what was your experience?