-
I'm in the process of using the NeMo framework to fine-tune a pretrained English model for Swedish.
The train and dev splits are tarred with the following config for
The tokenizer currently used is an "SPE" tokenizer made with
The idea is to use the pretrained STT En Conformer-Transducer Large model and fine-tune it for Swedish. The way I've set up the training so far is:
I've taken the base config
The set-up is further done as:
Now to the actual questions. For instance: 1. Should the optimizer config be changed from the base config? I'd highly appreciate any help and pointers. Br
-
I'm tagging you @titu1994 as I've seen you give great answers on a lot of other similar questions, thanks in advance!
-
Hi, thanks for the incredibly detailed discussion! First off, let's start with some important considerations:
Now onto the setup code
It's not a full tutorial but it has most of the important steps.
It is simply too vast for 1 GPU. I would first do short experiments, maybe 5-10 epochs (roughly 50,000-100,000 steps). That should be a reasonable baseline for training. As you'll see in the Hindi notebook, even 5,000 steps can be the start of training: the WER goes from 100% down to 40%!
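To relate epochs to optimizer steps for a short experiment like this, a minimal sketch of the arithmetic follows. The function name `total_steps` and the dataset size and batch size are hypothetical, chosen only so that 5-10 epochs land in the 50,000-100,000 step range mentioned above. It assumes one optimizer step per effective batch and that the last partial batch is kept.

```python
import math


def total_steps(num_samples, batch_size, epochs, num_gpus=1, grad_accum=1):
    """Estimate optimizer steps for a training run.

    Assumes one optimizer step per effective batch
    (batch_size * num_gpus * grad_accum) and that the last
    partial batch of each epoch is not dropped.
    """
    effective_batch = batch_size * num_gpus * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs


# Hypothetical numbers: 160,000 utterances, batch size 16, a single GPU.
print(total_steps(num_samples=160_000, batch_size=16, epochs=5))   # 50000
print(total_steps(num_samples=160_000, batch_size=16, epochs=10))  # 100000
```

Plugging in your own manifest size and batch size gives a rough step budget to pass to the trainer, for example as a maximum step count, before committing to a long run.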
Hi, thanks for the incredibly detailed discussion! First off, let's start with some important considerations:
You have a tokenizer and are referring to the BPE config, but you are manually inserting parts of the config from the character-based model's config. Please refer only to the BPE / subword config when using tokenizers. Mixing the two will not work and may silently cause major issues.
The validation set cannot be used with a tarred dataset. It is also quite wasteful to have over 10 hours of validation data, because you'll mostly cover the whole vocabulary with that much. Also, NeMo does not support validation and test data loaders being tarred datasets, because they drop samples, which would make results incomp…