TTS spectrogram generator training using phonemes as input #3156
-
Is there an example that shows how to train a TTS spectrogram generator model from scratch using phonemes as input? Preferably Tacotron2, but other models would be fine too.

There is a config file for FastSpeech2 that seems to support this, but I'm not sure about the exact format of the JSON file mentioned there, i.e.:

mappings_file: ??? # JSON file with word->phone and phone->idx mappings

Does the Tacotron2 training process support the same kind of mappings_file? Thanks!
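For reference, here is my best guess at what that file might contain, based only on the inline comment in the config. The key names, example words, and ARPABET phones below are purely illustrative assumptions, not taken from any NeMo documentation:

```python
import json

# Hypothetical structure implied by the comment
# "JSON file with word->phone and phone->idx mappings".
# Key names and phone symbols here are guesses for illustration only.
mappings = {
    "word2phones": {
        "hello": ["HH", "AH0", "L", "OW1"],
        "world": ["W", "ER1", "L", "D"],
    },
    "phone2idx": {
        "HH": 0, "AH0": 1, "L": 2, "OW1": 3, "W": 4, "ER1": 5, "D": 6,
    },
}

with open("mappings.json", "w") as f:
    json.dump(mappings, f, indent=2)
```

Is this roughly the expected layout, or does the file look different?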
Replies: 1 comment 1 reply
-
Yes, you could use IPA symbols as input to train a mel-spectrogram generator such as FastPitch. Our recent German model mixes characters and IPA symbols together, but you can certainly use IPA symbols only. Please see the tutorial here for further guidance: https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Fastpitch_Training_GermanTTS.ipynb
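As a minimal sketch, training data with IPA input could be written as a JSON-lines manifest in which the text field holds the phoneme string instead of graphemes. The field names below follow the common NeMo-style manifest layout (audio_filepath, duration, text), and the paths, durations, and IPA transcriptions are placeholders; the linked tutorial shows the exact format and preprocessing it expects:

```python
import json

# Placeholder manifest entries with IPA phoneme strings as the "text" field.
# Paths, durations, and transcriptions are made up for illustration.
samples = [
    {
        "audio_filepath": "/data/wavs/sample_0001.wav",
        "duration": 2.41,
        "text": "haloː vɛlt",   # IPA transcription instead of graphemes
    },
    {
        "audio_filepath": "/data/wavs/sample_0002.wav",
        "duration": 3.07,
        "text": "ɡuːtn̩ taːk",
    },
]

with open("train_manifest.json", "w") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```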