TTS spectrogram generator training using phonemes as input #3156
-
Is there an example that shows how to train a TTS spectrogram generator model from scratch using phonemes as input? Preferably Tacotron2, but other models would be fine too.

There is a config file for FastSpeech2 that seems to support this, but I'm not sure about the exact format of the JSON file mentioned there, i.e.:

mappings_file: ??? # JSON file with word->phone and phone->idx mappings

Does the Tacotron2 training process support the same kind of mappings_file? Thanks!
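For reference, here is my best guess at what that file might contain, based only on the inline comment in the config. The key names, example words, and ARPABET phones below are purely illustrative assumptions, not taken from any NeMo documentation:

```python
import json

# Hypothetical structure implied by the comment
# "JSON file with word->phone and phone->idx mappings".
# Key names and phone symbols here are guesses for illustration only.
mappings = {
    "word2phones": {
        "hello": ["HH", "AH0", "L", "OW1"],
        "world": ["W", "ER1", "L", "D"],
    },
    "phone2idx": {
        "HH": 0, "AH0": 1, "L": 2, "OW1": 3, "W": 4, "ER1": 5, "D": 6,
    },
}

with open("mappings.json", "w") as f:
    json.dump(mappings, f, indent=2)
```

Is this roughly the expected layout, or does the file look different?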
Replies: 1 comment 1 reply
-
Yes, you could use IPA symbols as input to train a mel-spectrogram generator such as FastPitch. Our recent German model mixes characters and IPA symbols together, but you can certainly use IPA symbols only. Please see the tutorial here for further guidance: https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Fastpitch_Training_GermanTTS.ipynb
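As a minimal sketch, training data with IPA input could be written as a JSON-lines manifest in which the text field holds the phoneme string instead of graphemes. The field names below follow the common NeMo-style manifest layout (audio_filepath, duration, text), and the paths, durations, and IPA transcriptions are placeholders; the linked tutorial shows the exact format and preprocessing it expects:

```python
import json

# Placeholder manifest entries with IPA phoneme strings as the "text" field.
# Paths, durations, and transcriptions are made up for illustration.
samples = [
    {
        "audio_filepath": "/data/wavs/sample_0001.wav",
        "duration": 2.41,
        "text": "haloː vɛlt",   # IPA transcription instead of graphemes
    },
    {
        "audio_filepath": "/data/wavs/sample_0002.wav",
        "duration": 3.07,
        "text": "ɡuːtn̩ taːk",
    },
]

with open("train_manifest.json", "w") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```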