How to use talknet-aligner #4337
Hi, I am trying to explore the talknet-aligner model (the ASR-based text/audio aligner, based on the CTC-loss algorithm, that was used to train TalkNet). I was wondering how to use it, because the link says to refer to TTS inference, which covers spectrogram generation. However, this model is not supposed to generate spectrograms. Any hints on how to use this model would be very helpful. When I use the transcribe method on the model, it does give the phonetic transcription, but I can't find a way to use it to align a given text. Thanks
Replies: 1 comment
Hi, the TalkNet Aligner has been deprecated, but if you just want to play around with it, you can check out the "Extracting phoneme ground truth durations" section of this notebook (as of the 1.8.0 release; it's since been removed): https://github.com/NVIDIA/NeMo/blob/r1.8.0/tutorials/tts/TalkNet_Training.ipynb We'll be moving to the RadTTS Aligner and will upload a checkpoint in the near future.
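For anyone wondering what "alignment" with a CTC-trained model generally involves, here is a minimal sketch of CTC forced alignment (Viterbi over the blank-interleaved target sequence) in plain NumPy. It is not NeMo code and not the notebook's implementation: `ctc_forced_align` is a hypothetical helper, and it assumes you already have a `(T, V)` array of per-frame log-probabilities over the model's token vocabulary plus the token ids of the known text. How to obtain those from the TalkNet aligner checkpoint is not shown here.

```python
import numpy as np


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi-align `targets` to frames under the standard CTC topology.

    log_probs: (T, V) array of per-frame log-probabilities (assumed input).
    targets:   list of token ids for the known text, without blanks.
    Returns (path, durations): `path[t]` is the index into the
    blank-interleaved sequence chosen at frame t, and `durations[s]` is
    the number of frames assigned to position s of that sequence.
    """
    T = log_probs.shape[0]

    # Blank-interleaved target: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    dp = np.full((T, S), -np.inf)        # best log-score ending in state s at frame t
    bp = np.zeros((T, S), dtype=np.int64)  # backpointers

    # A path may start in the leading blank or in the first token.
    dp[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        dp[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [(dp[t - 1, s], s)]                  # stay in the same state
            if s >= 1:
                cands.append((dp[t - 1, s - 1], s - 1))  # advance by one
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1, s - 2], s - 2))
            best, prev = max(cands, key=lambda c: c[0])
            dp[t, s] = best + log_probs[t, ext[s]]
            bp[t, s] = prev

    # A valid path ends in the trailing blank or in the last token.
    s = S - 1
    if S > 1 and dp[T - 1, S - 2] > dp[T - 1, S - 1]:
        s = S - 2
    if not np.isfinite(dp[T - 1, s]):
        raise ValueError("targets cannot be aligned to this many frames")

    # Backtrack to recover the per-frame state sequence.
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = bp[t, s]

    durations = [0] * S
    for state in path:
        durations[state] += 1
    return path, durations
```

The durations are counted in frames and cover blanks as well as tokens, so they always sum to `T`. Converting them to seconds needs the model's frame hop, which depends on its preprocessing and encoder stride.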