How is GPT being used? #264
-
Hey, thanks for the comment. At some point I restructured the document and folded what used to be appendix III into appendix II without updating the text; I've fixed that now. I think the statement you are referring to is how I used the AR activations to improve the performance of the diffusion model. This is described in more detail here. Note that in this document, I use "GPT" to refer to the model architecture, not the model developed by OpenAI. No pre-trained text encoders were used with Tortoise. I experimented with this briefly but found it to be detrimental, which makes sense: TTS systems care more about the phonetic interpretation of each text character than about the deeper meaning behind the words in a sentence. It would probably be beneficial to condition a model like Tortoise on both the character-level embeddings and the output of, for example, T5. There has been some success with this idea in recent txt2im research, where it has been shown to help these models spell better.
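
To make the conditioning scheme described above concrete, here is a minimal PyTorch sketch of the idea: the per-token hidden activations of a GPT-style AR model, rather than the output of a pre-trained text encoder, are fed to the diffusion denoiser via cross-attention. This is an illustrative sketch only, not the Tortoise implementation; all class names (`TinyARModel`, `ARConditionedDenoiser`), dimensions, and the wiring are hypothetical, causal masking and the diffusion timestep embedding are omitted for brevity.

```python
import torch
import torch.nn as nn


class TinyARModel(nn.Module):
    """Stand-in for a GPT-style autoregressive token model."""

    def __init__(self, vocab_size: int = 256, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=4, batch_first=True
        )
        self.layers = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Return per-token hidden activations, not token logits.
        return self.layers(self.embed(tokens))


class ARConditionedDenoiser(nn.Module):
    """Toy diffusion denoiser that cross-attends over AR activations."""

    def __init__(self, mel_dim: int = 80, hidden_dim: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(mel_dim, hidden_dim)
        self.cross_attn = nn.MultiheadAttention(
            hidden_dim, num_heads=4, batch_first=True
        )
        self.out_proj = nn.Linear(hidden_dim, mel_dim)

    def forward(self, noisy_mel: torch.Tensor,
                ar_activations: torch.Tensor) -> torch.Tensor:
        h = self.in_proj(noisy_mel)
        # Each mel frame attends over the AR model's hidden states,
        # which serve as the conditioning signal.
        h, _ = self.cross_attn(query=h, key=ar_activations,
                               value=ar_activations)
        return self.out_proj(h)  # predicted noise / denoised mel


tokens = torch.randint(0, 256, (1, 32))   # dummy text tokens
noisy = torch.randn(1, 200, 80)           # dummy noisy mel frames
ar = TinyARModel()
denoiser = ARConditionedDenoiser()
pred = denoiser(noisy, ar(tokens))        # shape: (1, 200, 80)
```

The cross-attention wiring here is only one plausible way to consume the activations; the point is that the conditioning signal comes from the AR model's hidden states rather than from its output tokens or from a separate text encoder.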
-
Thank you for the work you have done and for sharing it with everyone! I have read through your paper draft; it mentions that you used GPT and that this was the biggest contribution, and that you would elaborate on it in appendix III, but there is no such appendix. Could you please briefly explain how GPT was used? Did you use GPT embeddings as your text encoder outputs? Which GPT model did you use exactly?