This is a containerized version of VoiceCraft, the project published at jasonppy/VoiceCraft. The container project is a work in progress and should be considered alpha-quality software for now.
Origin
VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data, including audiobooks, internet videos, and podcasts.
To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference audio.