Feature Request: speech-2-(translated) speech #125

kwmedia · 2024-01-14T23:55:23Z

kwmedia
Jan 14, 2024

Hey there,
I just tried out your transcription plugin and was surprised how fast it was (running on an RTX 3080).

I would love to see the option for a translated speech synthesis from the transcription based on the original audio input.

My use case would be dubbing for YouTube's MultiLanguageAudio feature.

Thanks for working on this project!

Cheers!

RyanMetcalfeInt8 · 2024-01-15T00:48:02Z

RyanMetcalfeInt8
Jan 15, 2024
Maintainer

Hi @kwmedia,

Thanks for your interest, and for trying out the project!

> I would love to see the option for a translated speech synthesis from the transcription based on the original audio input.

Just to double-check my understanding, you'd like the ability to do the following:

First, perform transcription on some chunk of audio, producing a label track. Then, given that label track, perform speech synthesis (text-to-speech), which produces a new audio track.

Do I have it right?

Thanks!
Ryan
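
For illustration, the two-step flow described above (transcription produces a label track, then text-to-speech turns that label track into a new audio track) could be sketched roughly as follows. This is a hypothetical sketch under stated assumptions, not the plugin's actual API: `Label`, `dub_labels`, and the `translate` and `synthesize` callables are made-up names standing in for a real translation step and a real TTS model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Label:
    """One entry of a label track (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def dub_labels(
    labels: List[Label],
    translate: Callable[[str], str],           # assumption: text -> translated text
    synthesize: Callable[[str], List[float]],  # assumption: text -> mono samples
    sample_rate: int = 16000,
) -> List[float]:
    """Place synthesized speech for each label at that label's start time."""
    total = int(max((l.end for l in labels), default=0.0) * sample_rate)
    track = [0.0] * total  # start from silence
    for label in labels:
        samples = synthesize(translate(label.text))
        offset = int(label.start * sample_rate)
        # Truncate speech that would overrun the label's time slot.
        limit = max(0, int(label.end * sample_rate) - offset)
        for i, sample in enumerate(samples[:limit]):
            track[offset + i] = sample
    return track
```

Truncating at the label boundary is only one possible choice; a real dubbing tool might instead time-stretch the synthesized speech or let it spill into the following silence.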

kwmedia · 2024-01-15T08:48:16Z

kwmedia
Jan 15, 2024
Author

Hey Ryan,
Correct! Bonus points if the user can train the speech synthesis to sound like the audio input.
I'm not sure how much training data would be needed for that, so maybe there could be an option to use an adapted local model?
(We have about 3000 videos of ~10-30 min each, so training a model on that should be possible, right?)
Cheers,
Martin

RyanMetcalfeInt8 · 2024-01-16T01:19:59Z

RyanMetcalfeInt8
Jan 16, 2024
Maintainer

Thanks for the details.

I've been looking at porting this model / pipeline to work within our set of Audacity plugins: https://github.com/suno-ai/bark

It might take me a little while to enable, but I believe it would cover most of what you're looking to do. The only part that will (probably) be missing is the ability to perform 'voice cloning' -- looks like they intentionally prevent this feature, as it may cause more harm than good.

I'll post some updates here when I have some more details of what the timeline might look like for this feature.

Cheers!

Elshara · 2024-03-25T18:02:23Z

Elshara
Mar 25, 2024

There are also these collections to look at for this idea:
https://replicate.com/collections/text-to-speech
https://replicate.com/collections/audio-generation
https://replicate.com/collections/transcribe-speech
The ultimate goal would be to convert speech into text, into style, and back into speech, or into music for that matter.
