Use Pre-Recorded Tracks as input/change active track #100

edrihan · 2024-11-26T21:59:54Z

It's really great that you can record from the command line, but there are some issues with it. Firstly, transcription happens per file, which completely breaks the flow. Why can't it batch the transcriptions once you're done recording? But that is neither here nor there, as recording in the CLI is just not what we are used to in the audio field. There are free (to use) programs like Reaper and Audacity, as well as paid DAWs, that people already enjoy recording with.

So people have their own workflow for recording audio, and the CLI recorder is not it. You want control over what happens to your audio.

Description of proposed feature

I want to be able to swap audio files easily. I want to be able to pass an audio file, relative to a folder in manim/media/, for example:

manim/media/voiceovers
manim/media/recordings
manim/media/audio

I think the best way for this to work would probably be a small JSON file that maps each section name to a file path. That file can then be checked to determine what needs to be done for each section. If I update a section later, I want to be able to point manim at the new file and have the manim command automatically create the transcription and metadata for it.

If a speech file doesn't exist, I want to be prompted whether I want to record it, or whether I want to supply a path. If I choose a path outside the manim directory, I want the option to copy the file into the right place. Maybe it adds the hash to the end of the filename too?
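The copy-with-hash idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `file_digest` and `import_track` are hypothetical names, not part of manim_voiceover, and the `<stem>-<hash><suffix>` naming scheme is one possible reading of "adds the hash to the end of the filename".

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, length: int = 8) -> str:
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large WAV/FLAC files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest()[:length]


def import_track(source: Path, media_dir: Path) -> Path:
    """Copy `source` into `media_dir` as <stem>-<hash><suffix> and return the new path.

    A file that already lives somewhere inside `media_dir` is returned unchanged.
    (Hypothetical helper; the naming scheme is an assumption.)
    """
    source = source.resolve()
    media_dir = media_dir.resolve()
    if media_dir in source.parents:
        return source
    media_dir.mkdir(parents=True, exist_ok=True)
    target = media_dir / f"{source.stem}-{file_digest(source)}{source.suffix}"
    if not target.exists():
        # copy2 also preserves timestamps, which is handy for recordings.
        shutil.copy2(source, target)
    return target
```

Because the hash is derived from the file contents, re-exporting a take from the DAW produces a new filename, which would make it easy to tell that the transcription for that section is stale.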

It can then rebuild the data in manim/media/voiceovers/cache.json where needed.

How can the new feature be used?

Ideally you just have the section text as the key and the file path as the value.
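As a rough sketch of that lookup, assuming a manifest file that maps section text to a path relative to the media folder: the helper names `load_manifest` and `resolve_track` are hypothetical and do not exist in manim_voiceover.

```python
import json
from pathlib import Path
from typing import Optional


def load_manifest(manifest_path: Path) -> dict:
    """Load the section-text -> file-path mapping, or an empty dict if the file is absent."""
    if not manifest_path.exists():
        return {}
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def resolve_track(manifest: dict, section_text: str, media_root: Path) -> Optional[Path]:
    """Return the audio file for `section_text`, or None if there is nothing usable.

    None means the caller still has to prompt for a recording or a path.
    (Hypothetical helper; the manifest layout is an assumption.)
    """
    relative = manifest.get(section_text)
    if relative is None:
        return None
    candidate = media_root / relative
    return candidate if candidate.is_file() else None
```

A manifest in this layout would contain entries such as `{"Welcome to the video.": "recordings/intro.wav"}`. Returning `None` for both a missing key and a missing file keeps the "prompt the user" decision in one place.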

Additional comments

The other feature I want is related to this, and I don't think it deserves its own issue, since for the moment using your own audio seems difficult. If this feature request goes through, I would love for the audio not to be hardcoded to MP3. Normally in audio you would use WAV, FLAC, or an equivalent lossless format. We can let the internet destroy the audio later, but you don't use MP3 for the master copy. I know it made sense to use MP3, as that is probably what the AI models put out, but for our own audio we do not want lossy compression.

This would be more in line with normal audio workflows and would make this so much easier to use. Then we'll have more of people's voices and less AI, which is a good thing.

Issue #88 seems related.

kennykb · 2025-05-27T15:02:36Z

It's in no fit shape to package as a module in manim_voiceover, but here's what I've been using.

It doesn't do WAV, FLAC or Ogg Vorbis - I wish it did, but haven't had the time to dive into how the audio handling works, so I've stuck to MP3.

# This is copied and pasted from my script that I copy and paste
# into all my animations, so I may have dropped an import or three
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.helper import msg_box, remove_bookmarks
from manim_voiceover.services.base import SpeechService
from tkinter import Tk, filedialog

class PrerecordedService(SpeechService):

    file_types = (
        ('MP3 Audio', '.mp3'),
        ('All Files', '*')
    )

    def __init__(
            self,
            transcription_model: str = "base",
            **kwargs):

        SpeechService.__init__(self,
                               transcription_model=transcription_model,
                               **kwargs)

    def generate_from_text(
            self, text:str, cache_dir:str = None, path: str=None, **kwargs
    ) -> dict:
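        """Return voiceover metadata for `text`, prompting for an audio
        file when no cached result exists and no `path` is given."""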

        # Remove bookmarks
        input_text = remove_bookmarks(text)

        if cache_dir is None:
            cache_dir = self.cache_dir

        input_data = {
            "input_text" : input_text,
            "service" : "prerecorded",
            }

        cached_result = self.get_cached_result(input_data, cache_dir)
        if cached_result is not None:
            return cached_result
        
        while path is None:
            box = msg_box("Voiceover:\n\n" + input_text)
            print(box)
            print('Please select an audio file for the text above.')
            path = filedialog.askopenfilename(
                title='Please select audio file',
                filetypes = self.file_types,
                )
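            # askopenfilename returns an empty string (or an empty tuple on
            # some platforms) when the dialog is cancelled.
            if not path:
                raise RuntimeError('Operation cancelled.')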
        print(f'Selected: {path}')

        json_dict = {
            "input_text" : text,
            "input_data" : input_data,
            "original_audio" : path,
        }

        return json_dict

and then I can say

speechService = PrerecordedService(transcription_model='small')

in place of GTTSService or RecorderService or whatever.

edrihan added the enhancement New feature or request label Nov 26, 2024

edrihan assigned osolmaz Nov 26, 2024
