This repository contains a set of Python scripts that transform raw manga images and associated JSON data into engaging read-along videos. Inspired by children's voice books, the videos feature character speech bubbles that appear in sync with narration, providing an interactive experience for weebs. (Yes, I used ChatGPT to generate this.)
- Image Processing: Uses Magiv2 to extract the transcript from the raw manga pages.
- Bubble Chat Animation: Creates a dynamic speech bubble sequence corresponding to character dialogues.
- Video Creation: Converts the processed images into a video using `img2mp4.py`, enabling smooth playback of the read-along experience (a minimal sketch of this step follows the list).
- Voice Over: Uses a TTS model to read out the dialogue extracted from the images. The TTS engine converts the text into natural-sounding speech, which is then synchronized with the bubble animations.
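To make the video-creation step concrete, here is a minimal sketch of stitching a folder of rendered frames into an MP4 with OpenCV. It is illustrative only: the frame directory, codec, and fps are assumptions, and the actual `img2mp4.py` may use different parameters or libraries.

```python
# Illustrative sketch of the frames-to-MP4 step; not the actual img2mp4.py.
# Assumes the sorted JPEG filenames in frame_dir give the playback order.
import cv2
from pathlib import Path

def frames_to_mp4(frame_dir: str, out_path: str, fps: int = 24) -> None:
    frames = sorted(Path(frame_dir).glob("*.jpg"))
    first = cv2.imread(str(frames[0]))
    height, width = first.shape[:2]
    writer = cv2.VideoWriter(
        out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height)
    )
    for frame in frames:
        img = cv2.imread(str(frame))
        if img.shape[:2] != (height, width):  # VideoWriter needs a fixed frame size
            img = cv2.resize(img, (width, height))
        writer.write(img)
    writer.release()

frames_to_mp4("frames/", "read_along.mp4")  # hypothetical paths
```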
- `main.py`: The main file to run the whole process. It is very buggy, but you can check the pipeline there.
- `demo-manga-read-along.ipynb`: Check the process step by step here.
- `requirements.txt`: Lists the dependencies needed to run the scripts.
- Setup Environment:
  - Make sure you have Python installed.
  - Create a virtual environment (optional but recommended) and install the required packages with `pip install -r requirements.txt` (full commands below).
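  For example, on a Unix-like shell (on Windows, activate with `.venv\Scripts\activate` instead):

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```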
- Prepare Your Data:
  - `\src\config.py` contains all the paths you need to know. This project requires users to provide manga images, character images, and voice bank samples for voice cloning (a sketch of such a config follows below).
  - It works best when the raw manga is in English and the pages are named in sequential order (e.g., `01.jpg`, `02.jpg`, `03.jpg`). Character images should follow the naming format `luffy_1.jpg`, `nami_1.jpg`.
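  As an illustration only, the config roughly gathers paths like the ones below; the variable names here are hypothetical and the real `\src\config.py` will differ:

  ```python
  # Hypothetical sketch of the paths src\config.py collects; names are illustrative.
  from pathlib import Path

  DATA_ROOT = Path("data")                   # assumed root folder for all inputs
  MANGA_DIR = DATA_ROOT / "manga"            # pages: 01.jpg, 02.jpg, 03.jpg, ...
  CHARACTER_DIR = DATA_ROOT / "characters"   # references: luffy_1.jpg, nami_1.jpg, ...
  VOICE_BANK_DIR = DATA_ROOT / "voice_bank"  # speech samples for voice cloning
  OUTPUT_DIR = Path("output")                # frames and the final video land here
  ```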
- Process Images and Create Video with Voice:
  - Check out the Kaggle notebook; a sketch of the transcript-extraction stage is shown below.
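  Loading Magiv2 through Hugging Face `transformers` looks roughly like this (adapted from the Magiv2 model card; verify the method names against the current card before relying on them):

  ```python
  # Rough sketch of transcript extraction with Magiv2, following its model card.
  import numpy as np
  import torch
  from PIL import Image
  from transformers import AutoModel

  model = AutoModel.from_pretrained("ragavsachdeva/magiv2", trust_remote_code=True).eval()

  def read_image(path: str) -> np.ndarray:
      # Magiv2 expects RGB numpy arrays; grayscale-then-RGB normalizes color scans.
      return np.array(Image.open(path).convert("L").convert("RGB"))

  pages = [read_image(p) for p in ["01.jpg", "02.jpg", "03.jpg"]]
  character_bank = {
      "images": [read_image(p) for p in ["luffy_1.jpg", "nami_1.jpg"]],
      "names": ["luffy", "nami"],
  }
  with torch.no_grad():
      # do_chapter_wide_prediction is the model-card entry point (assumed here).
      results = model.do_chapter_wide_prediction(pages, character_bank, do_ocr=True)
  ```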
- View the Demo:
  - Video demos showcasing the read-along feature can be found in the repository: `no_color_fullpage.mp4` and `color_panel.mp4`.
Feel free to fork the repository and submit pull requests if you have improvements or suggestions.
This project is licensed under the MIT License - see the LICENSE file for details.