Translate manga/comics speech bubbles using AI (YOLO for detection, LLMs for translation). Features a Gradio Web UI and CLI.
Features:
- Automatic speech bubble detection & segmentation.
- Text removal & cleaning from detected bubbles.
- Text extraction & translation via vision-capable LLMs.
- Renders translated text onto images with selected fonts.
- Web Interface (Gradio) & Command-Line Interface (CLI).
Requirements:
- Python >= 3.10
- YOLO model with mask segmentation (trained for speech bubbles)
- Vision-capable LLM (API or local)
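Before a manual install, you can check the Python >= 3.10 requirement from the interpreter you plan to use. This is a minimal sketch; the function name `meets_python_requirement` is hypothetical and not part of MangaTranslator. It avoids subscripted type hints so that it still runs, and reports the problem, on older interpreters.

```python
import sys


def meets_python_requirement(minimum=(3, 10)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    # sys.version_info[:2] is a plain (major, minor) tuple, compared element-wise
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_python_requirement():
        print(f"OK: Python {found}")
    else:
        print(f"Python >= 3.10 is required, found {found}")
```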
Portable package: download the standalone zip (NVIDIA GPU or CPU) from the releases page.
It includes the recommended YOLO model and the Komika font pack.
Manual installation:
- Clone Repository:
  git clone https://github.com/meangrinch/MangaTranslator.git
  cd MangaTranslator
- Create Virtual Environment (Recommended):
  # Create venv
  python -m venv venv
  # Activate (Windows CMD/PowerShell)
  .\venv\Scripts\activate
  # Activate (Linux/macOS/Git Bash)
  source venv/bin/activate
- Install PyTorch:
  # Example (CUDA 12.4)
  pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
  # Example (CPU)
  pip install torch
  Refer to the official PyTorch installation guide for system-specific commands.
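To confirm which PyTorch build was installed, a small check with `torch.cuda.is_available()` is enough. This is a minimal sketch with a hypothetical helper name; it returns "cpu" both for CPU-only builds and for CUDA builds that cannot find a compatible NVIDIA GPU or driver, and None if PyTorch is not installed at all.

```python
def detect_torch_device():
    """Return "cuda", "cpu", or None if PyTorch is not importable."""
    try:
        import torch  # installed in the "Install PyTorch" step
    except ImportError:
        return None
    # True only for a CUDA-enabled build with a usable GPU and driver
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = detect_torch_device()
    if device is None:
        print("PyTorch is not installed in this environment")
    else:
        print(f"PyTorch device: {device}")
```

If you installed the CUDA 12.4 wheels but this reports "cpu", the usual suspects are an outdated NVIDIA driver or running from a different virtual environment.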
- Install Dependencies:
  pip install -r requirements.txt
- Download YOLO Model:
  - Download the recommended model and place it in the models/ directory.
- Prepare Fonts:
  - Place font folders (containing .otf/.ttf files) inside fonts/.
  - Font variants need 'italic' or 'bold' in their filename to be used for emphasis.
  - Example structure:
    fonts/
    ├── CC Wild Words/
    │   ├── CC Wild Words Roman.otf
    │   ├── CC Wild Words Italic.otf
    │   ├── CC Wild Words Bold.otf
    │   └── CC Wild Words Bold Italic.otf
    └── Another Font/
        ├── AnotherFont-Regular.ttf
        └── AnotherFont-BoldItalic.ttf
Note: "CC Wild Words" is a common manga translation font.
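The filename rule for emphasis variants can be illustrated with a short sketch that sorts a font folder into regular, bold, italic, and bold-italic faces. The helper `classify_font_variants` and its case-insensitive substring matching are assumptions for illustration; MangaTranslator's actual matching logic may differ.

```python
from pathlib import Path


def classify_font_variants(font_dir):
    """Map style keys to .otf/.ttf filenames found in `font_dir`.

    Assumed rule: a case-insensitive 'bold' and/or 'italic' in the filename
    marks the variant; any other font file is treated as the regular face.
    """
    variants = {}
    for path in sorted(Path(font_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in {".otf", ".ttf"}:
            continue
        name = path.stem.lower()
        is_bold, is_italic = "bold" in name, "italic" in name
        if is_bold and is_italic:
            key = "bold_italic"
        elif is_bold:
            key = "bold"
        elif is_italic:
            key = "italic"
        else:
            key = "regular"
        # Keep the first file found for each style
        variants.setdefault(key, path.name)
    return variants
```

Under this rule, "CC Wild Words Roman.otf" becomes the regular face because it contains neither keyword, and "AnotherFont-BoldItalic.ttf" is picked up as bold-italic because both substrings appear in "bolditalic".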
- Set Up LLM:
  - Supports external providers (Gemini, OpenAI, etc.) and local models (Ollama, LM Studio, etc.).
  - Web UI: Configure in the "Config" tab (API keys are saved locally to config.json).
  - CLI: Pass API keys/endpoints as arguments.
  Note: Environment variables (e.g., GEMINI_API_KEY) can also be used. See the "Config" tab for details.
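If you script around the CLI, keeping keys in an environment variable avoids writing them into shell history or scripts. The sketch below resolves a key from an explicit value first and GEMINI_API_KEY second; both the helper name and this precedence order are assumptions for illustration, not MangaTranslator's documented behavior.

```python
import os


def resolve_api_key(explicit=None, env_var="GEMINI_API_KEY"):
    """Return an API key, preferring an explicit value over the environment.

    Hypothetical precedence: explicit argument first, then `env_var`.
    """
    if explicit:
        return explicit
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key given and {env_var} is not set")
    return key
```

The variable itself is set in the shell before launching: `export GEMINI_API_KEY=...` on Linux/macOS, `set GEMINI_API_KEY=...` in Windows CMD, or `$env:GEMINI_API_KEY = "..."` in PowerShell.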
Web UI:
Launch with start-webui.bat, or run:
python app.py --open-browser
Note: The first launch takes longer to open (~1-2 minutes).
CLI:
# Example (Single - Japanese -> English - Gemini):
python main.py --input <image_path> --yolo-model <model_path> --provider Gemini --gemini-api-key <key>
# Example (Batch - Custom Language - Ollama via its OpenAI-Compatible endpoint):
python main.py --input <folder_path> --batch --yolo-model <model_path> --font-dir <custom_font_dir> --input-language <custom_language> --output-language <custom_language> --provider OpenAI-Compatible --openai-compatible-url <url> --output <custom_output_folder>
# See all options:
python main.py --help
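For a series split into one folder per chapter, the batch example can be repeated per folder from a script. This sketch uses only the flags shown in the examples above; the helper names, the chapter-per-subfolder layout, and the dry-run switch are hypothetical, and it assumes it is run from the MangaTranslator directory so that "main.py" resolves.

```python
import subprocess
import sys
from pathlib import Path


def build_batch_command(input_dir, output_dir, yolo_model, provider_args):
    """Build a `main.py --batch` command from flags shown in the CLI examples."""
    return [
        sys.executable, "main.py",
        "--input", str(input_dir),
        "--batch",
        "--yolo-model", str(yolo_model),
        "--output", str(output_dir),
        *provider_args,  # e.g. ["--provider", "Gemini", "--gemini-api-key", key]
    ]


def translate_chapters(root, out_root, yolo_model, provider_args, dry_run=True):
    """Build (and, unless dry_run, run) one batch job per subfolder of `root`."""
    commands = []
    for chapter in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        cmd = build_batch_command(
            chapter, Path(out_root) / chapter.name, yolo_model, provider_args
        )
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first chapter that fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling it with `dry_run=True` first lets you inspect the generated commands before spending API credits.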
Web UI workflow:
- Launch the Web UI.
- Use the "Translator" tab (single image) or the "Batch" tab (multiple images).
- Upload manga/comic page image(s).
- Select Font, Source Language, and Target Language.
- Go to the "Config" tab:
  - Set Translation -> LLM Provider, Model, API Key/Endpoint.
  - Set Detection -> Reading Direction (rtl/ltr).
  - Click "Save Config" (optional).
- Return to the previous tab and click "Translate" / "Start Batch Translating".
- Output is saved to ./output/ by default.
Note: A "cleaning only" mode is also available in the "Other" sub-tab.
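The Reading Direction setting reflects that manga is usually read right-to-left, while many Western comics are read left-to-right. How MangaTranslator applies it internally is not described here; purely as an illustration, this hypothetical helper orders detected bubble boxes (x1, y1, x2, y2) into reading order by grouping them into rows and sorting each row by direction. The row-grouping heuristic and the `row_tolerance` value are assumptions.

```python
def order_bubbles(boxes, direction="rtl", row_tolerance=50):
    """Order (x1, y1, x2, y2) boxes in reading order.

    Boxes whose top edge is within `row_tolerance` pixels of a row's first
    box join that row. Rows read top to bottom; boxes within a row read
    right-to-left ("rtl") or left-to-right ("ltr").
    """
    if direction not in ("rtl", "ltr"):
        raise ValueError("direction must be 'rtl' or 'ltr'")
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if rows and box[1] - rows[-1][0][1] <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered = []
    for row in rows:
        # Sort by horizontal center; reversed for right-to-left reading
        row.sort(key=lambda b: (b[0] + b[2]) / 2, reverse=(direction == "rtl"))
        ordered.extend(row)
    return ordered
```

Real panel layouts are irregular, so a fixed pixel tolerance is only a rough approximation.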
To update, navigate to the MangaTranslator directory and run:
git pull
Custom YOLO models (.pt/.onnx) can be placed in models/ (when using the Web UI). They must support segmentation and be trained for speech bubbles.
License: Apache-2.0. See LICENSE.