📖 Turjuman: Your Smart Book Translation System - Locally and privately hosted 🌍

Welcome to Turjuman (ترجمان - Interpreter/Translator in Arabic)! 👋

Ever felt daunted by translating a massive book (like 500 pages and over 150,000 words)? Turjuman is here to help! It uses LLMs to magically translate large documents while smartly keeping the original meaning and style intact. Turjuman currently supports Markdown (.md) and plain-text (.txt) files; other formats such as PDF, DOCX, EPUB, HTML, and subtitles are coming soon.

✨ How Turjuman Works

Turjuman uses a smart pipeline powered by LangGraph 🦜🔗 with two translation modes:

🔄 Translation Modes

  • 🧠 Deep Translation Mode (Default): The comprehensive workflow with terminology unification, critique, and revision steps for higher quality and consistency. Best for professional or publication-ready translations.

  • ⚡ Quick Translation Mode: A streamlined workflow that bypasses terminology unification, critique, and revision steps for faster processing and lower token usage. Ideal for drafts or when speed is more important than perfect quality.

✂️ Smart Chunking Options

Turjuman offers four intelligent chunking strategies to optimize your translation process:

🧠 Smart Mode (Default)

This mode is great for Markdown or technical documents. It intelligently identifies and preserves special elements such as code blocks, images, URLs, and footnotes, splitting the text into optimal chunks while keeping related content together and ensuring non-translatable elements remain intact.

  • Perfect for technical documents, programming tutorials, or content with mixed elements
  • Preserves formatting and structure while optimizing for translation quality
  • Automatically handles bullet points, inline code, and other complex formatting
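The protect-then-chunk idea behind Smart Mode can be sketched as a mask-and-restore pass. This is illustrative only: the placeholder scheme, regex, and function names below are assumptions, not Turjuman's actual implementation.

```python
import re

# Hypothetical sketch: mask non-translatable elements (inline code, URLs)
# with placeholders before chunking/translation, then restore them after.
PROTECTED = re.compile(r"`[^`]+`|https?://\S+")

def mask_protected(text: str):
    """Replace protected spans with placeholders; return masked text and a mapping."""
    mapping = {}
    def repl(match):
        key = f"§{len(mapping)}§"
        mapping[key] = match.group(0)
        return key
    return PROTECTED.sub(repl, text), mapping

def unmask(text: str, mapping: dict) -> str:
    """Restore the original protected spans after translation."""
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text
```

Because the translator only ever sees the placeholders, code and URLs survive the round trip byte-for-byte.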

📏 Line Mode

Splits text by line breaks, making each line a separate chunk. All chunks are considered translatable.

  • Ideal for poetry, lyrics, or content where line breaks have semantic meaning
  • Preserves the exact line structure of the original document
  • Simple and predictable chunking pattern

🔣 Symbol Mode

Divides text based on specific separator symbols (like periods, commas, or custom separators).

  • Great for content with specific delimiter patterns
  • Allows customization of separator symbols
  • Useful for specialized formats with unique separation needs
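Symbol Mode's behaviour can be approximated as follows (a sketch under assumptions; the real chunker's handling of separators and whitespace may differ):

```python
def split_on_symbols(text: str, separators: str = ".!?") -> list[str]:
    """Split text into chunks at any of the given separator symbols,
    keeping each separator attached to the preceding chunk."""
    chunks, current = [], ""
    for ch in text:
        current += ch
        if ch in separators:
            chunks.append(current.strip())
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Passing a custom `separators` string is how the "customization of separator symbols" bullet above would map onto code.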

🎬 Subtitle Mode

Specially designed for .srt subtitle files, separating timing information (non-translatable) from content (translatable).

  • Perfect for subtitle translation projects
  • Preserves exact subtitle timing and formatting
  • Handles subtitle-specific formatting and structure
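The timing/content split that Subtitle Mode performs can be sketched like this (illustrative; the real parser and its data structures are not shown in this README):

```python
def parse_srt(srt_text: str) -> list[dict]:
    """Split an .srt file into blocks, keeping the index and timing line
    as non-translatable metadata and the caption lines as translatable text."""
    blocks = []
    for raw in srt_text.strip().split("\n\n"):
        lines = raw.strip().split("\n")
        if len(lines) >= 3:
            index, timing = lines[0], lines[1]
            blocks.append({"meta": (index, timing), "text": "\n".join(lines[2:])})
    return blocks
```

Only the `"text"` field would be sent to the LLM; the `"meta"` tuple is copied through unchanged when the file is reassembled.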

📋 Translation Pipeline

  1. 🚀 init_translation: Start the translation job
  2. 🧐 terminology_unification: Find and unify key terms. Users can optionally provide a manual glossary or dictionary of preferred word pairs (Deep Mode only)
  3. ✂️ chunk_document: Split the book into chunks using one of the available chunking strategies
  4. 🌐 initial_translation: Translate chunks in parallel
  5. 🤔 critique_stage: Review translations, catch errors (Deep Mode only)
  6. ✨ final_translation: Refine translations (Deep Mode only)
  7. 📜 assemble_document: Stitch everything back together
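Conceptually, the Deep Mode stages above compose like this. This is a plain-Python sketch: the real pipeline is wired together with LangGraph, and the function names and signatures here are illustrative stand-ins, not the project's API.

```python
# Illustrative only: the deep-mode pipeline expressed as plain functions.
# Stage comments mirror the numbered list above.
def run_deep_pipeline(document, translate, critique, refine, chunker):
    chunks = chunker(document)                       # 3. chunk_document
    drafts = [translate(c) for c in chunks]          # 4. initial_translation (parallel in practice)
    reviews = [critique(d) for d in drafts]          # 5. critique_stage
    finals = [refine(d, r) for d, r in zip(drafts, reviews)]  # 6. final_translation
    return "\n".join(finals)                         # 7. assemble_document
```

Quick Mode is the same composition with the critique and refine stages skipped, which is why it is cheaper in tokens.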

📊 Translation Flow

flowchart TD
    A([🚀 init_translation<br><sub>Initialize translation state and configs</sub>]) --> Mode{Translation Mode?}
    
    %% Mode decision
    Mode -->|Quick Mode| C([✂️ chunk_document<br><sub>Split the book into manageable chunks</sub>])
    Mode -->|Deep Mode| AA{User Glossary?}
    
    %% Glossary path decision (Deep Mode only)
    AA -->|Yes| AB([📘 User Glossary<br><sub>Use provided glossary terms</sub>])
    AA -->|No| AC([🔍 Auto Extract<br><sub>Extract key terms from document</sub>])
    
    %% Both glossary paths lead to terminology unification
    AB --> B([🧐 terminology_unification<br><sub>Unify glossary, prepare context</sub>])
    AC --> B
    
    B --> C

    %% Chunking produces multiple chunks
    C --> D1([📦 Chunk 1])
    C --> D2([📦 Chunk 2])
    C --> D3([📦 Chunk N])

    %% Parallel translation workers
    D1 --> E1([🌐 initial_translation<br><sub>Translate chunk 1 in parallel</sub>])
    D2 --> E2([🌐 initial_translation<br><sub>Translate chunk 2 in parallel</sub>])
    D3 --> E3([🌐 initial_translation<br><sub>Translate chunk N in parallel</sub>])

    %% Mode-based path after translation
    E1 --> ModeAfter{Translation Mode?}
    E2 --> ModeAfter
    E3 --> ModeAfter
    
    %% Quick Mode path
    ModeAfter -->|Quick Mode| I([📜 assemble_document<br><sub>Merge all chunks into final output</sub>])
    
    %% Deep Mode path
    ModeAfter -->|Deep Mode| F([🤔 critique_stage<br><sub>Review translations, check quality and consistency</sub>])

    %% Decision after critique
    F --> |No critical errors| G([✨ final_translation<br><sub>Refine translations based on feedback</sub>])
    F --> |Critical error| H([🛑 End<br><sub>Stop translation due to errors</sub>])

    G --> I
    I --> J([🏁 Done<br><sub>Translation complete!</sub>])

    H --> J
    


🛠️ Setup & Installation using conda or venv (for development)

  1. Prerequisites
  • Conda: Install Miniconda or Anaconda
  • API Keys: Get your API keys for OpenAI, Anthropic, etc.
  • Ollama: You can use Turjuman locally, without paying for LLM APIs, by installing Ollama or any local inference server such as LM Studio, vLLM, or llama.cpp; take a look at sample.env for details
  2. Clone the Repository
git clone <your-repo-url>
cd turjuman-book-translator
  3. Create a Conda Environment (or use a Python venv)
conda create -n turjuman_env python=3.12 -y
conda activate turjuman_env
  4. Install Dependencies
# Install all required libraries
pip install -r requirements.txt
  5. Configure Environment Variables
cp sample.env.file .env
# Edit .env and add your API keys

Recommended LLM Models

  • Online: Gemini Flash/Pro
  • Local: Gemma3 / Aya / Mistral

  6. Run the Backend Server
uvicorn src.server:app --host 0.0.0.0 --port 8051 --reload
  7. Run the Web UI

The application will now be accessible at http://localhost:8051.


🚀 Using Turjuman via integrated web UI

Visit http://localhost:8051

  • Go to "Configuration" tab and create a new default LLM configurations (LLM provider / model / translation mode, etc.)
  • Save the configuration profile (optional: you can create multiple profiles and select one as the default)
  • Select "New Translation" then upload a file to translate or paste text
  • Modify the source and target language
  • Modify the "Accent and style" if needed (this option can make translation more funny, spicy or professional by default)
  • Start translation. After a few seconds, both logs and text chunks will update dynamically
  • After translation progress reaches 100%, you can view or download the translated file or text
  • You can change the theme from the top drop menu (7 themes available)
  • You can switch the view between chunk and full-document modes to review the translated content chunk by chunk

🔄 Job Queue & History

Turjuman includes a robust job management system:

  • Track all translation jobs with detailed status information (completed, processing, pending, failed)
  • View comprehensive job details including languages, duration, and timestamps
  • Download completed translations directly from the history view
  • Access job-specific glossaries generated during translation
  • View detailed logs and progress information for each job

📚 Glossary Management

Create and manage custom glossaries to ensure consistent terminology:

  • Build custom glossary tables with source and target term pairs
  • Upload glossary files in JSON format
  • Add individual terms through the user interface
  • Set default glossaries for automatic use in translations
  • Download, edit, and delete glossaries as needed
  • Option for automatic terminology extraction during translation
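The exact schema for uploaded glossary files is not documented here; a minimal file of source→target word pairs might look like the following (illustrative only — download a glossary from the UI to see the real format):

```json
[
  { "source": "machine learning", "target": "تعلم الآلة" },
  { "source": "neural network", "target": "شبكة عصبية" }
]
```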

⚙️ Configuration Management

Manage LLM settings and environment variables directly from the UI:

  • Configure multiple LLM providers and models
  • Select translation mode (Deep or Quick) for each configuration
  • Create and save different configuration profiles
  • Set default configurations for quick access
  • Securely manage environment variables (API keys, etc.)
  • Filter available models by keyword
  • Duplicate existing configurations for easy modification

BASH Script Client

A convenient command-line client script (translate_over_api_terminal.sh) is provided for interacting with the backend API.

Prerequisites: curl, jq

Getting Help:

The script includes detailed usage instructions. To view them, run:

./translate_over_api_terminal.sh --help

or

./translate_over_api_terminal.sh -h

Basic Usage:

The only required argument is the input file (-i or --input). Other options allow you to specify languages, provider, model, API URL, and output file path.

# Translate a file using default settings (English->Arabic, OpenAI provider, default model)
# Ensure OPENAI_API_KEY is set in .env if using openai
./translate_over_api_terminal.sh -i path/to/your/document.md

# Specify languages, provider, model, and save response to a specific file
./translate_over_api_terminal.sh \
  --input my_book.md \
  --output results/my_book_translated.json \
  --source english \
  --target french \
  --provider ollama \
  --model llama3

# Use a different API endpoint
./translate_over_api_terminal.sh -i chapter1.md -u http://192.168.1.100:8051

# List available models fetched from the backend API
./translate_over_api_terminal.sh --list-models

The script submits the job via the API. Since the API call is synchronous, the script waits for completion and saves the full JSON response (containing the final state and the translated document in output.final_document) to a file (default: <input_name>_<job_id>.json or the path specified with --output). It also provides the curl command to retrieve the final state again using the job ID.
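As a small illustration of the default naming convention described above, the response filename can be derived like this (a helper sketch, not part of the project):

```python
from pathlib import Path

def default_output_path(input_path: str, job_id: str) -> str:
    """Build the default response filename the script uses:
    <input_name>_<job_id>.json (per the behaviour described above)."""
    stem = Path(input_path).stem
    return f"{stem}_{job_id}.json"
```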



🗺️ Future Plans

  • Support for PDF, DOCX, and other formats
  • Further enhancements to glossary and terminology management
  • Interactive editing and feedback loop
  • Advanced customization options for translation styles
  • Additional translation modes with different quality/speed tradeoffs
  • Batch processing capabilities for multiple documents

🤝 Contributing

Pull requests welcome! For major changes, open an issue first.


📄 License

MIT


Enjoy translating your books with Turjuman! 🚀📚🌍
