Welcome to Turjuman (ترجمان - Interpreter/Translator in Arabic)! 👋
Ever felt daunted by translating a massive book (like 500 pages and over 150,000 words)? Turjuman is here to help! It uses LLMs to magically translate large documents (currently Markdown `.md` and plain text `.txt` files) while smartly trying to keep the original meaning and style intact.
Turjuman uses a smart pipeline powered by LangGraph 🦜🔗:
- 🚀 init_translation: Start the translation job
- 🧐 terminology_unification: Find and unify key terms
- ✂️ chunk_document: Split the book into chunks
- 🌐 initial_translation: Translate chunks in parallel
- 🤔 critique_stage: Review translations, catch errors
- ✨ final_translation: Refine translations
- 📜 assemble_document: Stitch everything back together
```mermaid
flowchart TD
    A([🚀 init_translation<br><sub>Initialize translation state and configs</sub>]) --> B([🧐 terminology_unification<br><sub>Extract key terms, unify glossary, prepare context</sub>])
    B --> C([✂️ chunk_document<br><sub>Split the book into manageable chunks</sub>])

    %% Chunking produces multiple chunks
    C --> D1([📦 Chunk 1])
    C --> D2([📦 Chunk 2])
    C --> D3([📦 Chunk N])

    %% Parallel translation workers
    D1 --> E1([🌐 initial_translation<br><sub>Translate chunk 1 in parallel</sub>])
    D2 --> E2([🌐 initial_translation<br><sub>Translate chunk 2 in parallel</sub>])
    D3 --> E3([🌐 initial_translation<br><sub>Translate chunk N in parallel</sub>])

    %% Merge all translations
    E1 --> F([🤔 critique_stage<br><sub>Review translations, check quality and consistency</sub>])
    E2 --> F
    E3 --> F

    %% Decision after critique
    F --> |No critical errors| G([✨ final_translation<br><sub>Refine translations based on feedback</sub>])
    F --> |Critical error| H([🛑 End<br><sub>Stop translation due to errors</sub>])

    G --> I([📜 assemble_document<br><sub>Merge all refined chunks into final output</sub>])
    I --> J([🏁 Done<br><sub>Translation complete!</sub>])
    H --> J
```
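If you're curious how such a pipeline maps onto a LangGraph `StateGraph`, here is a minimal sketch of the wiring. The state fields and node bodies below are simplified placeholders, not Turjuman's actual implementation (see `src/` for the real graph):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

# Simplified state -- the real graph tracks much more (glossary, configs, critiques).
class TranslationState(TypedDict):
    source_text: str
    chunks: list[str]
    translations: list[str]
    has_critical_error: bool
    final_document: str

def init_translation(state: TranslationState) -> dict:
    return {"has_critical_error": False}

def terminology_unification(state: TranslationState) -> dict:
    return {}  # extract key terms and build a glossary for consistency

def chunk_document(state: TranslationState) -> dict:
    # naive paragraph-based chunking, for illustration only
    return {"chunks": state["source_text"].split("\n\n")}

def initial_translation(state: TranslationState) -> dict:
    # in Turjuman, chunks are translated in parallel by LLM workers
    return {"translations": [f"translated: {c}" for c in state["chunks"]]}

def critique_stage(state: TranslationState) -> dict:
    return {"has_critical_error": False}  # review quality and consistency

def final_translation(state: TranslationState) -> dict:
    return {}  # refine translations based on critique feedback

def assemble_document(state: TranslationState) -> dict:
    return {"final_document": "\n\n".join(state["translations"])}

graph = StateGraph(TranslationState)
for name, fn in [
    ("init_translation", init_translation),
    ("terminology_unification", terminology_unification),
    ("chunk_document", chunk_document),
    ("initial_translation", initial_translation),
    ("critique_stage", critique_stage),
    ("final_translation", final_translation),
    ("assemble_document", assemble_document),
]:
    graph.add_node(name, fn)

graph.set_entry_point("init_translation")
graph.add_edge("init_translation", "terminology_unification")
graph.add_edge("terminology_unification", "chunk_document")
graph.add_edge("chunk_document", "initial_translation")
graph.add_edge("initial_translation", "critique_stage")
graph.add_conditional_edges(
    "critique_stage",
    lambda s: "stop" if s["has_critical_error"] else "refine",
    {"refine": "final_translation", "stop": END},
)
graph.add_edge("final_translation", "assemble_document")
graph.add_edge("assemble_document", END)

app = graph.compile()
```

The conditional edge after `critique_stage` mirrors the diagram above: refine on success, stop on a critical error.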
- Prerequisites
- Conda: Install Miniconda or Anaconda
- API Keys: Get your API keys for OpenAI, Anthropic, etc.
- Ollama: You can use Turjuman locally without paying for LLM APIs by installing Ollama or any local inference server such as LM Studio, vLLM, llama.cpp, etc. Take a look at sample.env for details
Recommended Models
- Online: Gemini Flash/Pro
- Local: Gemma3 / Aya / Mistral
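For a sense of how these choices map to code, here is an illustrative LangChain snippet (the model names are just examples, not requirements; use whatever your provider or Ollama install serves):

```python
# Illustrative only: pick an online or a local chat model via LangChain.
from langchain_google_genai import ChatGoogleGenerativeAI  # online; reads GOOGLE_API_KEY
from langchain_community.chat_models import ChatOllama     # local; needs Ollama running

online_llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # example model name
local_llm = ChatOllama(model="gemma3")                         # example model name

print(local_llm.invoke("Translate to Arabic: Good morning!").content)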
- Clone the Repository
```bash
git clone <your-repo-url>
cd turjuman-book-translator
```
- Create Conda Environment
```bash
conda create -n turjuman_env python=3.12 -y
conda activate turjuman_env
```
- Install Dependencies
```bash
pip install langchain langgraph langchain-openai langchain-anthropic langchain-google-genai langchain-community tiktoken python-dotenv markdown-it-py pydantic "langserve[server]" sse-starlette aiosqlite uv streamlit
```
- Configure Environment Variables
```bash
cp sample.env .env
# Edit .env and add your API keys
```
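The variable names in `sample.env` are authoritative; as a rough illustration, a filled-in `.env` typically looks something like this (the OpenAI/Anthropic/Google key names are the standard ones those SDKs read):

```
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
```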
- Run Backend Server
```bash
uvicorn src.server:app --host 0.0.0.0 --port 8051 --reload
```
- Run Streamlit Frontend
```bash
streamlit run translate_over_api_frontend_streamlit.py
```
- Configure: Set API URL, source & target languages, provider, and model
- Upload: Your `.md` or `.markdown` file
- Start Translation: Click the button and watch the magic happen! ✨
- Review: See original and translated side-by-side, or chunk-by-chunk
- Download: Get your translated book or the full JSON response
A convenient command-line client script (`translate_over_api_terminal.sh`) is provided for interacting with the backend API.
Prerequisites: `curl`, `jq`
Getting Help:
The script includes detailed usage instructions. To view them, run:
```bash
./translate_over_api_terminal.sh --help
# or
./translate_over_api_terminal.sh -h
```
Basic Usage:
The only required argument is the input file (`-i` or `--input`). Other options allow you to specify languages, provider, model, API URL, and output file path.
```bash
# Translate a file using default settings (English->Arabic, OpenAI provider, default model)
# Ensure OPENAI_API_KEY is set in .env if using openai
./translate_over_api_terminal.sh -i path/to/your/document.md

# Specify languages, provider, model, and save response to a specific file
./translate_over_api_terminal.sh \
  --input my_book.md \
  --output results/my_book_translated.json \
  --source english \
  --target french \
  --provider ollama \
  --model llama3

# Use a different API endpoint
./translate_over_api_terminal.sh -i chapter1.md -u http://192.168.1.100:8051

# List available models fetched from the backend API
./translate_over_api_terminal.sh --list-models
```
The script submits the job via the API. Since the API call is synchronous, the script waits for completion and saves the full JSON response (containing the final state and the translated document in `output.final_document`) to a file (default: `<input_name>_<job_id>.json`, or the path specified with `--output`). It also provides the `curl` command to retrieve the final state again using the job ID.
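If you prefer calling the API from your own code instead of the shell script, here is a minimal Python sketch. The endpoint path and payload field names are assumptions inferred from the script's behavior, not a documented contract; check `src/server.py` for the actual routes and schema:

```python
# Hypothetical client sketch: the /translate path and payload fields are
# assumptions -- consult src/server.py for the real API contract.
import requests

API_URL = "http://localhost:8051"

with open("my_book.md", encoding="utf-8") as f:
    payload = {
        "content": f.read(),
        "source_language": "english",
        "target_language": "arabic",
        "provider": "openai",
    }

# The API call is synchronous, so this blocks until the job finishes.
resp = requests.post(f"{API_URL}/translate", json=payload, timeout=3600)
resp.raise_for_status()

state = resp.json()
print(state["output"]["final_document"])  # the assembled translation
```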
- Support for PDF, DOCX, and other formats
- More advanced glossary and terminology management
- Interactive editing and feedback loop
- Better error handling and progress tracking
Pull requests welcome! For major changes, open an issue first.
MIT
Enjoy translating your books with Turjuman! 🚀📚🌍