IslamicTranslator is an automated tool for translating Hadiths into multiple languages using Google large language models (LLMs) such as Gemini. It breaks the translation process into manageable steps and guards against incomplete output by detecting truncated translations, retranslating missing Hadiths, and aggregating the results into a single file.

- Batch Processing: Iteratively translates Hadiths in batches, allowing for efficient management of large datasets.
- Translation Validation: Handles cases where translations are incomplete by identifying and excluding corrupted data.
- Automated Error Handling: Automatically reprocesses missing or improperly translated Hadiths.
- Aggregation: Combines all successfully translated Hadiths into a single output file.
- Customizable: Configure batch sizes and translation steps to adapt to your needs.

- Load input Hadiths from a JSON file.
- Batch Translation:
  - Translate Hadiths in batches (default = 20 Hadiths per batch).
  - Save each batch separately.
  - Exclude the last element of any batch if it appears incomplete (e.g., the translation was cut off).
- Combine translated batches, excluding incomplete entries.
- Compare combined translations with the original file to identify untranslated Hadiths.
- Retranslate missing Hadiths:
  - If fewer than 20 are missing, translate in batches of 5.
  - Otherwise, process in batches of 20.
- Aggregate all translations into a single file.
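The batching rules in the steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: `make_batches` and `retry_batch_size` are hypothetical helper names, and the thresholds simply mirror the defaults described in the steps.

```python
from typing import Iterator, List


def make_batches(items: List[dict], batch_size: int = 20) -> Iterator[List[dict]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def retry_batch_size(missing_count: int, default_size: int = 20, small_size: int = 5) -> int:
    """Pick the batch size for the retranslation pass.

    Fewer than `default_size` missing Hadiths -> small batches; otherwise the default.
    """
    return small_size if missing_count < default_size else default_size


hadiths = [{"id": i} for i in range(1, 46)]        # 45 placeholder Hadiths
batch_lengths = [len(batch) for batch in make_batches(hadiths)]
print(batch_lengths)                                # [20, 20, 5]
print(retry_batch_size(7), retry_batch_size(20))    # 5 20
```

Smaller retry batches presumably reduce the chance that a response is cut off again, at the cost of more API calls.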
The main goal of this project is to translate all key books of Hadith into multiple languages, starting with Japanese. Below is the task list of Hadith collections to translate.
- abudawud.json
- ahmed.json
- bukhari.json
- darimi.json
- ibnmajah.json
- malik.json
- muslim.json
- nasai.json
- tirmidhi.json
- nawawi40.json
- qudsi40.json
- shahwaliullah40.json
- Translate these books into different languages such as French, Spanish, German, and more.
- Expand support for additional books of Hadith.
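To use IslamicTranslator, you need the following Python modules:
- google.generativeai (installed from the google-generativeai package)
- json (part of the Python standard library)
- argparse (part of the Python standard library)
Only the first one has to be installed. Install it via pip: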
pip install google-generativeai
| Argument | Short Flag | Description |
|---|---|---|
| --api-key | -k | Your Google Gemini API key. |
| --input-file | -i | Path to the input JSON file of Hadiths. |
| --target-language | -t | Target language for translation (e.g., Japanese, French, Spanish). |
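For reference, the three arguments in the table could be declared with `argparse` roughly as shown here. This is a sketch under assumptions: the helper name `build_parser` is hypothetical, and marking every argument as required is a guess, since the script's actual defaults are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three documented flags (required=True is an assumption)."""
    parser = argparse.ArgumentParser(description="Translate Hadiths with Gemini.")
    parser.add_argument("-k", "--api-key", required=True,
                        help="Your Google Gemini API key.")
    parser.add_argument("-i", "--input-file", required=True,
                        help="Path to the input JSON file of Hadiths.")
    parser.add_argument("-t", "--target-language", required=True,
                        help="Target language for translation, e.g. Japanese.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["-k", "Api_key", "-i", "muslim.json", "-t", "japanese"])
print(args.api_key, args.input_file, args.target_language)
```

Note that `argparse` turns the dashes in long flag names into underscores, so `--input-file` is read back as `args.input_file`.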
The format of the input file should look like the following:
{
  "hadiths": [
    {
      "id": 1,
      "arabic": "عربي النص.",
      "english": {
        "narrator": "Narrated by Abu Huraira:",
        "text": "The Messenger of Allah said..."
      }
    },
    {
      "id": 2,
      "arabic": "عربي النص ٢.",
      "english": "Another hadith text in English..."
    }
  ]
}
If your JSON file uses a different format, modify the script so that it reads your structure.
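One format difference already appears in the sample input: the `english` field is an object with `narrator` and `text` in one entry and a plain string in another. A loader may want to normalise both shapes before building a prompt. The helper `english_text` below is hypothetical and only illustrates the idea.

```python
import json
from typing import Union


def english_text(english: Union[str, dict]) -> str:
    """Return one English string whether the field is a plain string or a narrator/text object."""
    if isinstance(english, str):
        return english
    narrator = english.get("narrator", "").strip()
    text = english.get("text", "").strip()
    return f"{narrator} {text}".strip()


sample = json.loads("""
{
  "hadiths": [
    {"id": 1, "english": {"narrator": "Narrated by Abu Huraira:", "text": "The Messenger of Allah said..."}},
    {"id": 2, "english": "Another hadith text in English..."}
  ]
}
""")

for hadith in sample["hadiths"]:
    print(hadith["id"], english_text(hadith["english"]))
```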
Once your input file is ready, you can run the script as follows:
python hadith_translator.py --api-key "<YOUR_GEMINI_API_KEY>" --input-file "<YOUR_INPUT_JSON_FILE>" --target-language "<TARGET_LANGUAGE>"
Example:
python hadith_translator.py --api-key "Api_key" --input-file "muslim.json" -t "japanese"
- Batch Translations Directory: Translations are saved as individual JSON files in the batch_translations directory. Example:

  batch_translations/
  ├── batch_1.json
  ├── batch_2.json
  └── batch_3.json

- Final Aggregated Translation File: The final translated Hadiths are stored in final_translations.json. Example:

  [
    { "id": 1, "japanese": "日本語への翻訳: ..." },
    { "id": 2, "japanese": "別のハディース日本語翻訳..." }
  ]
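To illustrate how batch files like these might be merged into the final file, here is a minimal sketch. It assumes each batch file holds a JSON list of objects, as in the example above; `batch_number` and `aggregate_batches` are hypothetical names, and the real script may aggregate differently. Sorting by the numeric suffix matters because a plain alphabetical sort would place batch_10.json before batch_2.json.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import List


def batch_number(path: Path) -> int:
    """Extract N from a file named batch_N.json."""
    match = re.fullmatch(r"batch_(\d+)\.json", path.name)
    if match is None:
        raise ValueError(f"unexpected batch file name: {path.name}")
    return int(match.group(1))


def aggregate_batches(batch_dir: Path, output_file: Path) -> List[dict]:
    """Concatenate every batch_N.json in numeric order and write the result to output_file."""
    combined: List[dict] = []
    for path in sorted(batch_dir.glob("batch_*.json"), key=batch_number):
        combined.extend(json.loads(path.read_text(encoding="utf-8")))
    # ensure_ascii=False keeps Japanese (or other non-ASCII) text readable in the file.
    output_file.write_text(json.dumps(combined, ensure_ascii=False, indent=2), encoding="utf-8")
    return combined


# Demo in a temporary directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    batch_dir = Path(tmp) / "batch_translations"
    batch_dir.mkdir()
    for n in (1, 2, 10):
        content = json.dumps([{"id": n, "japanese": "..."}])
        (batch_dir / f"batch_{n}.json").write_text(content, encoding="utf-8")
    result = aggregate_batches(batch_dir, Path(tmp) / "final_translations.json")
    print([entry["id"] for entry in result])  # [1, 2, 10]
```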
- create_batch_prompt: Generates prompts for the translation batches in a model-compatible format.
- parse_gemini_response: Parses the translation responses from the Gemini API.
- combine_batches_except_last: Combines the translated batches while ignoring the problematic last items in incomplete batches.
- find_missing_hadiths: Identifies untranslated Hadiths by comparing the combined translations with the original input.
- process_all: Orchestrates the entire workflow, from initial translation to final output.
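The sketch below shows one way the combining and comparison steps could work. The function names are taken from the list above, but the signatures, the `looks_complete` predicate, and the choice to match on `id` are assumptions made for illustration; the project's real implementations may differ.

```python
from typing import Callable, List


def combine_batches_except_last(
    batches: List[List[dict]],
    looks_complete: Callable[[dict], bool],
) -> List[dict]:
    """Concatenate batches, dropping a batch's final entry when it fails the completeness check."""
    combined: List[dict] = []
    for batch in batches:
        if batch and not looks_complete(batch[-1]):
            batch = batch[:-1]  # discard only the suspicious trailing entry
        combined.extend(batch)
    return combined


def find_missing_hadiths(original: List[dict], translated: List[dict]) -> List[dict]:
    """Return the original Hadiths whose id has no entry among the translations."""
    translated_ids = {entry["id"] for entry in translated}
    return [hadith for hadith in original if hadith["id"] not in translated_ids]


original = [{"id": i} for i in range(1, 6)]
batches = [
    # The last entry of this batch is empty, simulating a cut-off response.
    [{"id": 1, "japanese": "..."}, {"id": 2, "japanese": "..."}, {"id": 3, "japanese": ""}],
    [{"id": 4, "japanese": "..."}],
]
combined = combine_batches_except_last(batches, lambda entry: bool(entry.get("japanese")))
missing = find_missing_hadiths(original, combined)
print([hadith["id"] for hadith in missing])  # [3, 5]
```

Hadith 3 is reported as missing because its truncated entry was dropped, and Hadith 5 because it was never returned; both would go into the retranslation pass.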
- Error Handling: Automatically skips errors, prevents crashes, and continues with subsequent batches.
- Timeouts: Includes a time.sleep(3) delay to help stay within API rate limits.
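A loop with this behaviour might look like the following sketch. The names `translate_all` and `fake_translate` are hypothetical, and the stand-in translator replaces the real Gemini call so that the example runs offline; only the skip-on-error pattern and the fixed pause reflect what is described above.

```python
import time
from typing import Callable, List, Optional


def translate_all(
    batches: List[List[dict]],
    translate_batch: Callable[[List[dict]], List[dict]],
    delay_seconds: float = 3.0,
) -> List[Optional[List[dict]]]:
    """Translate each batch in turn; a failed batch is recorded as None and the loop continues."""
    results: List[Optional[List[dict]]] = []
    for index, batch in enumerate(batches, start=1):
        try:
            results.append(translate_batch(batch))
        except Exception as error:  # broad on purpose: one bad batch should not stop the run
            print(f"batch {index} failed: {error}")
            results.append(None)
        time.sleep(delay_seconds)  # fixed pause between API calls
    return results


def fake_translate(batch: List[dict]) -> List[dict]:
    """Stand-in for the Gemini call; fails on ids divisible by 3 to exercise the error path."""
    if any(hadith["id"] % 3 == 0 for hadith in batch):
        raise RuntimeError("simulated API error")
    return [{"id": hadith["id"], "japanese": "..."} for hadith in batch]


# delay_seconds=0 keeps the demo fast; the documented pause is 3 seconds.
results = translate_all([[{"id": 1}], [{"id": 3}], [{"id": 4}]], fake_translate, delay_seconds=0)
print([result is not None for result in results])  # [True, False, True]
```

Batches recorded as `None` are not lost: their Hadiths show up as untranslated in the later comparison step and are retried there.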
.
├── requirements.txt              # List of dependencies for the project.
├── read.me                       # Instructions and documentation.
├── code/                         # Folder containing the main script.
│   └── hadith_translator.py      # Main script for translating Hadiths.
└── Translated/                   # Folder containing translations.
    └── Hadiths/                  # Subfolder for Hadiths translations.
        └── Japanese/             # Subfolder for translations in Japanese.
            └── [Translated books]  # Files for the translated books.
- Support Additional Languages: Configure multiple target languages with user input.
- Enhanced Error Recovery: Improve handling of corrupted API responses.
- Custom Models: Allow users to choose from different Google LLM variants.
- Integration with Cloud Storage: Save and retrieve input/output files directly from cloud storage solutions like Google Drive or AWS S3.
This project is open-sourced under the MIT License. See the LICENSE file for details.
- Special thanks to AhmedBaset (https://github.com/AhmedBaset/hadith-json) for providing the English-Arabic JSON files.
- Built using the Google Gemini API.
Contributions are welcome! If you find any bugs or have suggestions for improvement, feel free to open an issue or submit a pull request.