
IslamicTranslator is an automated solution designed to translate Hadiths into multiple languages using the power of Large Language Models (LLMs). Driven by precision and accuracy, this tool ensures translations are robust by breaking the process into manageable steps and identifying corrupted or incomplete translations!


a-hamdi/IslamicTranslator


IslamicTranslator

IslamicTranslator is an automated solution for translating Hadiths into multiple languages using Google large language models (LLMs) such as Gemini. The tool breaks the translation process into manageable steps and protects accuracy by detecting incomplete translations, retranslating them, and aggregating the results into a single file.



Features

  • Batch Processing: Iteratively translates Hadiths in batches, allowing for efficient management of large datasets.
  • Translation Validation: Handles cases where translations are incomplete by identifying and excluding corrupted data.
  • Automated Error Handling: Automatically reprocesses missing or improperly translated Hadiths.
  • Aggregation: Combines all successfully translated Hadiths into a single output file.
  • Customizable: Configure batch sizes and translation steps to adapt to your needs.

Flowchart

How the Process Works

  1. Load input Hadiths from a JSON file.
  2. Batch Translation:
    • Translate Hadiths in batches (default = 20 Hadiths per batch).
    • Save each batch separately.
    • Exclude the last element of any batch if it appears incomplete (e.g., translation was cut off).
  3. Combine translated batches, excluding incomplete entries.
  4. Compare combined translations with the original file to identify untranslated Hadiths.
  5. Retranslate missing Hadiths:
    • If fewer than 20 are missing, translate in batches of 5.
    • Otherwise, process in batches of 20.
  6. Aggregate all translations into a single file.
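
The batching rules in steps 2 and 5 can be sketched as follows. This is a minimal illustration, not the repository's actual code: the names `make_batches` and `retry_batch_size` are hypothetical, and only the sizes (20 by default, 5 when fewer than 20 Hadiths are missing) come from the steps above.

```python
from typing import Iterator

DEFAULT_BATCH_SIZE = 20  # default batch size stated in step 2
RETRY_BATCH_SIZE = 5     # batch size from step 5 when fewer than 20 are missing


def make_batches(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def retry_batch_size(missing_count: int) -> int:
    """Choose the batch size for the retranslation pass described in step 5."""
    if missing_count < DEFAULT_BATCH_SIZE:
        return RETRY_BATCH_SIZE
    return DEFAULT_BATCH_SIZE
```

Smaller retry batches plausibly reduce the chance that a response is cut off again, which would explain the switch to 5 for short retry lists.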

Goals and Task List

The main goal of this project is to translate all key books of Hadith into multiple languages, starting with Japanese. Below is the task list for the Hadith collections.

9 Books of Hadith to translate:

  • abudawud.json
  • ahmed.json
  • bukhari.json
  • darimi.json
  • ibnmajah.json
  • malik.json
  • muslim.json
  • nasai.json
  • tirmidhi.json

Other Essential Books:

  • nawawi40.json
  • qudsi40.json
  • shahwaliullah40.json

Future Tasks:

  • Translate these books into different languages such as French, Spanish, German, and more.
  • Expand support for additional books of Hadith.

Dependencies

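IslamicTranslator relies on the following Python modules:

  • google.generativeai (installed from the google-generativeai package)
  • json (Python standard library)
  • argparse (Python standard library)

Only the first needs to be installed. Install it via pip: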

pip install google-generativeai

Usage

Command Line Arguments

Argument            Short flag   Description
--api-key           -k           Your Google Gemini API key.
--input-file        -i           Path to the input JSON file of Hadiths.
--target-language   -t           Target language for translation (e.g., Japanese, French, Spanish).
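
For reference, a parser accepting these flags could be declared with `argparse` roughly as below. This is a sketch only: the flag names come from the table above, while the function name `build_parser`, the help strings, and marking every flag as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Translate Hadiths with Gemini.")
    # required=True is an assumption; the script may define defaults instead.
    parser.add_argument("-k", "--api-key", required=True,
                        help="Your Google Gemini API key.")
    parser.add_argument("-i", "--input-file", required=True,
                        help="Path to the input JSON file of Hadiths.")
    parser.add_argument("-t", "--target-language", required=True,
                        help="Target language, e.g. Japanese, French, Spanish.")
    return parser
```

`argparse` converts the dashes in long flag names to underscores, so the parsed values are available as `args.api_key`, `args.input_file`, and `args.target_language`.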

Example Input JSON File Format

The format of the input file should look like the following:

{
  "hadiths": [
    {
      "id": 1,
      "arabic": "عربي النص.",
      "english": {
        "narrator": "Narrated by Abu Huraira:",
        "text": "The Messenger of Allah said..."
      }
    },
    {
      "id": 2,
      "arabic": "عربي النص ٢.",
      "english": "Another hadith text in English..."
    }
  ]
}

If your JSON file uses a different format, adapt the loading code in the script so that it reads your structure.
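
Note that the `english` field in the example appears in two shapes: an object with `narrator` and `text`, and a plain string. A loader has to handle both. The sketch below shows one way to do that; the helper names `english_text` and `load_hadiths` are hypothetical and are not taken from the script.

```python
import json


def english_text(hadith: dict) -> str:
    """Return the English text whether 'english' is an object or a plain string."""
    english = hadith.get("english", "")
    if isinstance(english, dict):
        parts = [english.get("narrator", ""), english.get("text", "")]
        return " ".join(part.strip() for part in parts if part and part.strip())
    return str(english).strip()


def load_hadiths(path: str) -> list:
    """Load the 'hadiths' array from an input file in the format shown above."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["hadiths"]
```

Opening the file with `encoding="utf-8"` keeps the Arabic text intact regardless of the platform's default encoding.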

Running the Script

Once your input file is ready, you can run the script as follows:

python hadith_translator.py --api-key "<YOUR_GEMINI_API_KEY>" --input-file "<YOUR_INPUT_JSON_FILE>" --target-language "language"

Example:

python hadith_translator.py --api-key "Api_key" --input-file "muslim.json" -t "japanese"

Output Files

  1. Batch Translations Directory: Translations are saved as individual JSON files in the batch_translations directory. Example:

    batch_translations/
    ├── batch_1.json
    ├── batch_2.json
    └── batch_3.json
    
  2. Final Aggregated Translation File: The final translated Hadiths are stored in final_translations.json. Example:

    [
      {
        "id": 1,
        "japanese": "日本語への翻訳: ..."
      },
      {
        "id": 2,
        "japanese": "別のハディース日本語翻訳..."
      }
    ]
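
As an illustration of this layout, the following sketch writes and reloads per-batch files. It assumes the `batch_<n>.json` naming shown above; the function names `save_batch` and `load_batches` are hypothetical and may not match the script.

```python
import json
from pathlib import Path


def save_batch(batch: list, index: int, directory: str = "batch_translations") -> Path:
    """Write one translated batch to <directory>/batch_<index>.json."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"batch_{index}.json"
    # ensure_ascii=False keeps Japanese text readable instead of \uXXXX escapes.
    path.write_text(json.dumps(batch, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_batches(directory: str = "batch_translations") -> list:
    """Load all batch files, ordered by batch number rather than by file name."""
    paths = sorted(Path(directory).glob("batch_*.json"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]
```

Sorting by the numeric suffix matters once there are ten or more batches, because a plain string sort would place `batch_10.json` before `batch_2.json`.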

Highlights of the Code

Key Methods

  • create_batch_prompt: Generates prompts for the translation batches in a model-compatible format.
  • parse_gemini_response: Parses the translation responses from the Gemini API.
  • combine_batches_except_last: Combines the translated batches while ignoring the problematic last items in incomplete batches.
  • find_missing_hadiths: Identifies untranslated Hadiths by comparing the combined translations with the original input.
  • process_all: Orchestrates the entire workflow, from initial translation to final output.
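
To make the combine-and-compare logic concrete, here is a minimal sketch of what `combine_batches_except_last` and `find_missing_hadiths` might do. The signatures and the completeness check (`is_complete`) are assumptions made for illustration; the script's own criteria for an incomplete entry may differ.

```python
def is_complete(entry: dict, language: str) -> bool:
    """Assumed check: an entry is complete if it has an id and non-empty text."""
    text = entry.get(language)
    return "id" in entry and isinstance(text, str) and bool(text.strip())


def combine_batches_except_last(batches: list, language: str) -> list:
    """Concatenate batches, dropping a batch's last entry when it looks cut off."""
    combined = []
    for batch in batches:
        if batch and not is_complete(batch[-1], language):
            batch = batch[:-1]  # exclude the truncated final element
        combined.extend(batch)
    return combined


def find_missing_hadiths(originals: list, translated: list) -> list:
    """Return the original Hadiths whose id has no entry in the translations."""
    done = {entry.get("id") for entry in translated}
    return [hadith for hadith in originals if hadith["id"] not in done]
```

Comparing by `id` means a dropped entry is picked up again automatically in the retranslation pass, without tracking which batch it came from.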

Example Debugging and Issue Handling

  • Error Handling: When a batch fails, the script skips it and continues with the next batch instead of crashing.
  • Rate Limiting: Includes a time.sleep(3) pause to help stay within API rate limits.
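
The skip-and-continue behaviour combined with a fixed pause can be sketched as below. This is an illustration under assumptions, not the script's implementation: `translate_batches` and its parameters are hypothetical, and `translate_batch` stands in for whatever function calls the Gemini API.

```python
import time
from typing import Callable


def translate_batches(batches: list,
                      translate_batch: Callable[[list], list],
                      delay_seconds: float = 3.0,
                      sleep: Callable[[float], None] = time.sleep) -> list:
    """Translate each batch, skipping failures and pausing after every call."""
    results = []
    for index, batch in enumerate(batches, start=1):
        try:
            results.append(translate_batch(batch))
        except Exception as error:  # broad on purpose: one bad batch must not stop the run
            print(f"Batch {index} failed and was skipped: {error}")
        sleep(delay_seconds)  # fixed pause, mirroring the time.sleep(3) noted above
    return results
```

Skipped batches are not lost in the overall workflow, since the later comparison against the original file identifies their Hadiths as missing.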

Folder Structure

.
├── requirements.txt          # List of dependencies for the project.
├── read.me                   # Instructions and documentation.
├── code/                     # Folder containing the main script.
│   └── hadith_translator.py  # Main script for translating Hadiths.
└── Translated/               # Folder containing translations.
    └── Hadiths/              # Subfolder for Hadith translations.
        └── Japanese/         # Subfolder for translations in Japanese.
            └── [Translated books]  # Files for the translated books.

Next Steps and Improvements

  • Support Additional Languages: Configure multiple target languages with user input.
  • Enhanced Error Recovery: Improve handling of corrupted API responses.
  • Custom Models: Allow users to choose from different Google LLM variants.
  • Integration with Cloud Storage: Save and retrieve input/output files directly from cloud storage solutions like Google Drive or AWS S3.

License

This project is open-sourced under the MIT License. See the LICENSE file for details.


Acknowledgements


Contributing

Contributions are welcome! If you find any bugs or have suggestions for improvement, feel free to open an issue or submit a pull request.
