PATELOM925/Machine_Translation_for_Indian_Legal_Documents

Machine Translation for Indian Legal Documents

📜 Overview

This project aims to bridge the gap in legal accessibility by creating a machine translation framework for Indian legal documents. Our solution simplifies and translates complex legal jargon into accessible and accurate text using state-of-the-art Natural Language Processing (NLP) models.

We also built a custom legal corpus of more than 2,500 terms, tailored to Indian legal contexts, to improve the accuracy of the translation and summarization tasks.


🚀 Key Features

  • Legal Corpus Development:
    • Curated a dataset of more than 2,500 Indian legal terms, annotated with translations, simplified meanings, and contextual usage.
  • Machine Translation:
    • Implemented multilingual translation of Indian legal texts using models such as mBART.
  • Legal Text Simplification:
    • Used Google Pegasus and T5-Base to summarize complex legal clauses into concise, readable text.
  • Custom Preprocessing:
    • Designed a pipeline to tokenize, substitute, and adapt legal terminology for better machine learning performance.
  • Evaluation Metrics:
    • Assessed translation and summarization accuracy using BLEU and ROUGE scores.
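
The tokenize-and-substitute step of the preprocessing pipeline could look roughly like the sketch below. It is illustrative only: the glossary entries and the function names `tokenize` and `simplify_terms` are assumptions, since the actual pipeline code is not shown in this README.

```python
import re

# Hypothetical glossary entries; the real corpus has more than 2,500 terms.
GLOSSARY = {
    "affidavit": "written sworn statement",
    "plaintiff": "person who files the case",
    "suo motu": "on its own initiative",
}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def build_pattern(glossary):
    """Compile one regex that matches any glossary term as a whole word."""
    # Longest terms first so multi-word terms win over shorter overlaps.
    terms = sorted(glossary, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def simplify_terms(text, glossary=GLOSSARY):
    """Replace each legal term in text with its simplified meaning."""
    pattern = build_pattern(glossary)
    return pattern.sub(lambda m: glossary[m.group(0).lower()], text)


if __name__ == "__main__":
    print(tokenize("Section 420, IPC."))
    print(simplify_terms("The plaintiff acted suo motu."))
```

Matching is case-insensitive and whole-word only, so a term embedded in a longer word is left untouched.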

📂 Dataset

  • Name: Legal Corpus for Indian Legal Documents
  • Size: More than 2,500 terms
  • Features:
    • Legal term
    • Simplified meaning
    • Category (e.g., Contract Law, Criminal Law)
    • Multilingual translations
    • Example usage
  • Format: CSV, JSON
  • Usage: The dataset was preprocessed and used to train and validate NLP models such as mBART and Google Pegasus.
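
As a rough illustration of the record layout listed above, the sketch below loads entries from JSON into a small dataclass. The field names (`term`, `simplified_meaning`, `category`, `translations`, `example_usage`) and the two sample records are assumptions made for this example; the actual column names in the corpus files may differ.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LegalTerm:
    """One corpus entry; field names are assumed, not taken from the dataset."""
    term: str
    simplified_meaning: str
    category: str
    translations: dict = field(default_factory=dict)
    example_usage: str = ""


# Illustrative sample data, not copied from the real corpus.
SAMPLE_JSON = """
[
  {"term": "bail",
   "simplified_meaning": "temporary release of an accused person",
   "category": "Criminal Law",
   "translations": {"hi": "जमानत"},
   "example_usage": "The court granted bail to the accused."},
  {"term": "indemnity",
   "simplified_meaning": "promise to cover another party's loss",
   "category": "Contract Law",
   "translations": {},
   "example_usage": "The contract includes an indemnity clause."}
]
"""


def load_terms(raw_json):
    """Parse a JSON array of records into LegalTerm objects."""
    return [LegalTerm(**row) for row in json.loads(raw_json)]


def by_category(terms):
    """Group term names under their legal category."""
    groups = defaultdict(list)
    for entry in terms:
        groups[entry.category].append(entry.term)
    return dict(groups)


if __name__ == "__main__":
    print(by_category(load_terms(SAMPLE_JSON)))
```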

🛠️ Technical Stack

  • Natural Language Processing (NLP):
    • mBART, Google Pegasus, T5-Base
    • LSTM, Transformer-based models
  • Programming Language: Python
  • Evaluation Metrics: BLEU, ROUGE
  • Tools: Pandas, NumPy, Scikit-learn, PyTorch, Hugging Face Transformers
  • Data Processing: Text preprocessing (tokenization, lemmatization, stopword removal)
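
For readers who want to try the translation side of this stack, here is a minimal sketch using Hugging Face Transformers. The checkpoint `facebook/mbart-large-50-many-to-many-mmt` is an assumption (this README only says "models like MBART"), and `LANG_CODES`, `resolve_lang`, and `translate` are hypothetical helper names. Calling `translate` downloads the model weights on first use.

```python
# Assumed public checkpoint; the project may have used a different or fine-tuned one.
MODEL_NAME = "facebook/mbart-large-50-many-to-many-mmt"

# mBART-50 language codes for English and a few Indian languages.
LANG_CODES = {
    "english": "en_XX",
    "hindi": "hi_IN",
    "gujarati": "gu_IN",
    "tamil": "ta_IN",
    "bengali": "bn_IN",
}


def resolve_lang(name):
    """Map a language name to its mBART-50 code, or raise ValueError."""
    try:
        return LANG_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def translate(text, src="english", tgt="hindi"):
    """Translate text with mBART-50 (requires transformers and torch)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(MODEL_NAME)
    model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME)

    tokenizer.src_lang = resolve_lang(src)
    encoded = tokenizer(text, return_tensors="pt")
    # The target language is selected by forcing its code as the first token.
    target_id = tokenizer.convert_tokens_to_ids(resolve_lang(tgt))
    generated = model.generate(**encoded, forced_bos_token_id=target_id)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

In practice the tokenizer and model would be loaded once and reused across calls; they are loaded inside `translate` here only to keep the sketch short.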

💡 Impact

  • Simplifies complex legal documents for students, professionals, and the general public.
  • Provides accurate, multilingual legal translations for diverse Indian audiences.
  • Democratizes access to legal knowledge by reducing language and complexity barriers.

📊 Evaluation

  • BLEU Scores: Achieved high precision in multilingual translation tasks.
  • ROUGE Scores: Demonstrated accuracy and relevance in summarization outputs.
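
To make the two metrics concrete, the sketch below computes the building blocks they rest on: clipped n-gram precision (the core of BLEU) and n-gram recall (ROUGE-N). It is a simplified illustration with whitespace tokenization, not the evaluation code used in this project. Full BLEU also combines several n-gram orders and applies a brevity penalty, and the function names here are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Share of candidate n-grams found in the reference, with counts clipped."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clipping stops a repeated word from being credited more than it occurs.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that the candidate recovers."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    hyp = "the court granted bail"
    ref = "the court granted bail to the accused"
    print(clipped_precision(hyp, ref), rouge_n_recall(hyp, ref))
```

A short candidate can score perfect precision while missing much of the reference, which is why precision-oriented and recall-oriented scores are usually reported together.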

🏛 Acknowledgments

  • Co-Creator: Aman Patel
  • Mentor: Dr. Santosh Kumar Bharti
  • Institution: Pandit Deendayal Energy University

Special thanks to everyone who supported and guided this project.

🤝 Collaboration

If you’re interested in collaborating or discussing projects at the intersection of NLP and legal technology, feel free to connect with us!


📬 Contact

For inquiries or further details, please reach out via email or LinkedIn:
