Machine Translation for Indian Legal Documents

📜 Overview

This project aims to bridge the gap in legal accessibility by creating a machine translation framework for Indian legal documents. Our solution simplifies and translates complex legal jargon into accessible and accurate text using state-of-the-art Natural Language Processing (NLP) models.

We also built a custom legal corpus of more than 2,500 terms, tailored specifically to Indian legal contexts, to improve the accuracy of the translation and summarization tasks.

🚀 Key Features

Legal Corpus Development:
- Curated a dataset of more than 2,500 Indian legal terms, annotated with translations, simplified meanings, and contextual usage.
Machine Translation:
- Implemented multilingual translation using models like MBART for Indian legal texts.
Legal Text Simplification:
- Used Google Pegasus and T5-Base to summarize complex legal clauses into concise, readable text.
Custom Preprocessing:
- Designed a pipeline to tokenize, substitute, and adapt legal terminology for better machine learning performance.
Evaluation Metrics:
- Assessed translation and summarization accuracy using BLEU and ROUGE scores.
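
The substitution step of the custom preprocessing pipeline could look roughly like the sketch below. It is an illustration only, not the project's actual code: the `GLOSSARY` entries and the function names `build_pattern`, `substitute_terms`, and `tokenize` are hypothetical, and the real pipeline may tokenize and substitute differently.

```python
import re

# Hypothetical glossary (legal term -> simplified meaning). Keys are lowercase
# so matches can be looked up case-insensitively. These entries are
# illustrative and are not taken from the project's corpus.
GLOSSARY = {
    "ab initio": "from the beginning",
    "suo motu": "on its own initiative",
    "inter alia": "among other things",
}


def build_pattern(glossary):
    """Compile one regex that matches any glossary term as a whole phrase."""
    # Longest terms first, so a longer term wins over a shorter overlapping one.
    terms = sorted(glossary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def substitute_terms(text, glossary=GLOSSARY):
    """Replace each glossary term in the text with its simplified meaning."""
    pattern = build_pattern(glossary)
    return pattern.sub(lambda match: glossary[match.group(0).lower()], text)


def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    clause = "The contract is void ab initio."
    simplified = substitute_terms(clause)
    print(simplified)
    print(tokenize(simplified))
```

Substituting before tokenizing keeps multi-word terms such as "ab initio" intact. A model-specific subword tokenizer would normally replace `tokenize` before text reaches a Transformer model.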

📂 Dataset

Name: Legal Corpus for Indian Legal Documents
Size: 2,500+ terms
Features:
- Legal term
- Simplified meaning
- Category (e.g., Contract Law, Criminal Law)
- Multilingual translations
- Example usage
Format: CSV, JSON
Usage: The dataset was preprocessed and used to train and validate NLP models such as MBART and Google Pegasus.
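
To make the record layout concrete, here is a minimal sketch of how one entry with the features listed above might be loaded from JSON. The field names (`term`, `simplified_meaning`, `category`, `translations`, `example_usage`), the `LegalTerm` class, and the sample row are all assumptions for illustration; the actual column names in `dataset.zip` may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LegalTerm:
    """One corpus entry. Field names are assumed, not the dataset's real schema."""
    term: str
    simplified_meaning: str
    category: str
    translations: dict = field(default_factory=dict)  # language code -> translation
    example_usage: str = ""


def load_terms(json_text):
    """Parse a JSON array of entries into LegalTerm objects."""
    return [LegalTerm(**row) for row in json.loads(json_text)]


# Hypothetical sample row, written for this example only.
SAMPLE = """
[
  {
    "term": "affidavit",
    "simplified_meaning": "a written statement made under oath",
    "category": "Civil Procedure",
    "translations": {"hi": "शपथ पत्र"},
    "example_usage": "The witness filed an affidavit."
  }
]
"""

if __name__ == "__main__":
    for entry in load_terms(SAMPLE):
        print(entry.term, "|", entry.category, "|", entry.simplified_meaning)
```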

🛠️ Technical Stack

Natural Language Processing (NLP):
- MBART, Google Pegasus, T5-Base
- LSTM, Transformer-based models
Programming Language: Python
Evaluation Metrics: BLEU, ROUGE
Tools: Pandas, NumPy, Scikit-learn, PyTorch, Hugging Face Transformers
Data Processing: Text preprocessing (tokenization, lemmatization, stopword removal)

💡 Impact

Simplifies complex legal documents for students, professionals, and the general public.
Provides accurate, multilingual legal translations for diverse Indian audiences.
Democratizes access to legal knowledge by reducing language and complexity barriers.

📊 Evaluation

BLEU Scores: Achieved high precision in multilingual translation tasks.
ROUGE Scores: Demonstrated accuracy and relevance in summarization outputs.
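
For readers unfamiliar with these metrics, the sketch below shows the core counting idea behind each one. It is a deliberately simplified illustration, not the evaluation code used in this project: `modified_precision` is only the clipped n-gram precision that BLEU builds on (full BLEU also combines several n-gram orders and applies a brevity penalty), and `rouge1_recall` is only the unigram recall variant of ROUGE. Both function names are hypothetical, and whitespace tokenization is assumed.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision: the building block of BLEU, not full BLEU."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: the share of reference unigrams covered by the candidate."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat sat"
    print(modified_precision(candidate, reference, n=2))
    print(rouge1_recall(candidate, reference))
```

Clipping is what stops a degenerate output such as "the the the" from scoring perfectly against a reference that contains "the" only once.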

🏛 Acknowledgments

Co-Creator: Aman Patel
Mentor: Dr. Santosh Kumar Bharti
Institution: Pandit Deendayal Energy University
Special thanks to everyone involved in supporting and guiding this project.

🤝 Collaboration

If you’re interested in collaborating or discussing projects at the intersection of NLP and legal technology, feel free to connect with us!

📬 Contact

For inquiries or further details, please reach out via email or LinkedIn:
