This project aims to bridge the gap in legal accessibility by creating a machine translation framework for Indian legal documents. Our solution simplifies and translates complex legal jargon into accessible and accurate text using state-of-the-art Natural Language Processing (NLP) models.
We also built a custom legal corpus of more than 2,500 terms, tailored to Indian legal contexts, to improve the accuracy of the translation and summarization tasks.
- Legal Corpus Development:
  - Curated a dataset of 2,500+ Indian legal terms, annotated with translations, simplified meanings, and contextual usage.
- Machine Translation:
  - Implemented multilingual translation of Indian legal texts using models such as MBART.
- Legal Text Simplification:
  - Used Google Pegasus and T5-Base to summarize complex legal clauses into concise, readable text.
- Custom Preprocessing:
  - Designed a pipeline that tokenizes text and substitutes and adapts legal terminology to improve model performance.
- Evaluation Metrics:
  - Assessed translation quality with BLEU scores and summarization quality with ROUGE scores.
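The tokenize-and-substitute step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `GLOSSARY` entries and the helper names `build_pattern`, `substitute_terms`, and `tokenize` are hypothetical, and the sketch assumes that substitution means replacing a known legal term with its simplified meaning before the text reaches a model.

```python
import re

# Hypothetical glossary entries for illustration; keys must be lowercase.
# The project's real corpus is not reproduced here.
GLOSSARY = {
    "force majeure": "unforeseeable events beyond anyone's control",
    "indemnify": "compensate for loss",
    "hereinafter": "from now on",
}


def build_pattern(glossary):
    """Compile one regex that matches any glossary term as a whole word."""
    # Longest terms first, so multi-word terms win over shorter overlaps.
    terms = sorted(glossary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def substitute_terms(text, glossary):
    """Replace each glossary term in the text with its simplified meaning."""
    pattern = build_pattern(glossary)
    return pattern.sub(lambda m: glossary[m.group(1).lower()], text)


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


simplified = substitute_terms("The seller shall indemnify the buyer.", GLOSSARY)
tokens = tokenize(simplified)
```

Matching on whole words and ignoring case keeps the substitution from firing inside longer words while still catching capitalized terms at the start of a clause.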
- Name: Legal Corpus for Indian Legal Documents
- Size: 2,500+ terms
- Features:
  - Legal term
  - Simplified meaning
  - Category (e.g., Contract Law, Criminal Law)
  - Multilingual translations
  - Example usage
- Format: CSV, JSON
- Usage: The dataset was preprocessed and used to train and validate NLP models such as MBART and Google Pegasus.
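To make the dataset layout concrete, here is a small sketch of how records with the features listed above might be read from CSV and re-serialized as JSON. The column names (`term`, `simplified_meaning`, `category`, `translation_hi`, `example_usage`), the two sample rows, and the helper functions are assumptions made for illustration; the real corpus files may be organized differently.

```python
import csv
import io
import json

# Hypothetical sample rows; column names are assumed from the feature list.
SAMPLE_CSV = """term,simplified_meaning,category,translation_hi,example_usage
bail,temporary release of an accused person,Criminal Law,ज़मानत,The court granted bail.
consideration,something of value exchanged under a contract,Contract Law,प्रतिफल,The contract lacked consideration.
"""


def load_corpus(csv_text):
    """Parse CSV text into a list of dicts, one per legal term."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def by_category(rows, category):
    """Return the terms that belong to the given category."""
    return [row["term"] for row in rows if row["category"] == category]


rows = load_corpus(SAMPLE_CSV)
criminal_terms = by_category(rows, "Criminal Law")

# ensure_ascii=False keeps Indic-script translations readable in the JSON.
as_json = json.dumps(rows, ensure_ascii=False, indent=2)
```

Keeping one row per term with a separate column per language is only one possible design; a nested JSON object per term would work equally well for a larger set of target languages.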
- Natural Language Processing (NLP):
  - MBART, Google Pegasus, T5-Base
  - LSTM and Transformer-based models
- Programming Language: Python
- Evaluation Metrics: BLEU, ROUGE
- Tools: Pandas, NumPy, Scikit-learn, PyTorch, Hugging Face Transformers
- Data Processing: Text preprocessing (tokenization, lemmatization, stopword removal)
- Simplifies complex legal documents for students, professionals, and the general public.
- Provides accurate, multilingual legal translations for diverse Indian audiences.
- Democratizes access to legal knowledge by reducing language and complexity barriers.
- BLEU Scores: Achieved high precision in multilingual translation tasks.
- ROUGE Scores: Demonstrated accuracy and relevance in summarization outputs.
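For readers unfamiliar with these metrics, the sketch below shows the n-gram overlap counts that underlie them. It is a simplified illustration, not the evaluation code used in this project: full BLEU also combines several n-gram orders and applies a brevity penalty, and ROUGE has further variants. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, n):
    """Return (clipped matches, candidate n-gram total, reference n-gram total)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection takes the minimum count per n-gram (clipping).
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def clipped_precision(candidate, reference, n=1):
    """BLEU-style modified n-gram precision for a single reference."""
    matched, cand_total, _ = overlap(candidate, reference, n)
    return matched / cand_total if cand_total else 0.0


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: share of reference n-grams found in the candidate."""
    matched, _, ref_total = overlap(candidate, reference, n)
    return matched / ref_total if ref_total else 0.0


reference = "the tenant must pay rent every month".split()
candidate = "the tenant pays rent every month".split()

unigram_precision = clipped_precision(candidate, reference, n=1)  # 5 of 6 candidate unigrams match
unigram_recall = rouge_n_recall(candidate, reference, n=1)        # 5 of 7 reference unigrams match
bigram_precision = clipped_precision(candidate, reference, n=2)   # 3 of 5 candidate bigrams match
```

Precision penalizes output that contains words absent from the reference, which suits translation, while recall penalizes output that omits reference content, which suits summarization.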
- Co-Creator: Aman Patel
- Mentor: Dr. Santosh Kumar Bharti
- Institution: Pandit Deendayal Energy University
Special thanks to everyone involved in supporting and guiding this project.
If you’re interested in collaborating or discussing projects at the intersection of NLP and Legal Technology, feel free to connect with us!
For inquiries or further details, please reach out via email or LinkedIn: