This project focuses on analyzing and correcting grammatical errors in text using various Natural Language Processing (NLP) techniques and machine learning models.
The project involves the following steps:
- Data Loading and Preprocessing: Loading a dataset of ungrammatical and corrected sentences, cleaning the data, and preparing it for analysis and model training.
- Exploratory Data Analysis (EDA): Performing EDA to understand the characteristics of the data, including error type frequencies, sentence length distributions, and common grammatical patterns.
- Model Training and Evaluation: Training different grammatical error correction (GEC) models, such as T5-based models, and evaluating their performance using metrics like BLEU.
- Error Analysis and Visualization: Analyzing common error patterns and visualizing them using techniques like word clouds and frequency distributions.
- Grammar Correction with Happy Transformer: Applying the trained model to new text via the Happy Transformer library.
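The loading, cleaning, and EDA steps above can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the column names ("Ungrammatical Statement", "Standard English") are assumptions about the CSV layout, and a small in-memory sample stands in for the real file.

```python
import pandas as pd

# Small in-memory stand-in for "Grammar Correction.csv"; the real file
# would be read with pd.read_csv("Grammar Correction.csv").
# Column names here are assumptions about the dataset's layout.
df = pd.DataFrame({
    "Ungrammatical Statement": [
        "She go to school every day.",
        "They was happy about the result.",
    ],
    "Standard English": [
        "She goes to school every day.",
        "They were happy about the result.",
    ],
})

# Basic cleaning: drop missing values and duplicate pairs.
df = df.dropna().drop_duplicates()

# Simple EDA: token counts per sentence, the basis of a
# sentence-length distribution plot.
df["src_len"] = df["Ungrammatical Statement"].str.split().str.len()
df["tgt_len"] = df["Standard English"].str.split().str.len()

print(df[["src_len", "tgt_len"]].describe())
```

From here, `matplotlib` or `seaborn` histograms over `src_len`/`tgt_len` give the length distributions mentioned above.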
The project uses the "Grammar Correction.csv" dataset, which contains pairs of ungrammatical and corrected sentences.
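Each pair in the dataset can be compared with an edit distance to quantify how far the ungrammatical sentence is from its correction. The project lists the Levenshtein library for this; the stdlib-only sketch below computes the same character-level metric, so no dependency is needed to follow the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insertions,
    deletions, substitutions) needed to turn a into b."""
    # Dynamic programming over one rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Distance between an ungrammatical sentence and its correction:
# "go" -> "goes" is two inserted characters.
print(levenshtein("She go to school.", "She goes to school."))
```

Aggregating this distance over all pairs highlights which sentences need heavy rewriting versus a one-character fix.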
The following libraries are used in this project:
- pandas
- nltk
- matplotlib
- seaborn
- textstat
- Levenshtein
- textblob
- wordcloud
- transformers
- happytransformer
- optuna
- evaluate
- Clone the repository:
  git clone <repository_url>
- Install the required libraries:
  pip install -r requirements.txt
- Run the Jupyter Notebook:
  jupyter notebook Grammatical_Error_Correction.ipynb
The project shows promising results in grammatical error correction, with the best-performing model reaching a high BLEU score on the test set.
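BLEU scores a model's output against the reference correction by n-gram overlap. A minimal check using NLTK (already in the dependency list); the sentences here are illustrative, not taken from the dataset.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "She goes to school every day .".split()
hypothesis = "She goes to school every day .".split()  # perfect correction
partial = "She go to school every day .".split()       # one word wrong

# Smoothing avoids zero scores when a higher-order n-gram has no match.
smooth = SmoothingFunction().method1
perfect_score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
partial_score = sentence_bleu([reference], partial, smoothing_function=smooth)

print(f"perfect: {perfect_score:.3f}, partial: {partial_score:.3f}")
```

An exact match scores 1.0; the single wrong word lowers every n-gram precision, so the partial hypothesis scores strictly less.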
- Explore more advanced GEC models and techniques.
- Fine-tune models on larger and more diverse datasets.
- Develop a user-friendly interface for grammar correction.
Contributions are welcome! Please feel free to open issues or pull requests.