Note: This project was prepared as the final project for the course Text Analysis And Natural Language Processing (Spring 2025) at Constructor University, as part of the Master's in Data Science for Society and Business program.
This project aims to classify whether an essay is written by a human or generated by an AI model, using Natural Language Processing (NLP) and Machine Learning (ML) techniques. A balanced dataset of 10,000 essays was preprocessed, vectorized with TF-IDF, and classified using five ML models. The best performance was achieved with a Support Vector Machine (SVM), reaching 98.3% in both accuracy and F1-score.
- Problem: Detect if an essay is AI-generated or written by a human
- Data:
  - 5,000 human-written essays
  - 5,000 AI-generated essays
- Techniques Used:
  - Preprocessing: lowercasing, stopword removal, rare-word filtering, lemmatization
  - Feature Extraction: TF-IDF vectorization
  - Models: Logistic Regression, SVM, Random Forest, XGBoost, Decision Tree
  - Evaluation: Accuracy, Precision, Recall, F1-Score, ROC AUC
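The preprocessing steps above can be sketched in plain Python. This is an illustrative sketch, not the project code: the stopword set here is a tiny subset, and lemmatization (done with NLTK's `WordNetLemmatizer` in the full pipeline) is omitted for brevity.

```python
import re
from collections import Counter

# Illustrative subset; the real pipeline used NLTK's full English stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and"}

def preprocess(texts, min_freq=2):
    # Lowercase and tokenize on alphabetic characters.
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    # Corpus-wide frequencies drive the rare-word filter.
    freq = Counter(w for toks in tokenized for w in toks)
    return [
        [w for w in toks if w not in STOPWORDS and freq[w] >= min_freq]
        for toks in tokenized
    ]

docs = ["The model writes essays.", "A human writes essays too."]
# Stopwords ("the", "a") and words seen fewer than min_freq times are dropped.
print(preprocess(docs))  # [['writes', 'essays'], ['writes', 'essays']]
```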
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.974 | 0.974 | 0.974 | 0.974 |
| SVM | 0.983 | 0.988 | 0.978 | 0.983 |
| Random Forest | 0.969 | 0.981 | 0.956 | 0.968 |
| XGBoost | 0.974 | 0.983 | 0.964 | 0.973 |
| Decision Tree | 0.895 | 0.890 | 0.901 | 0.896 |
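The winning TF-IDF + SVM setup can be sketched with scikit-learn. The corpus and labels below are toy stand-ins (1 = AI-generated, 0 = human), not the project data, and the real pipeline tuned its own vectorizer and model parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: stereotypically "AI-sounding" vs. personal phrasing.
train_texts = [
    "in conclusion the aforementioned factors demonstrate",
    "furthermore it is important to note that",
    "i remember the summer my grandmother taught me",
    "honestly i never thought i would enjoy writing",
]
train_labels = [1, 1, 0, 0]  # 1 = AI-generated, 0 = human

# Vectorize with TF-IDF, then fit a linear-kernel SVM.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(),
)
clf.fit(train_texts, train_labels)

preds = clf.predict([
    "furthermore the factors demonstrate",
    "my grandmother and that summer",
])
print(preds)  # [1 0]
```

`make_pipeline` keeps vectorization and classification in one object, so the same preprocessing is guaranteed at train and predict time.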
To understand misclassifications, I performed:
- Word Frequency Analysis
- Sentiment Polarity Analysis
- Average Sentence Length Comparison
This analysis showed that essays that were more emotionally neutral and more uniformly structured were more likely to be misclassified.
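The sentence-length comparison can be sketched in plain Python (in the project, sentiment polarity came from TextBlob's `sentiment.polarity`). The example text below is illustrative only.

```python
import re
import statistics

def avg_sentence_length(text):
    # Split on sentence-ending punctuation, dropping empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Average number of whitespace-separated words per sentence.
    return statistics.mean(len(s.split()) for s in sentences)

# Average of a 2-word and a 6-word sentence → 4.
print(avg_sentence_length("Short one. A slightly longer second sentence here."))
```

Comparing this statistic across correctly and incorrectly classified essays is what surfaced the uniform-structure pattern noted above.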
Tools and libraries used:
- Python, Pandas, NumPy
- Scikit-learn, XGBoost
- NLTK, TextBlob
- Seaborn, Matplotlib, WordCloud
This project demonstrates how classical ML models, particularly SVM with TF-IDF features, can effectively distinguish between AI-generated and human-written essays. It also highlights how linguistic characteristics like sentiment and sentence structure influence model predictions.