Note: This project was prepared as the final project for the course Text Analysis And Natural Language Processing (Spring 2025) at Constructor University, as part of the Master's in Data Science for Society and Business program.
This project aims to classify whether an essay is written by a human or generated by an AI model, using Natural Language Processing (NLP) and Machine Learning (ML) techniques. A balanced dataset of 10,000 essays was preprocessed, vectorized with TF-IDF, and classified using five ML models. The best performance was achieved with a Support Vector Machine (SVM), reaching 98.3% in both accuracy and F1-score.
- Problem: Detect if an essay is AI-generated or written by a human
- Data:
  - 5,000 human-written essays
  - 5,000 AI-generated essays
- Techniques Used:
  - Preprocessing: lowercasing, stopword removal, rare-word filtering, lemmatization
  - Feature Extraction: TF-IDF vectorization
  - Models: Logistic Regression, SVM, Random Forest, XGBoost, Decision Tree
  - Evaluation: Accuracy, Precision, Recall, F1-Score, ROC AUC
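The preprocessing steps above can be sketched in plain Python. This is an illustrative sketch, not the project code: the stopword set here is a tiny subset, and lemmatization (done with NLTK's `WordNetLemmatizer` in the full pipeline) is omitted for brevity.

```python
import re
from collections import Counter

# Illustrative subset; the real pipeline used NLTK's full English stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and"}

def preprocess(texts, min_freq=2):
    # Lowercase and tokenize on alphabetic characters.
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    # Corpus-wide frequencies drive the rare-word filter.
    freq = Counter(w for toks in tokenized for w in toks)
    return [
        [w for w in toks if w not in STOPWORDS and freq[w] >= min_freq]
        for toks in tokenized
    ]

docs = ["The model writes essays.", "A human writes essays too."]
# Stopwords ("the", "a") and words seen fewer than min_freq times are dropped.
print(preprocess(docs))  # [['writes', 'essays'], ['writes', 'essays']]
```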
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.974 | 0.974 | 0.974 | 0.974 |
| SVM | 0.983 | 0.988 | 0.978 | 0.983 |
| Random Forest | 0.969 | 0.981 | 0.956 | 0.968 |
| XGBoost | 0.974 | 0.983 | 0.964 | 0.973 |
| Decision Tree | 0.895 | 0.890 | 0.901 | 0.896 |
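The winning TF-IDF + SVM setup can be sketched with scikit-learn. The corpus and labels below are toy stand-ins (1 = AI-generated, 0 = human), not the project data, and the real pipeline tuned its own vectorizer and model parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: stereotypically "AI-sounding" vs. personal phrasing.
train_texts = [
    "in conclusion the aforementioned factors demonstrate",
    "furthermore it is important to note that",
    "i remember the summer my grandmother taught me",
    "honestly i never thought i would enjoy writing",
]
train_labels = [1, 1, 0, 0]  # 1 = AI-generated, 0 = human

# Vectorize with TF-IDF, then fit a linear-kernel SVM.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(),
)
clf.fit(train_texts, train_labels)

preds = clf.predict([
    "furthermore the factors demonstrate",
    "my grandmother and that summer",
])
print(preds)  # [1 0]
```

`make_pipeline` keeps vectorization and classification in one object, so the same preprocessing is guaranteed at train and predict time.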
To understand misclassifications, I performed:
- Word Frequency Analysis
- Sentiment Polarity Analysis
- Average Sentence Length Comparison
This analysis showed that essays that were more emotionally neutral and more uniformly structured were more likely to be misclassified.
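The sentence-length comparison can be sketched in plain Python (in the project, sentiment polarity came from TextBlob's `sentiment.polarity`). The example text below is illustrative only.

```python
import re
import statistics

def avg_sentence_length(text):
    # Split on sentence-ending punctuation, dropping empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Average number of whitespace-separated words per sentence.
    return statistics.mean(len(s.split()) for s in sentences)

# Average of a 2-word and a 6-word sentence → 4.
print(avg_sentence_length("Short one. A slightly longer second sentence here."))
```

Comparing this statistic across correctly and incorrectly classified essays is what surfaced the uniform-structure pattern noted above.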
Tools and libraries used:
- Python, Pandas, NumPy
- Scikit-learn, XGBoost
- NLTK, TextBlob
- Seaborn, Matplotlib, WordCloud
This project demonstrates how classical ML models, particularly SVM with TF-IDF features, can effectively distinguish between AI-generated and human-written essays. It also highlights how linguistic characteristics like sentiment and sentence structure influence model predictions.