Course: STT 811 – Applied Statistics Modeling for Data Scientists
Instructor: Savvy Barnes
Contributors: Andrew John J, Roshni Bhowmik, Mahnoor Sheikh, Ab Basit Syed Rafi
🌐 Streamlit App: STT811 Text Classification App
📁 GitHub Repo: [STT811_StatsProject](https://github.com/andrew-jxhn/STT811_StatsProject)
With the increasing use of AI tools like ChatGPT in academia, distinguishing between human- and AI-generated responses is essential for maintaining academic integrity. This project explores a machine learning pipeline to classify text as human- or AI-generated based on linguistic and semantic features.
- Source: Custom dataset of 2,239 rows (from Mendeley)
- Contents:
  - `Question`: the original statistics question
  - `Human Response`: the text response from a student
  - `AI Response`: text generated using a language model
- Post-cleaning: 1,993 usable examples
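
For orientation, here is a minimal sketch of loading and filtering the dataset with pandas; the file name `dataset.csv` is an assumption, not the repo's actual path:

```python
import pandas as pd

# Load the question/response dataset (file name is an assumption)
df = pd.read_csv("dataset.csv")  # columns: Question, Human Response, AI Response

print(df.shape)                     # ~2,239 rows before cleaning
df = df.dropna().drop_duplicates()  # drop empty and duplicate rows
print(df.shape)                     # ~1,993 usable examples remain
```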
- Cleaning: Lowercasing, punctuation removal, tokenization, stopword removal
- Feature Creation:
- Text length, special character counts
- Flesch Reading Ease, Gunning Fog Index
- Cosine similarity to question
- Sentiment scores and sentiment gaps
- Vectorization: `CountVectorizer` followed by PCA (95% of variance retained in 482 components); a sketch of the full pipeline follows this list
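
A minimal sketch of the preprocessing and feature pipeline, assuming the data has been reshaped so each row holds one `question`/`response` pair; the column names and helper structure are assumptions:

```python
import re
import nltk
import textstat
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def clean(text: str) -> str:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # punctuation removal
    tokens = text.split()                          # whitespace tokenization
    return " ".join(t for t in tokens if t not in STOP)

responses = [clean(r) for r in df["response"]]  # assumed long-format columns
questions = [clean(q) for q in df["question"]]

# Bag-of-words vectorization, then PCA keeping 95% of the variance
vec = CountVectorizer()
X = vec.fit_transform(responses).toarray()
X_reduced = PCA(n_components=0.95).fit_transform(X)

# Hand-crafted features: length, readability, similarity to the question
lengths = [len(r.split()) for r in responses]
flesch = [textstat.flesch_reading_ease(r) for r in df["response"]]
fog = [textstat.gunning_fog(r) for r in df["response"]]
q_vecs = vec.transform(questions)
r_vecs = vec.transform(responses)
cos_sim = [cosine_similarity(q_vecs[i], r_vecs[i])[0, 0]
           for i in range(len(responses))]
```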
Key visuals and insights:
- Top Trigrams and Common Words in AI vs. Human responses
- Word Clouds and Text Length Distribution
- Sentiment Gap Analysis and KDE Estimation
- Readability Scores: AI responses are longer and more formulaic
- Text Similarity: AI more aligned with original questions
- Pairplots & Correlation Heatmaps reveal subtle response patterns
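
For instance, the top trigrams per class can be extracted with scikit-learn's n-gram support; `ai_texts` and `human_texts` below are assumed lists of cleaned responses:

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_trigrams(texts, k=10):
    """Return the k most frequent trigrams across a list of documents."""
    vec = CountVectorizer(ngram_range=(3, 3))
    counts = vec.fit_transform(texts).sum(axis=0).A1  # total count per trigram
    vocab = vec.get_feature_names_out()
    return sorted(zip(vocab, counts), key=lambda p: p[1], reverse=True)[:k]

print(top_trigrams(ai_texts))     # formulaic AI phrasing tends to surface here
print(top_trigrams(human_texts))
```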
- Models evaluated: Logistic Regression, Linear SVM, Decision Tree, Random Forest, KNN, Gradient Boosting, MLP
- Best Accuracy: ~85% (Logistic Regression, SVM, MLP)
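
A minimal sketch of this comparison for three of the models, assuming `X_reduced` and binary labels `y` from the feature pipeline above:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
    "rf": RandomForestClassifier(n_estimators=200),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")  # the best models land around 0.85
```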
- Model: `bert-base-uncased` via Hugging Face
- Training:
  - Tokenization (WordPiece)
  - 30 epochs with cross-entropy loss
  - AdamW optimizer
- Performance: comparable to traditional models, with potential for further gains
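
A minimal fine-tuning sketch with the Hugging Face `transformers` library; it trains full-batch for brevity, and every hyperparameter not listed above (batch handling, learning rate, max length) is an assumption:

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # binary: human vs. AI
).to(device)

# texts: list of response strings, y: list of 0/1 labels (assumed prepared earlier)
enc = tokenizer(texts, padding=True, truncation=True,
                max_length=256, return_tensors="pt")
labels = torch.tensor(y)

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(30):  # 30 epochs, as listed above
    optimizer.zero_grad()
    out = model(
        input_ids=enc["input_ids"].to(device),
        attention_mask=enc["attention_mask"].to(device),
        labels=labels.to(device),  # cross-entropy loss computed internally
    )
    out.loss.backward()
    optimizer.step()
```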
The Streamlit app lets you:
- Upload new questions and responses
- Evaluate text using trained models
- Visual analytics: word clouds, trigrams, readability, sentiment
- Compare AI vs. human characteristics interactively
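
A minimal sketch of the app's scoring path; the saved-artifact file names (`vectorizer.joblib`, `classifier.joblib`) are assumptions rather than the repo's actual artifacts:

```python
import joblib
import streamlit as st

st.title("STT811 Text Classification")

question = st.text_area("Question")
response = st.text_area("Response")

if st.button("Classify"):
    # Assumed artifacts: a fitted vectorizer and classifier saved with joblib.
    # Simplified: the real app also computes the engineered features above.
    vec = joblib.load("vectorizer.joblib")
    clf = joblib.load("classifier.joblib")
    pred = clf.predict(vec.transform([response]))[0]
    st.write("Prediction:", "AI-generated" if pred == 1 else "Human-written")
```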
- Human responses were simpler, less verbose, and showed more variability
- AI responses were longer, sentimentally aligned with questions, and structurally consistent
- Readability, sentiment gap, and cosine similarity are strong distinguishing features
- The system offers a foundational step toward detecting AI-generated content in education
```bash
# Clone the repo
git clone https://github.com/andrew-jxhn/STT811_StatsProject.git
cd STT811_StatsProject

# Create a virtual environment (optional)
python -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Run the Streamlit app
streamlit run streamlit_code.py
```