A machine learning-powered web application built with Streamlit to detect whether text was written by a human or generated by an AI (e.g., ChatGPT). Upload .txt, .docx, or .pdf files, or type text directly. Choose a classification model (SVM, Decision Tree, or AdaBoost) and receive real-time predictions with confidence scores and visual explanations.
ai_human_detection_project/
├── app.py # 🚀 Main Streamlit application
├── requirements.txt # 📦 Project dependencies
├── models/ # 🔍 Trained ML models and vectorizer
│ ├── Human_Vs_AI_Written_pipeline.pkl
│ ├── optimized_svm_model.pkl
│ ├── decision_tree_pipeline.pkl
│ ├── adaboost_pipeline.pkl
│ ├── feature_selector.pkl
│ ├── individual_svm_classifier.pkl
│ ├── optimized_adaboost_model.pkl
│ ├── optimized_dt_model.pkl
│ ├── sol2_pipeline_tfidf_vectorizer.pkl
│ └── tfidf_vectorizer.pkl
├── data/ # 🧪 Raw and test datasets
│ ├── AI_vs_huam_train_dataset/
│ └── Final_test_data/
├── notebooks/ # 📓 Jupyter notebooks (model training & analysis)
│ └── Project_1.ipynb
├── sample_files/ # 📁 Test documents (.txt, .pdf, .docx)
│ ├── AI Generated.txt
│ ├── AI Generated.pdf
│ └── Human-Written.docx
└── README.md # 📘 Project documentation
- 🧠 Trained and tuned 3 classifiers (SVM, Decision Tree, AdaBoost)
- 🔍 Supports .txt, .docx, and .pdf input formats
- 📊 Displays prediction probabilities and agreement analysis
- 📈 Real-time visualizations (confidence, model comparison, word stats)
- 💾 Option to download prediction reports
- 📎 Uses TF-IDF vectorization with optimized linguistic features
- 📁 Clean and modular ML pipeline
git clone https://github.com/Sam120-ass/ai_human_detection_project.git
cd ai_human_detection_project
python -m venv venv # The second "venv" is the environment folder name; e.g. "python -m venv project1" creates a project1 folder
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Open the project folder in VS Code or another IDE, then start the app:
streamlit run app.py
This will launch a local web server where you can:
- Upload text files (.txt, .docx, .pdf)
- Choose from the 3 classifiers (SVM, Decision Tree, AdaBoost)
- View AI vs Human predictions with confidence scores
- See visualizations and model comparison
- Download prediction reports
All models were trained using optimized parameters with GridSearchCV and evaluated using 5-fold stratified cross-validation.
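The tuning procedure described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training code: the toy corpus, the parameter grid, and the linear kernel are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the real training data.
# Label 0 = human-written, label 1 = AI-generated.
texts = [
    "honestly i have no idea why my cat does that",
    "we got lost twice on the way to the lake lol",
    "my grandma's soup recipe never uses exact amounts",
    "the bus was late again so i walked home",
    "cant believe the game went to overtime last night",
    "In conclusion, it is important to consider multiple perspectives.",
    "Overall, this approach offers several notable advantages.",
    "Furthermore, it is essential to evaluate the key factors involved.",
    "In summary, the results highlight the importance of careful planning.",
    "Additionally, this method provides a comprehensive framework.",
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# TF-IDF and the classifier live in one pipeline, so the vectorizer
# is refit on the training portion of every fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SVC(kernel="linear")),
])

# Assumed grid: only the regularization strength C is searched here.
param_grid = {"clf__C": [0.1, 1, 10]}

# 5-fold stratified CV keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(texts, labels)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
```

Scores on a corpus this small are not meaningful; the point is only how `GridSearchCV` and `StratifiedKFold` fit together.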
| Model | Accuracy | Features Used | Notes |
| --- | --- | --- | --- |
| SVM | >90% | TF-IDF (10,000 n-grams) | Best overall performance |
| Decision Tree | ~75% | TF-IDF (2,000) | Fast and interpretable |
| AdaBoost | ~82% | TF-IDF (2,000) | Robust to noise, ensemble-based |
The models and the vectorizer are saved in the models/ folder using joblib.
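As a rough sketch of how a fitted pipeline can be saved and reloaded with joblib: the toy training texts and the temporary directory below are assumptions for illustration, and the real files in models/ may have been produced differently.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; 0 = human-written, 1 = AI-generated.
texts = [
    "ugh my phone died right before the concert",
    "we just winged the recipe and it turned out fine",
    "In conclusion, these findings underscore several key considerations.",
    "Overall, this solution provides a robust and scalable framework.",
]
labels = [0, 0, 1, 1]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
pipeline.fit(texts, labels)

# A temporary directory is used here so the example does not
# overwrite anything in the real models/ folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "decision_tree_pipeline.pkl"
    joblib.dump(pipeline, model_path)   # serialize the fitted pipeline
    reloaded = joblib.load(model_path)  # restore it from disk

# The reloaded pipeline accepts raw text, exactly like the original.
original_predictions = list(pipeline.predict(texts))
reloaded_predictions = list(reloaded.predict(texts))
print(original_predictions == reloaded_predictions)
```

Pickled scikit-learn objects are generally only safe to load with the same scikit-learn version that saved them, which is one reason to pin the versions in requirements.txt.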
The app supports:
- Plain Text Files: .txt
- Word Documents: .docx (via python-docx)
- PDF Files: .pdf (via pdfplumber)
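One way the three formats could be read into plain text is sketched below. The function name `extract_text` is hypothetical and not taken from app.py; the sketch assumes python-docx and pdfplumber are installed, and imports them only when a file of that type is actually opened.

```python
from pathlib import Path


def extract_text(path):
    """Return the text content of a .txt, .docx, or .pdf file."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        # Replace undecodable bytes instead of failing on odd encodings.
        return path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package

        document = Document(str(path))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)

    if suffix == ".pdf":
        import pdfplumber

        with pdfplumber.open(str(path)) as pdf:
            # extract_text() can yield nothing for image-only pages.
            return "\n".join((page.extract_text() or "") for page in pdf.pages)

    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Scanned PDFs without a text layer would come back empty with this approach, since no OCR is involved.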
Minimum versions used in training:
pandas>=2.0.0
numpy>=1.26.0
scikit-learn>=1.4.0
matplotlib>=3.7.1
seaborn>=0.12.2
plotly>=5.15.0
joblib>=1.3.2
nltk
pdfplumber
python-docx
fpdf
streamlit
wordcloud
Install all dependencies with:
pip install -r requirements.txt
- Prediction probability bars
- Model agreement/disagreement summary
- Word frequency cloud
- Feature importance (for tree-based models)
- Word count/sentence length stats
Ensure these files are present in the models/ folder before running the app:
- Human_Vs_AI_Written_pipeline.pkl
- decision_tree_pipeline.pkl
- adaboost_pipeline.pkl
- tfidf_vectorizer.pkl
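A quick pre-flight check for the files listed above might look like the sketch below. The helper name `missing_model_files` is hypothetical, and it assumes the script is run from the project root so that `models` resolves correctly.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_MODEL_FILES = [
    "Human_Vs_AI_Written_pipeline.pkl",
    "decision_tree_pipeline.pkl",
    "adaboost_pipeline.pkl",
    "tfidf_vectorizer.pkl",
]


def missing_model_files(models_dir="models"):
    """Return the required model files that are absent from models_dir."""
    models_dir = Path(models_dir)
    return [
        name for name in REQUIRED_MODEL_FILES
        if not (models_dir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required model files are present.")
```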
- Accuracy, Precision, Recall, F1-score reported
- Confusion matrices & ROC curves plotted for all models
- Agreement rates between models calculated and visualized
- Final model selected based on best cross-validation and holdout performance
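The agreement rate between two models can be read as the fraction of samples on which they predict the same label. A minimal sketch under that assumption follows; the prediction lists are made up for illustration and the notebook may compute agreement differently.

```python
from itertools import combinations


def pairwise_agreement(predictions):
    """Map each pair of model names to the share of identical predictions."""
    rates = {}
    for name_a, name_b in combinations(predictions, 2):
        preds_a, preds_b = predictions[name_a], predictions[name_b]
        if len(preds_a) != len(preds_b) or not preds_a:
            raise ValueError("Prediction lists must be non-empty and equal in length")
        matches = sum(a == b for a, b in zip(preds_a, preds_b))
        rates[(name_a, name_b)] = matches / len(preds_a)
    return rates


# Hypothetical predictions on four samples (0 = human, 1 = AI).
example_predictions = {
    "SVM": [1, 0, 1, 1],
    "Decision Tree": [1, 0, 0, 1],
    "AdaBoost": [1, 1, 1, 1],
}

for pair, rate in pairwise_agreement(example_predictions).items():
    print(pair, rate)
# ('SVM', 'Decision Tree') 0.75
# ('SVM', 'AdaBoost') 0.75
# ('Decision Tree', 'AdaBoost') 0.5
```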
Located in sample_files/:
- AI Generated.txt
- Human-Written.docx
- AI Generated.pdf
Use these to demo and test the app UI.
🧩 Used Pipeline() to combine preprocessing, TF-IDF, and classifier into one object
✅ Ensured consistency by not mixing custom and pipeline preprocessing
🧪 All models were trained on TF-IDF features from the same vectorization setup (2,000 or 10,000 features, depending on the model)
📁 Saved modular models for better control and debugging
A demo video is added here showing:
- Model selection and predictions
- Uploading PDF/Word/Text documents
- Agreement analysis and downloading reports
- Samaya Niraula - https://github.com/Sam-MR11
This project is licensed for educational use.
Feel free to raise an Issue on the GitHub repo or contact the developer.