🤖 AI vs Human Text Detection

A machine learning-powered web application built with Streamlit to detect whether text was written by a human or generated by an AI (e.g., ChatGPT). Upload .txt, .docx, or .pdf files, or type text directly. Choose a classification model (SVM, Decision Tree, AdaBoost), and receive real-time predictions with confidence scores and visual explanations.

📁 Project Structure

ai_human_detection_project/
├── app.py # 🚀 Main Streamlit application
├── requirements.txt # 📦 Project dependencies
├── models/ # 🔍 Trained ML models and vectorizer
│ ├── Human_Vs_AI_Written_pipeline.pkl
│ ├── optimized_svm_model.pkl
│ ├── decision_tree_pipeline.pkl
│ ├── adaboost_pipeline.pkl
│ ├── feature_selector.pkl
│ ├── individual_svm_classifier.pkl
│ ├── optimized_adaboost_model.pkl
│ ├── optimized_dt_model.pkl
│ ├── sol2_pipeline_tfidf_vectorizer.pkl
│ └── tfidf_vectorizer.pkl
├── data/ # 🧪 Raw and test datasets
│ ├── AI_vs_huam_train_dataset/
│ └── Final_test_data/
├── notebooks/ # 📓 Jupyter notebooks (model training & analysis)
│ └── Project_1.ipynb
├── sample_files/ # 📁 Test documents (.txt, .pdf, .docx)
│ ├── AI Generated.txt
│ ├── AI Generated.pdf
│ └── Human-Written.docx
└── README.md # 📘 Project documentation

💡 Project Features

🧠 Trained and tuned 3 classifiers (SVM, Decision Tree, AdaBoost)
🔍 Supports .txt, .docx, and .pdf input formats
📊 Displays prediction probabilities and agreement analysis
📈 Real-time visualizations (confidence, model comparison, word stats)
💾 Option to download prediction reports
📎 Uses TF-IDF vectorization with optimized linguistic features
📁 Clean and modular ML pipeline

🔧 Installation Instructions

1. Clone the repository

git clone https://github.com/Sam120-ass/ai_human_detection_project.git
cd ai_human_detection_project

2. Create Virtual Environment

python -m venv venv        # The second "venv" is the folder name, e.g. "python -m venv project1" creates a project1 folder
source venv/bin/activate    # On Windows: venv\Scripts\activate

3. Install the dependencies

pip install -r requirements.txt

Then open the project folder in VS Code or another IDE.

Running the Streamlit App

streamlit run app.py

This will launch a local web server where you can:

Upload text files (.txt, .docx, .pdf)
Choose from the 3 classifiers (SVM, Decision Tree, AdaBoost)
View AI vs Human predictions with confidence scores
See visualizations and model comparison
Download prediction reports

Machine Learning Models

All models were trained using optimized parameters with GridSearchCV and evaluated using 5-fold stratified cross-validation.

Model           Accuracy   Features Used              Notes
SVM             >90%       TF-IDF (10,000 n-grams)    Best overall performance
Decision Tree   ~75%       TF-IDF (2,000)             Fast and interpretable
AdaBoost        ~82%       TF-IDF (2,000)             Robust to noise, ensemble-based

The models and the vectorizer are saved in the models/ folder using joblib.
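
To illustrate the 5-fold stratified cross-validation mentioned above, here is a minimal plain-Python sketch of how stratified folds keep the human/AI class balance in every fold. It is not the project's training code: in practice scikit-learn's StratifiedKFold does this job, and the function name and toy labels below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to a fold so that every fold keeps
    roughly the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Hypothetical toy labels: 10 "human" and 10 "ai" samples.
labels = ["human"] * 10 + ["ai"] * 10
folds = stratified_folds(labels)
for fold in folds:
    held_out = [labels[i] for i in fold]
    # Each fold holds 4 samples: 2 human and 2 ai.
    print(len(fold), held_out.count("human"), held_out.count("ai"))
```

Each fold serves once as the validation set while the other four are used for fitting, so every sample is evaluated exactly once.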

Input File Support

The app supports:

Plain Text Files: .txt
Word Documents: .docx (via python-docx)
PDF Files: .pdf (via pdfplumber)
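
A rough sketch of how text could be pulled from each supported format is shown below. The extract_text helper is hypothetical and may differ from what app.py does. It works on file paths, whereas Streamlit uploads arrive as file-like objects. The python-docx and pdfplumber imports are deferred so the .txt branch runs without them.

```python
import tempfile
from pathlib import Path


def extract_text(path):
    """Return the text content of a .txt, .docx, or .pdf file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".docx":
        import docx  # provided by the python-docx package
        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pdf":
        import pdfplumber
        with pdfplumber.open(str(path)) as pdf:
            # extract_text() can return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    raise ValueError(f"Unsupported file type: {suffix}")


# Demo on a temporary plain-text file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Hello from a plain text file.", encoding="utf-8")
    recovered = extract_text(sample)
print(recovered)
```

Scanned PDFs without a text layer would yield empty strings here, since no OCR is involved.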

Dependencies

Minimum versions used in training:

pandas>=2.0.0
numpy>=1.26.0
scikit-learn>=1.4.0
matplotlib>=3.7.1
seaborn>=0.12.2
plotly>=5.15.0
joblib>=1.3.2
nltk
pdfplumber
python-docx
fpdf
streamlit
wordcloud

Install all of them with:

pip install -r requirements.txt

📊 Visualisations Included

- Prediction probability bars

- Model agreement/disagreement summary

- Word frequency cloud

- Feature importance (for tree-based models)

- Word count/sentence length stats
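
As an illustration of the word count and sentence length statistics listed above, the following sketch computes them with the standard library only. The text_stats function and its naive sentence splitting are assumptions for this example; the app may use NLTK or different rules.

```python
import re


def text_stats(text):
    """Compute simple word-count and sentence-length statistics."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    avg_len = len(words) / len(sentences) if sentences else 0.0
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": avg_len,
    }


stats = text_stats("This is a test. It has two sentences!")
print(stats)
# {'word_count': 8, 'sentence_count': 2, 'avg_sentence_length': 4.0}
```

Abbreviations such as "e.g." will be split as sentence ends by this rule, so a proper tokenizer is preferable for real input.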

📁 Models Directory (/models)

Ensure these files are present in the /models folder before running the app:

- Human_Vs_AI_Written_pipeline.pkl
- decision_tree_pipeline.pkl
- adaboost_pipeline.pkl
- tfidf_vectorizer.pkl
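
A small pre-flight check along these lines can confirm the files are in place before launching the app. This is a sketch, not part of the repository: the missing_model_files helper is hypothetical, and it assumes the models folder is relative to the directory you start the app from.

```python
from pathlib import Path

# File names taken from the list above; the "models" location is an assumption.
REQUIRED_MODEL_FILES = [
    "Human_Vs_AI_Written_pipeline.pkl",
    "decision_tree_pipeline.pkl",
    "adaboost_pipeline.pkl",
    "tfidf_vectorizer.pkl",
]


def missing_model_files(models_dir="models"):
    """Return the required model files that are absent from models_dir."""
    folder = Path(models_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (folder / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All required model files are present.")
```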

📋 Report and Evaluation Highlights

- Accuracy, Precision, Recall, F1-score reported

- Confusion matrices & ROC curves plotted for all models

- Agreement rates between models calculated and visualized

- Final model selected based on best cross-validation and holdout performance
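
The agreement rate between two models can be read as the fraction of texts on which they predict the same label. The sketch below shows one way to compute it for every model pair. The function names and the four-sample predictions are hypothetical and are not results from this project.

```python
from itertools import combinations


def agreement_rate(preds_a, preds_b):
    """Fraction of samples on which two models give the same label."""
    if len(preds_a) != len(preds_b):
        raise ValueError("Prediction lists must have the same length")
    if not preds_a:
        return 0.0
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def pairwise_agreement(predictions):
    """Agreement rate for every pair of models in a name -> labels mapping."""
    return {
        (first, second): agreement_rate(predictions[first], predictions[second])
        for first, second in combinations(predictions, 2)
    }


# Hypothetical predictions for four texts (illustrative only).
predictions = {
    "SVM": ["ai", "human", "ai", "human"],
    "Decision Tree": ["ai", "human", "human", "human"],
    "AdaBoost": ["ai", "ai", "ai", "human"],
}
rates = pairwise_agreement(predictions)
for pair, rate in rates.items():
    print(pair, rate)
```

A low agreement rate on a given input is a useful signal that the prediction deserves a closer look, whichever model is selected.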

🧪 Testing Files

Located in sample_files/:

- AI Generated.txt

- Human-Written.docx

- AI Generated.pdf

Use these for demo and testing the app UI.

🛠 Design Decisions & Notes

🧩 Used Pipeline() to combine preprocessing, TF-IDF, and classifier into one object

✅ Ensured consistency by not mixing custom and pipeline preprocessing

🧪 All models use TF-IDF vectorization (2,000 or 10,000 features, depending on the model)

📁 Saved modular models for better control and debugging

📽 Demo Video

The demo video linked below shows:

Model selection and predictions
Uploading PDF/Word/Text documents
Agreement analysis and downloading reports

LINK: https://youtu.be/1T7sPmoTf4E

👨‍💻 Contributors

Samaya Niraula - https://github.com/Sam-MR11

📜 License

This project is licensed for educational use.

🙋‍♀️ Questions?

Feel free to raise an Issue on the GitHub repo or contact the developer.
