Sam-MR11/ai_human_detection_project

🤖 AI vs Human Text Detection

A machine learning-powered web application built with Streamlit to detect whether text was written by a human or generated by an AI (e.g., ChatGPT). Upload .txt, .docx, or .pdf files, or type text directly. Choose a classification model (SVM, Decision Tree, AdaBoost), and receive real-time predictions with confidence scores and visual explanations.


📁 Project Structure

ai_human_detection_project/
├── app.py # 🚀 Main Streamlit application
├── requirements.txt # 📦 Project dependencies
├── models/ # 🔍 Trained ML models and vectorizer
│ ├── Human_Vs_AI_Written_pipeline.pkl
│ ├── optimized_svm_model.pkl
│ ├── decision_tree_pipeline.pkl
│ ├── adaboost_pipeline.pkl
│ ├── feature_selector.pkl
│ ├── individual_svm_classifier.pkl
│ ├── optimized_adaboost_model.pkl
│ ├── optimized_dt_model.pkl
│ ├── sol2_pipeline_tfidf_vectorizer.pkl
│ └── tfidf_vectorizer.pkl
├── data/ # 🧪 Raw and test datasets
│ ├── AI_vs_huam_train_dataset/
│ └── Final_test_data/
├── notebooks/ # 📓 Jupyter notebooks (model training & analysis)
│ └── Project_1.ipynb
├── sample_files/ # 📁 Test documents (.txt, .pdf, .docx)
│ ├── AI Generated.txt
│ ├── AI Generated.pdf
│ └── Human-Written.docx
└── README.md # 📘 Project documentation

💡 Project Features

  • 🧠 Trained and tuned 3 classifiers (SVM, Decision Tree, AdaBoost)
  • 🔍 Supports .txt, .docx, and .pdf input formats
  • 📊 Displays prediction probabilities and agreement analysis
  • 📈 Real-time visualizations (confidence, model comparison, word stats)
  • 💾 Option to download prediction reports
  • 📎 Uses TF-IDF vectorization with optimized linguistic features
  • 📁 Clean and modular ML pipeline

🔧 Installation Instructions

1. Clone the repository

git clone https://github.com/Sam120-ass/ai_human_detection_project.git
cd ai_human_detection_project

2. Create Virtual Environment

python -m venv venv        # The second "venv" is the environment folder name, e.g. "python -m venv project1" creates a project1 folder
source venv/bin/activate    # On Windows: venv\Scripts\activate

3. Install the dependencies

pip install -r requirements.txt

4. Open the project in VS Code or another IDE

Running the Streamlit App

streamlit run app.py

This will launch a local web server where you can:

  • Upload text files (.txt, .docx, .pdf)
  • Choose from the 3 classifiers (SVM, Decision Tree, AdaBoost)
  • View AI vs Human predictions with confidence scores
  • See visualizations and model comparison
  • Download prediction reports
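
The model comparison step can be pictured with a small sketch. It assumes each model reports a probability that the text is AI-generated; the function name `summarize_predictions`, the input format, and the 0.5 threshold are illustrative assumptions, not taken from app.py.

```python
from collections import Counter


def summarize_predictions(probabilities):
    """Summarize per-model results (hypothetical helper, not from app.py).

    `probabilities` maps a model name to the estimated probability
    that the text is AI-generated.
    """
    # Assumed decision rule: 0.5 or above means "AI".
    labels = {
        name: "AI" if p_ai >= 0.5 else "Human"
        for name, p_ai in probabilities.items()
    }
    # Confidence is the probability assigned to the predicted label.
    confidence = {
        name: max(p_ai, 1.0 - p_ai) for name, p_ai in probabilities.items()
    }
    votes = Counter(labels.values())
    majority_label, majority_count = votes.most_common(1)[0]
    return {
        "labels": labels,
        "confidence": confidence,
        "majority": majority_label,
        "unanimous": majority_count == len(labels),
    }


example = summarize_predictions(
    {"SVM": 0.91, "Decision Tree": 0.40, "AdaBoost": 0.65}
)
print(example["majority"], example["unanimous"])
```

With these made-up probabilities, two of the three models vote "AI", so the models disagree and the majority label is "AI".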

Machine Learning Models

All models were trained using optimized parameters with GridSearchCV and evaluated using 5-fold stratified cross-validation.

Model           Accuracy    Features Used              Notes
SVM             >90%        TF-IDF (10,000 n-grams)    Best overall performance
Decision Tree   ~75%        TF-IDF (2,000)             Fast and interpretable
AdaBoost        ~82%        TF-IDF (2,000)             Robust to noise, ensemble-based
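
A tuning setup of the kind described here (GridSearchCV with 5-fold stratified cross-validation) might look roughly like the sketch below. The toy corpus, the parameter grid, and the linear-kernel SVC are assumptions for illustration; the actual grids and data live in the project notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy corpus for illustration only; not the project's dataset.
topics = ["coffee", "rain", "bus", "music", "lunch",
          "soccer", "movies", "homework", "traffic", "weekend"]
human = [f"honestly i was thinking about {t} all day lol" for t in topics]
ai = [f"In conclusion, {t} offers numerous significant benefits" for t in topics]
texts = human + ai
labels = [0] * len(human) + [1] * len(ai)  # 0 = Human, 1 = AI

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SVC(kernel="linear")),
])

# Hypothetical grid; the README does not list the values that were searched.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# Stratified folds keep the Human/AI ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_, search.best_score_)
```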

The models and the vectorizer are saved in the models/ folder using joblib.
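
As a minimal sketch of that persistence step, a fitted object can be written with `joblib.dump` and restored with `joblib.load`. The two training sentences are made up, and a temporary directory stands in for models/ so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a small vectorizer on made-up text (illustration only).
vectorizer = TfidfVectorizer(max_features=2000)
vectorizer.fit(["a short human note", "a generated paragraph of text"])

with tempfile.TemporaryDirectory() as tmp:
    # A temporary directory stands in for the models/ folder here.
    path = Path(tmp) / "tfidf_vectorizer.pkl"
    joblib.dump(vectorizer, path)
    restored = joblib.load(path)

# The restored vectorizer should transform text exactly like the original.
sample = ["a short generated note"]
same = np.array_equal(
    vectorizer.transform(sample).toarray(),
    restored.transform(sample).toarray(),
)
print(same)
```

Pickled scikit-learn objects are generally only safe to load with the same library version that saved them, which is one reason to pin versions in requirements.txt.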

Input File Support

The app supports:

  • Plain Text Files: .txt
  • Word Documents: .docx (via python-docx)
  • PDF Files: .pdf (via pdfplumber)
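
A rough sketch of how text could be pulled from the three formats is shown below. The helper name `extract_text` is hypothetical, and it takes a file path, whereas the Streamlit app most likely works with uploaded file objects; the python-docx and pdfplumber imports are deferred so that the .txt branch runs without them.

```python
from pathlib import Path


def extract_text(path):
    """Return the text of a .txt, .docx, or .pdf file (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pdf":
        import pdfplumber

        with pdfplumber.open(str(path)) as pdf:
            # extract_text() can return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    raise ValueError(f"Unsupported file type: {suffix}")
```

Scanned PDFs without a text layer would come back empty with this approach, since no OCR is involved.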

Dependencies

Minimum versions used in training:

pandas>=2.0.0
numpy>=1.26.0
scikit-learn>=1.4.0
matplotlib>=3.7.1
seaborn>=0.12.2
plotly>=5.15.0
joblib>=1.3.2
nltk
pdfplumber
python-docx
fpdf
streamlit
wordcloud

Install all dependencies with:

pip install -r requirements.txt

📊 Visualisations Included

- Prediction probability bars

- Model agreement/disagreement summary

- Word frequency cloud

- Feature importance (for tree-based models)

- Word count/sentence length stats
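
The word count and sentence length statistics could be computed along these lines. This is a simplified sketch using regular expressions; the function name `text_stats` is an assumption, and the app may tokenize differently (for example with nltk).

```python
import re
from collections import Counter


def text_stats(text, top_n=3):
    """Basic word and sentence statistics (hypothetical helper)."""
    # Words: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Sentences: split on ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    average = len(words) / len(sentences) if sentences else 0.0
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": average,
        "top_words": Counter(words).most_common(top_n),
    }


stats = text_stats("The cat sat. The cat ran! Did the dog run?")
print(stats["word_count"], stats["sentence_count"])  # prints: 10 3
```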

📁 Models Directory (/models)

Ensure these files are present in the /models folder before running the app:

- Human_Vs_AI_Written_pipeline.pkl
- decision_tree_pipeline.pkl
- adaboost_pipeline.pkl
- tfidf_vectorizer.pkl
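
A quick way to confirm the files are in place before launching the app is sketched below. The helper name `missing_model_files` is hypothetical, and the default path assumes the script runs from the repository root.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_MODEL_FILES = [
    "Human_Vs_AI_Written_pipeline.pkl",
    "decision_tree_pipeline.pkl",
    "adaboost_pipeline.pkl",
    "tfidf_vectorizer.pkl",
]


def missing_model_files(models_dir="models"):
    """Return the required file names that are not present in models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (base / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
```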

📋 Report and Evaluation Highlights

- Accuracy, Precision, Recall, F1-score reported

- Confusion matrices & ROC curves plotted for all models

- Agreement rates between models calculated and visualized

- Final model selected based on best cross-validation and holdout performance
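
For reference, the metrics listed above can be computed with scikit-learn as in this sketch. The label arrays are made up for illustration (1 = AI, 0 = Human) and do not reproduce the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Made-up labels for illustration: 1 = AI, 0 = Human.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_a = [1, 1, 0, 0, 0, 1, 1, 0]  # predictions from a first model
y_pred_b = [1, 1, 1, 0, 0, 0, 1, 1]  # predictions from a second model

accuracy = accuracy_score(y_true, y_pred_a)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred_a, average="binary"
)
# Rows are true classes, columns are predicted classes, ordered [0, 1].
matrix = confusion_matrix(y_true, y_pred_a)

# Agreement rate: share of samples on which both models predict the same label.
agreement = sum(a == b for a, b in zip(y_pred_a, y_pred_b)) / len(y_pred_a)

print(accuracy, precision, recall, f1)
print(matrix.tolist())
print(agreement)
```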

🧪 Testing Files

Located in sample_files/:

- AI Generated.txt

- Human-Written.docx

- AI Generated.pdf

Use these to demo and test the app UI.

🛠 Design Decisions & Notes

🧩 Used Pipeline() to combine preprocessing, TF-IDF, and classifier into one object

✅ Ensured consistency by not mixing custom and pipeline preprocessing

🧪 All models were trained on TF-IDF features (2,000 or 10,000 features, depending on the model)

📁 Saved modular models for better control and debugging
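
To make the Pipeline() idea concrete, here is a minimal sketch that chains TF-IDF and a classifier into one object. The four training sentences, the step names, and the choice of a decision tree are assumptions for illustration, not the saved project pipelines.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up corpus; the real models were trained on the project dataset.
texts = [
    "honestly i just grabbed coffee and ran for the bus",
    "lol that movie was way too long",
    "In conclusion, it is important to note several key factors",
    "Furthermore, this approach offers numerous significant benefits",
]
labels = ["Human", "Human", "AI", "AI"]

# One object handles lowercasing, TF-IDF, and classification, so the same
# preprocessing is applied at training time and at prediction time.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=2000, lowercase=True)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])
pipeline.fit(texts, labels)

# Class probabilities follow the order of pipeline.classes_.
proba = pipeline.predict_proba([texts[0]])[0]
print(dict(zip(pipeline.classes_, proba)))
```

Because raw text goes in and a label comes out, the caller never has to vectorize text separately, which avoids the mismatch between custom and pipeline preprocessing mentioned above.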

📽 Demo Video

A demo video is added here showing:

  • Model selection and predictions

  • Uploading PDF/Word/Text documents

  • Agreement analysis and downloading reports

👨‍💻 Contributors

📜 License

This project is licensed for educational use.

🙋‍♀️ Questions?

Feel free to raise an Issue on the GitHub repo or contact the developer.

