Skip to content

A MATLAB-based machine learning project that implements a Naive Bayes spam email classifier using the UCI Spambase dataset. Includes feature selection, model tuning, performance evaluation, and deployment-ready model export.

License

Notifications You must be signed in to change notification settings

victoropp/naive-bayes-spam-detection

Repository files navigation

Naive Bayes Spam Detection in MATLAB

This project implements a real-world spam email classifier using the Spambase dataset. It applies Naive Bayes classification with hyperparameter tuning, feature selection, bootstrap confidence intervals, and includes model comparison with Logistic Regression.


Key Features

  • Feature pruning (low variance & high correlation)
  • Hyperparameter tuning (kernel + width)
  • Stratified cross-validation
  • Evaluation metrics: Accuracy, Precision, Recall, F1, AUC
  • Bootstrap Confidence Interval
  • ROC and Precision-Recall curves
  • Model comparison with Logistic Regression
  • MATLAB deployable model (SpamClassifierModel.mat)

Dataset

  • Name: Spambase Dataset from UCI
  • Size: 4,601 emails (57 features + 1 target)
  • Target Classes: Spam (1), Non-Spam (0)

Folder Structure

naive-bayes-spam-detection/
├── NaiveBayes_SpamDetection.m       # Main MATLAB script
├── spambase.csv                     # Dataset file (not included in public repo)
├── outputs/                         # Output visualizations
│   ├── class_distribution.png
│   ├── hyperparameter_eval.png
│   ├── optimization_surface.png
│   ├── confusion_matrix.png
│   ├── roc_curve.png
│   └── precision_recall_curve.png
├── SpamClassifierModel.mat          # Saved deployable model
└── README.md                        # Project documentation


Results Summary

Metric Value
Cross-Validated Accuracy (Tuning) 85.07%
5-Fold CV Accuracy 84.85%
Test Set Accuracy 85.94%
Precision 0.83752
Recall 0.79742
F1-Score 0.81698
AUC 0.91582
Bootstrap CI (95%) [83.99%, 87.61%]
Logistic Regression Accuracy 87.68%

Visualizations

Class Distribution

Class Distribution

Hyperparameter Optimization

Hyperparameter Evaluation

Optimization Surface

Optimization Surface

Confusion Matrix

Confusion Matrix

ROC Curve

ROC Curve

Precision-Recall Curve

Precision-Recall


Getting Started

  1. Download the dataset from UCI Spambase
  2. Open NaiveBayes_SpamDetection.m in MATLAB
  3. Run the script — metrics and figures will be generated automatically

Model Deployment

A trained model is saved using MATLAB's saveLearnerForCoder for production deployment:

saveLearnerForCoder(finalModel, 'SpamClassifierModel')

Kaggle

Also available on Kagggle: Kaggle


Author

Victor Collins Oppon
MSc Data Science | Chartered Accountant
LinkedIn Profile
Building a bridge between finance and intelligent automation.

License

This project is licensed under the MIT License.
You are free to use, modify, and distribute this software for both personal and commercial use, provided proper attribution is given.

About

A MATLAB-based machine learning project that implements a Naive Bayes spam email classifier using the UCI Spambase dataset. Includes feature selection, model tuning, performance evaluation, and deployment-ready model export.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages