This project is a simple yet effective spam detection system that uses a Naive Bayes classifier to classify emails as spam or not spam (ham). It combines text preprocessing, TF-IDF vectorisation, and machine learning to build a spam filter trained on the popular spam.csv dataset.
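- Multinomial Naive Bayes is used for classification because it is well suited to text data with discrete features such as word counts or frequencies. It is formally defined for integer counts, but in practice it also works well with fractional TF-IDF weights.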
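As a minimal illustration of why word counts suit this model, the sketch below fits scikit-learn's `MultinomialNB` on raw word counts produced by `CountVectorizer`. The four training messages and two test messages are hypothetical toy data, not drawn from spam.csv, and the notebook itself uses TF-IDF features rather than raw counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, for illustration only (not taken from spam.csv).
messages = [
    "win a free prize now",
    "free cash offer click now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts (discrete features).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Fit a multinomial model: it learns per-class word probabilities,
# smoothed with Laplace smoothing (alpha=1.0 by default).
model = MultinomialNB()
model.fit(X, labels)

new_messages = ["free prize offer", "lunch meeting tomorrow"]
predictions = model.predict(vectorizer.transform(new_messages))
print(predictions.tolist())  # ['spam', 'ham']
```

Words that appear often in one class ("free", "meeting") push a new message towards that class, and smoothing keeps unseen words from zeroing out a class probability.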
- Dataset: spam.csv
- Columns:
  - v1: Label (spam/ham)
  - v2: Text message
- Text cleaning (punctuation removal, stopword removal)
- Text stemming using PorterStemmer
- TF-IDF vectorisation
- Train-test split
- Model training and evaluation
- Performance metrics:
  - Accuracy
  - Confusion matrix
  - Classification report
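The pipeline steps listed above can be sketched end to end as follows. This is a minimal, hedged sketch rather than the notebook's actual code: the helper names `clean_text` and `load_dataset`, the 75/25 split, `random_state=42`, and the toy messages are all assumptions, and scikit-learn's built-in `ENGLISH_STOP_WORDS` list is swapped in for NLTK's stopwords corpus so the example needs no data download. The `latin-1` encoding in `load_dataset` reflects how the commonly shared Kaggle copy of spam.csv is usually read; adjust it if your copy differs.

```python
import string

import pandas as pd
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

stemmer = PorterStemmer()
# Maps every punctuation character to None, i.e. deletes it.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, then stem each remaining word."""
    words = text.lower().translate(PUNCT_TABLE).split()
    # Assumption: scikit-learn's built-in stopword list stands in for NLTK's corpus.
    kept = [w for w in words if w not in ENGLISH_STOP_WORDS]
    return " ".join(stemmer.stem(w) for w in kept)


def load_dataset(path: str = "spam.csv") -> pd.DataFrame:
    """Keep only the v1 (label) and v2 (text) columns of spam.csv."""
    # Assumption: the file is latin-1 encoded, as the widely shared Kaggle copy is.
    df = pd.read_csv(path, encoding="latin-1")
    return df[["v1", "v2"]].rename(columns={"v1": "label", "v2": "text"})


# Hypothetical toy messages standing in for load_dataset(), so the sketch runs on its own.
spam = [
    "WINNER!! You have won a free prize, claim now",
    "Free entry to win cash, text WIN to claim",
    "Urgent! Your mobile number has won a cash award",
    "Congratulations, you won a free holiday voucher",
    "Claim your free ringtone now, reply YES",
    "You have been selected for a cash prize, claim today",
    "Free tickets! Text now to win",
    "Exclusive offer: win a brand new phone today",
]
ham = [
    "Are we still meeting for lunch today?",
    "I will be home late tonight",
    "Can you pick up some milk on the way home?",
    "See you at the gym tomorrow morning",
    "Thanks for dinner last night, it was great",
    "Let me know when you get to the office",
    "Happy birthday! Hope you have a lovely day",
    "Running late, will be there in ten minutes",
]
df = pd.DataFrame({"label": ["spam"] * len(spam) + ["ham"] * len(ham), "text": spam + ham})

df["clean"] = df["text"].apply(clean_text)

# Hold out 25% of messages, keeping the spam/ham ratio the same in both splits.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df["clean"], df["label"], test_size=0.25, random_state=42, stratify=df["label"]
)

# Learn the TF-IDF vocabulary and IDF weights on the training split only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Fixing the label order guarantees a 2x2 matrix: rows are true, columns predicted.
cm = confusion_matrix(y_test, y_pred, labels=["ham", "spam"])
print(f"Accuracy: {accuracy:.2f}")
print(cm)
print(classification_report(y_test, y_pred, zero_division=0))
```

To train on the real data, replace the toy DataFrame with `df = load_dataset("spam.csv")`. Fitting the vectoriser only on the training split keeps test-set vocabulary and IDF statistics out of training, so the reported accuracy is not inflated.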
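Libraries used:
- pandas
- numpy
- string (Python standard library, no installation needed)
- nltk
- sklearn (installed from PyPI as scikit-learn)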
The model is evaluated using:
- Accuracy score
- Confusion matrix
- Classification report
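The three evaluation outputs are related: the confusion matrix holds the raw counts, and accuracy, precision, and recall (the per-class figures inside a classification report) are ratios of those counts. A minimal sketch with hypothetical labels and predictions for ten messages, treating spam as the positive class:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions for 10 messages (not real project results).
y_true = ["ham"] * 5 + ["spam"] * 5
y_pred = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam", "ham", "ham"]

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
print(cm)
# [[4 1]
#  [2 3]]

# Accuracy = (TN + TP) / total = (4 + 3) / 10
acc = accuracy_score(y_true, y_pred)
# Precision for spam = TP / (TP + FP) = 3 / (3 + 1)
prec = precision_score(y_true, y_pred, pos_label="spam")
# Recall for spam = TP / (TP + FN) = 3 / (3 + 2)
rec = recall_score(y_true, y_pred, pos_label="spam")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# accuracy=0.70 precision=0.75 recall=0.60
```

For a spam filter, precision measures how many flagged messages really were spam (false positives hide legitimate mail), while recall measures how much spam was caught.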
To run the project:
- Clone the repository:
  git clone https://github.com/yourusername/Email-Spam-Detection.git
  cd Email-Spam-Detection
- Install the required libraries:
  pip install -r requirements.txt
- Run the Jupyter Notebook:
  jupyter notebook Email_Spam_Detection_system_using_Naive_Bayes.ipynb
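The README does not say which stopword list the notebook uses. If it is NLTK's stopwords corpus, that data is downloaded separately from the pip package, and loading it raises a `LookupError` until it is present. A minimal sketch, assuming NLTK's corpus is the one in use (the helper name `ensure_nltk_stopwords` is hypothetical):

```python
import nltk


def ensure_nltk_stopwords() -> None:
    """Download NLTK's stopwords corpus only if it is not already installed."""
    try:
        # nltk.data.find raises LookupError when the resource is missing locally.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords", quiet=True)
```

Calling `ensure_nltk_stopwords()` once at the top of the notebook makes the download a no-op on later runs.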
📁 Email-Spam-Detection/
│
├── Email_Spam_Detection_system_using_Naive_Bayes.ipynb
├── spam.csv
├── README.md
└── requirements.txt
| Metric | Value |
|---|---|
| Accuracy | 97%+ |
| Precision | High |
| Recall | High |
- Add a GUI or web interface (using Streamlit or Flask)
- Deploy the model using FastAPI or Flask
- Add more datasets and deep learning models
Asim Hanif (GitHub | LinkedIn)
Software Engineering Student & ML Enthusiast