A simple yet effective machine learning model that classifies emails as Spam or Not Spam (Ham) using Multinomial Naive Bayes and TF-IDF vectorization. Built using Python and scikit-learn.
It serves as a basic example of text classification with natural language processing (NLP) techniques.
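For reference, these are the standard definitions behind the two components. They assume scikit-learn's defaults, which the README does not confirm: `TfidfVectorizer` with `smooth_idf=True`, `sublinear_tf=False` and `norm='l2'`, and `MultinomialNB` with smoothing parameter $\alpha = 1$ and class priors $\hat{P}(c)$ estimated from the fraction of training emails labeled $c$. Here $n$ is the number of training emails, $\mathrm{df}(t)$ the number of emails containing term $t$, $\mathrm{tf}(t,d)$ the count of $t$ in email $d$, and $V$ the vocabulary:

```latex
\begin{aligned}
\mathrm{idf}(t) &= \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
x_{d,t} = \frac{\mathrm{tf}(t,d)\,\mathrm{idf}(t)}{\sqrt{\sum_{t' \in V} \big(\mathrm{tf}(t',d)\,\mathrm{idf}(t')\big)^2}} \\
\hat{\theta}_{c,t} &= \frac{N_{c,t} + \alpha}{N_c + \alpha\,|V|},
\qquad
\hat{y}(d) = \arg\max_{c \in \{\mathrm{ham},\,\mathrm{spam}\}} \Big[\log \hat{P}(c) + \sum_{t \in V} x_{d,t} \log \hat{\theta}_{c,t}\Big]
\end{aligned}
```

$N_{c,t}$ is the sum of the TF-IDF weights $x_{d,t}$ over the training emails of class $c$, and $N_c = \sum_{t \in V} N_{c,t}$. The multinomial model is formally defined for integer counts, but scikit-learn's documentation notes that fractional counts such as TF-IDF may also work in practice.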
Features:
- Preprocesses email text: lowercasing, punctuation removal, and stopword removal
- Converts text data into numerical format using TF-IDF
- Trains a Multinomial Naive Bayes classifier
- Predicts whether an email is spam or not
- Reports accuracy and a classification report (per-class precision, recall, and F1-score)
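The preprocessing steps listed above can be sketched as a small function. This is a minimal sketch, not the project's code: the README does not say which stopword list it uses, so the example assumes scikit-learn's built-in `ENGLISH_STOP_WORDS` (NLTK's list is a common alternative), and `preprocess` is a hypothetical helper name.

```python
import string

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> str:
    """Lowercase the text, replace punctuation with spaces, and drop stopwords.

    Assumes scikit-learn's ENGLISH_STOP_WORDS as the stopword list.
    """
    text = text.lower().translate(_PUNCT_TO_SPACE)
    # split() with no argument also collapses the extra spaces left behind.
    tokens = [tok for tok in text.split() if tok not in ENGLISH_STOP_WORDS]
    return " ".join(tokens)


for sample in ["Hey, how are you?", "Congratulations! You won!"]:
    print(repr(sample), "->", repr(preprocess(sample)))
```

`TfidfVectorizer` can also lowercase (`lowercase=True`, the default) and drop English stopwords (`stop_words='english'`) by itself, so inside a scikit-learn pipeline a helper like this is optional; custom cleaning can be plugged in through the vectorizer's `preprocessor` argument.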
The project uses a labeled dataset containing spam and ham messages.
Example:
| Label | Message |
|---|---|
| ham | "Hey, how are you?" |
| spam | "Congratulations! You won!" |
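Loading such a dataset with pandas might look like the sketch below. The README does not name the data file or its format, so this assumes a CSV with `label` and `message` columns mirroring the table above; an in-memory copy of the two example rows stands in for the file so the snippet runs as-is. `load_dataset` and the `target` column are hypothetical names.

```python
import io

import pandas as pd

# Stand-in for the real data file (file format and column names are assumptions).
SAMPLE_CSV = io.StringIO(
    "label,message\n"
    'ham,"Hey, how are you?"\n'
    'spam,"Congratulations! You won!"\n'
)


def load_dataset(source) -> pd.DataFrame:
    """Read a label/message CSV and add a numeric target column (ham=0, spam=1)."""
    df = pd.read_csv(source)
    # Drop rows missing either field, then normalise the label spelling.
    df = df.dropna(subset=["label", "message"]).copy()
    df["label"] = df["label"].str.strip().str.lower()
    df["target"] = df["label"].map({"ham": 0, "spam": 1})
    return df


df = load_dataset(SAMPLE_CSV)
print(df)
print(df["label"].value_counts())
```

For the real dataset, pass a file path to `load_dataset` instead of the `StringIO` object.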
Requirements:
- pandas
- scikit-learn
- matplotlib (optional, for visualization)
Workflow:
- Load the dataset
- Preprocess the text (tokenization, cleaning, etc.)
- Convert text to vectors using `TfidfVectorizer`
- Train the Multinomial Naive Bayes model
- Evaluate the model using accuracy & classification report
- Predict whether new messages are spam or not
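The workflow above maps naturally onto a scikit-learn `Pipeline`. The sketch below is illustrative only: it trains on a tiny hypothetical corpus (replace it with the real dataset), holds out 25% of the messages with a stratified split, and lets `TfidfVectorizer(stop_words="english")` handle lowercasing and stopword removal in place of a separate preprocessing step. The split ratio, `random_state`, and `alpha=1.0` are assumptions, not values taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus for illustration; replace with the real labeled dataset.
messages = [
    "Hey, how are you?",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the report by Friday?",
    "Thanks for the update, see you at the meeting",
    "Let's catch up this weekend",
    "Did you get my last email about the project?",
    "Congratulations! You won a free prize!",
    "Claim your free cash prize now",
    "You have been selected to win a free vacation",
    "Win money now! Click here to claim",
    "Limited offer: free entry to win cash",
    "Urgent! Claim your prize before it expires",
]
labels = ["ham"] * 6 + ["spam"] * 6

# Stratified hold-out split so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF features (lowercased, English stopwords removed) feeding Multinomial Naive Bayes.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(alpha=1.0),
)
model.fit(X_train, y_train)

# Evaluate on the held-out messages.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))

# Classify unseen messages.
new_messages = ["Claim your free prize now!", "See you at lunch tomorrow"]
for text, label in zip(new_messages, model.predict(new_messages)):
    print(f"{label:>4}: {text}")
```

Bundling the vectorizer and classifier in one pipeline means the TF-IDF vocabulary and IDF weights are learned only from the training split and then reapplied unchanged to test and new messages. With a corpus this small the printed accuracy carries no meaning; the example only demonstrates the calls.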
Clone the repository:

```bash
git clone https://github.com/yourusername/email-spam-classifier.git
cd email-spam-classifier
```