A deep learning-based spam detection system that uses Natural Language Processing (NLP) techniques and a Long Short-Term Memory (LSTM) neural network to classify emails as spam or not spam. This project includes data visualization, preprocessing, model training, and a prediction interface for real-time email classification.
- Exploratory Data Analysis with WordClouds and Seaborn plots
- Balanced dataset using downsampling for fair training
- Text preprocessing (punctuation removal, stopword filtering)
- LSTM-based deep learning model using Keras and TensorFlow
- Real-time email spam prediction from user input
- Model and tokenizer saved for future predictions
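
The text preprocessing step listed above (punctuation removal and stopword filtering) could look roughly like the sketch below. It uses only the standard library so it runs anywhere. The function name `clean_text` and the tiny `STOPWORDS` set are assumptions made for illustration; since `nltk` is among the project's dependencies, the actual code may well use NLTK's English stopword list instead.

```python
import string

# Hypothetical, deliberately tiny stopword set; not the project's actual list.
STOPWORDS = {"a", "an", "the", "is", "to", "you", "have", "and", "of", "in"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, remove punctuation, and drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    kept = [word for word in no_punct.split() if word not in STOPWORDS]
    return " ".join(kept)


print(clean_text("Congratulations! You have won a FREE iPhone."))
# -> congratulations won free iphone
```

Building the translation table once at import time keeps the per-email cost low when the function is applied to a whole column of messages.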
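The project depends on the following Python packages:
- `numpy`, `pandas`, `matplotlib`, `seaborn`
- `nltk`, `wordcloud`
- `scikit-learn`
- `tensorflow`, `keras`
- `pickle` (part of the Python standard library)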
The project uses an `emails.csv` dataset with two columns:
- `text`: Raw email content
- `spam`: Binary label (0 = Not Spam, 1 = Spam)
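
One way to balance such a dataset by downsampling is sketched below with pandas. The helper name `downsample`, the fixed seed, and the toy frame are assumptions for illustration; in the project itself the frame would presumably come from `pd.read_csv("emails.csv")`.

```python
import pandas as pd


def downsample(df: pd.DataFrame, label_col: str = "spam", seed: int = 42) -> pd.DataFrame:
    """Sample every class down to the size of the smallest class."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    # Concatenate, shuffle so the classes are interleaved, and reset the index.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy frame standing in for the real CSV: 4 non-spam rows, 2 spam rows.
toy = pd.DataFrame({
    "text": ["hi", "meeting at 3", "lunch?", "see notes", "win cash", "free prize"],
    "spam": [0, 0, 0, 0, 1, 1],
})
balanced = downsample(toy)
print(balanced["spam"].value_counts().to_dict())
```

After the call, both classes have as many rows as the minority class, so the classifier cannot reach high accuracy simply by predicting the majority label.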
- Text tokenization and padding
- `Embedding` layer to learn word representations
- `LSTM` layer with 16 units to capture sequence dependencies
- Dense layers with ReLU and Sigmoid activations
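
The pipeline above can be sketched in Keras as follows. Only the 16 LSTM units and the ReLU/Sigmoid activations come from the description; `VOCAB_SIZE`, `MAX_LEN`, `EMBED_DIM`, the width of the hidden dense layer, and the two sample sentences are assumptions chosen to keep the example small.

```python
import tensorflow as tf

VOCAB_SIZE = 1000  # assumed cap on vocabulary size
MAX_LEN = 20       # assumed padded sequence length
EMBED_DIM = 32     # assumed embedding width

texts = ["win a free prize now", "are we still meeting tomorrow"]

# Map words to integer ids, then pad every sequence to the same length.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=MAX_LEN)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    tf.keras.layers.LSTM(16),                        # 16 units, as described above
    tf.keras.layers.Dense(32, activation="relu"),    # hidden width is an assumption
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs a spam probability
])

probs = model.predict(padded, verbose=0)
print(padded.shape, probs.shape)
```

The final sigmoid unit yields one value between 0 and 1 per email, which pairs naturally with a binary cross-entropy loss.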
- Loss: `BinaryCrossentropy`
- Optimizer: `Adam`
- EarlyStopping and ReduceLROnPlateau callbacks for better training control
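
A minimal sketch of this training setup is shown below. The loss, optimizer, and the two callback classes match the list above; the `patience` and `factor` values, the monitored metric, the epoch count, and the random stand-in data are all assumptions, not the project's actual settings.

```python
import numpy as np
import tensorflow as tf

# Tiny random stand-in data: 64 sequences of 20 token ids with random labels.
rng = np.random.default_rng(0)
x = rng.integers(1, 100, size=(64, 20))
y = rng.integers(0, 2, size=(64, 1)).astype("float32")

# Small stand-in model so the snippet runs on its own.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Embedding(input_dim=100, output_dim=8),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"],
)

callbacks = [
    # Stop when validation loss stops improving and restore the best weights.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    ),
    # Halve the learning rate after 2 epochs without improvement.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

history = model.fit(
    x, y, validation_split=0.25, epochs=3, batch_size=16,
    callbacks=callbacks, verbose=0,
)
print(sorted(history.history))
```

The returned `history.history` dictionary holds the per-epoch training and validation metrics, which is what an accuracy-versus-epoch plot would be drawn from.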
- Training and validation accuracy visualized across epochs
- Final test loss and accuracy printed after evaluation
- Real-time predictions based on new email input from the user
- Trained model saved as: `spam_detector_model.h5`
- Tokenizer saved as: `tokenizer.pickle`
- You can reuse these for deployment or further predictions.
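
Reusing the saved artifacts might look like the round trip below. The file names match those listed above; the stand-in tokenizer, the untrained model, `MAX_LEN`, the 0.5 threshold, and the helper name `classify` are assumptions for illustration. Because the model here is untrained, the printed label is arbitrary.

```python
import os
import pickle
import tempfile

import tensorflow as tf

MAX_LEN = 20  # assumed; must match the padding length used in training

# Stand-in tokenizer and untrained model, only so the round trip can run.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000)
tokenizer.fit_on_texts(["win a free prize now", "see you at the meeting"])
model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(input_dim=1000, output_dim=8),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Save both artifacts into a temporary directory.
workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "spam_detector_model.h5")
tok_path = os.path.join(workdir, "tokenizer.pickle")

model.save(model_path)
with open(tok_path, "wb") as fh:
    pickle.dump(tokenizer, fh)

# Later, for example in a separate prediction script: load both back.
loaded_model = tf.keras.models.load_model(model_path, compile=False)
with open(tok_path, "rb") as fh:
    loaded_tok = pickle.load(fh)


def classify(text: str, threshold: float = 0.5) -> str:
    """Return 'Spam' or 'Not Spam' for one email; the threshold is an assumption."""
    seq = loaded_tok.texts_to_sequences([text])
    padded = tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=MAX_LEN)
    prob = float(loaded_model.predict(padded, verbose=0)[0][0])
    return "Spam" if prob >= threshold else "Not Spam"


print(classify("Congratulations! You've won a free iPhone!"))
```

The tokenizer has to be saved alongside the model because the word-to-id mapping learned during training is what gives the embedding indices their meaning at prediction time.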
Enter the email text to check if it is spam:
Congratulations! You've won a free iPhone! Click here to claim now.
The email is classified as: Spam