A machine learning project that classifies emails as spam or not spam using natural language processing (NLP) and deep learning techniques. It leverages LSTM neural networks built with TensorFlow and Keras, along with text preprocessing and tokenization, to enable accurate and real-time spam detection.

HarsDev01/email-spam-classifier-lstm

📧 Email Spam Classifier using LSTM and NLP

A deep learning-based spam detection system that uses Natural Language Processing (NLP) techniques and a Long Short-Term Memory (LSTM) neural network to classify emails as spam or not spam. This project includes data visualization, preprocessing, model training, and a prediction interface for real-time email classification.


📌 Features

  • 📊 Exploratory Data Analysis with WordClouds and Seaborn plots
  • 🔁 Balanced dataset using downsampling for fair training
  • 🧹 Text preprocessing (punctuation removal, stopword filtering)
  • 🧠 LSTM-based deep learning model using Keras and TensorFlow
  • βœ… Real-time email spam prediction from user input
  • πŸ’Ύ Model and tokenizer saved for future predictions
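
The text preprocessing step listed above (punctuation removal and stopword filtering) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not necessarily the project's own code: the `clean_text` helper and the tiny `STOPWORDS` set are assumptions made for the example. The project lists nltk, whose English stopword corpus would normally supply the stopword list.

```python
import string

# Tiny illustrative stopword set (assumption). The project lists nltk, whose
# English stopword corpus would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "you", "have", "of", "in"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation, and drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    kept = [word for word in no_punct.split() if word not in STOPWORDS]
    return " ".join(kept)


print(clean_text("Congratulations! You have won a free iPhone."))
# congratulations won free iphone
```

Lowercasing before filtering keeps the stopword lookup case-insensitive, so "You" and "you" are treated the same.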

🧰 Libraries Used

  • numpy, pandas, matplotlib, seaborn
  • nltk, wordcloud
  • scikit-learn
  • tensorflow, keras
  • pickle

📂 Dataset

The project uses an emails.csv dataset with two columns:

  • text: Raw email content
  • spam: Binary label (0 = Not Spam, 1 = Spam)
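
Balancing by downsampling, as mentioned under Features, means shrinking the larger class until both labels have the same number of rows. The sketch below shows the idea in plain Python on a few made-up rows that mimic the text and spam columns. The `downsample` helper and the toy data are hypothetical; with pandas, the same idea would be applied to a DataFrame loaded from emails.csv.

```python
import random
from collections import defaultdict


def downsample(rows, label_key="spam", seed=42):
    """Return a class-balanced, shuffled copy of rows by shrinking larger classes."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    # Every class is cut down to the size of the smallest one.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)

    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy rows mimicking the two columns of emails.csv (made-up content).
rows = [
    {"text": "meeting at noon", "spam": 0},
    {"text": "quarterly report attached", "spam": 0},
    {"text": "lunch tomorrow?", "spam": 0},
    {"text": "see you at the review", "spam": 0},
    {"text": "win a free prize now", "spam": 1},
    {"text": "claim your reward today", "spam": 1},
]

balanced = downsample(rows)
print(len(balanced))  # 4
```

Fixing the seed keeps the sampled subset reproducible between runs, which makes training results easier to compare.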

🧪 Model Architecture

  • Text tokenization and padding
  • Embedding Layer to learn word representations
  • LSTM Layer with 16 units to capture sequence dependencies
  • Dense layers with ReLU and Sigmoid activations
  • Loss: BinaryCrossentropy
  • Optimizer: Adam
  • EarlyStopping and ReduceLROnPlateau callbacks for better training control
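
One way the architecture above could look in Keras is sketched below. Only the 16-unit LSTM, the ReLU and sigmoid activations, the loss, the optimizer, and the two callbacks come from this README. The vocabulary size, embedding dimension, sequence length, dense width, and callback settings are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Assumed hyperparameters: the README only fixes the LSTM width (16 units).
VOCAB_SIZE = 10_000  # assumption: tokenizer vocabulary size
EMBED_DIM = 32       # assumption: embedding vector length
MAX_LEN = 100        # assumption: padded sequence length
DENSE_UNITS = 32     # assumption: width of the hidden dense layer

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(DENSE_UNITS, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # spam probability
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer="adam",
    metrics=["accuracy"],
)

# Callback settings (monitor, patience, factor) are illustrative assumptions.
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=2
    ),
]

# Smoke test on a dummy batch of two padded sequences.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
probabilities = model(dummy_batch).numpy()
print(probabilities.shape)  # (2, 1)
```

The `callbacks` list would be passed to `model.fit(..., callbacks=callbacks)` together with validation data, so that training stops early and the learning rate drops when the validation loss stalls.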

📈 Performance

  • Training and validation accuracy visualized across epochs
  • Final test loss and accuracy printed after evaluation
  • Real-time predictions based on new email input from the user

💾 Output Artifacts

  • Trained model saved as: spam_detector_model.h5
  • Tokenizer saved as: tokenizer.pickle
  • You can reuse these for deployment or further predictions.
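
Reusing a pickled tokenizer for new input might look roughly like the sketch below. To stay self-contained it pickles a plain word-to-index dictionary as a stand-in for the fitted tokenizer and writes to a temporary directory. The vocabulary, the `encode_and_pad` helper, and `max_len=6` are hypothetical. Left-padding and keeping the tail of overlong inputs mirror the defaults of Keras `pad_sequences`.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the fitted tokenizer: a plain word -> integer index mapping
# (hypothetical vocabulary). Index 0 is reserved for padding.
word_index = {"free": 1, "won": 2, "iphone": 3, "click": 4}


def encode_and_pad(text, index, max_len=6):
    """Map known words to integer ids, then left-pad with zeros to max_len."""
    ids = [index[word] for word in text.lower().split() if word in index]
    ids = ids[-max_len:]  # keep only the last max_len ids if the text is long
    return [0] * (max_len - len(ids)) + ids


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokenizer.pickle"
    # Save the mapping, then load it back as a later session would.
    with path.open("wb") as handle:
        pickle.dump(word_index, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with path.open("rb") as handle:
        restored = pickle.load(handle)

print(encode_and_pad("You won a free iPhone", restored))
# [0, 0, 0, 2, 1, 3]
```

The saved .h5 model would typically be reloaded with `tf.keras.models.load_model`. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.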

🧠 Sample Prediction

Enter the email text to check if it is spam: 
Congratulations! You've won a free iPhone! Click here to claim now.

The email is classified as: Spam
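
Because the final layer uses a sigmoid, the model emits a probability that still has to be mapped to a label. A minimal sketch of that last step follows, assuming a 0.5 decision threshold. The threshold, the `label_from_probability` helper, and the example score are assumptions, not values taken from the project.

```python
def label_from_probability(probability: float, threshold: float = 0.5) -> str:
    """Convert a sigmoid output in [0, 1] into the label shown to the user."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "Spam" if probability >= threshold else "Not Spam"


# 0.93 is a made-up score standing in for the model's output.
print(f"The email is classified as: {label_from_probability(0.93)}")
# The email is classified as: Spam
```

Raising the threshold trades missed spam for fewer legitimate emails flagged by mistake, so the right value depends on which error is more costly.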
