Fake Message Detector

A machine learning-based solution for detecting fake/spam messages on social media platforms, using an LSTM network and a TF-IDF + Logistic Regression classifier.

Overview

This project implements a web application that detects fake or spam messages using natural language processing and deep learning techniques. The system analyzes the text of a message and classifies it as legitimate or spam/fake. Reported accuracy ranges from 85% to 88%, depending on the model (see Model Performance).

Features

Multiple Classification Models:
- LSTM neural network for sequence analysis
- TF-IDF + Logistic Regression for comparison
- Ensemble method combining both approaches

Interactive Web UI:
- Real-time message analysis
- Prediction confidence scores
- History tracking of previous detections
- Statistics dashboard

Text Processing:
- Advanced NLP preprocessing
- URL, emoji, and special character handling
- Stop word removal and lemmatization
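
This README does not say how the ensemble combines the two models. One common approach is a weighted average of each model's spam probability, sketched below. The function names, the equal weighting, and the 0.5 threshold are illustrative assumptions, not the project's actual code.

```python
def ensemble_spam_probability(p_lstm, p_tfidf, w_lstm=0.5):
    """Weighted average of two spam probabilities, each in [0, 1].

    w_lstm is the weight given to the LSTM output; the TF-IDF + LR
    output gets the remaining 1 - w_lstm. (Hypothetical helper.)
    """
    if not 0.0 <= w_lstm <= 1.0:
        raise ValueError("w_lstm must be between 0 and 1")
    for p in (p_lstm, p_tfidf):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return w_lstm * p_lstm + (1.0 - w_lstm) * p_tfidf


def classify(p_lstm, p_tfidf, w_lstm=0.5, threshold=0.5):
    """Return (label, confidence); label is 1 for spam, 0 for legitimate."""
    p_spam = ensemble_spam_probability(p_lstm, p_tfidf, w_lstm)
    label = 1 if p_spam >= threshold else 0
    # Confidence is the probability assigned to the predicted class.
    confidence = p_spam if label == 1 else 1.0 - p_spam
    return label, confidence
```

With equal weights, `classify(0.9, 0.7)` averages the two probabilities and labels the message as spam. Averaging probabilities (soft voting) keeps a confidence score available for display, which a plain majority vote over two models would not.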

Project Structure

├── data/                    # Dataset files
│   ├── twitter_spam_data.csv    # Raw dataset
│   └── processed_twitter_spam.csv # Preprocessed dataset
├── models/                  # Trained model files
│   ├── lstm_model.h5        # LSTM neural network model
│   ├── tokenizer.pickle     # Text tokenizer
│   ├── tfidf_vectorizer.pickle # TF-IDF vectorizer
│   └── lr_model.pickle      # Logistic regression model
├── src/                     # Source code
│   ├── preprocessing.py     # Text cleaning and preprocessing
│   ├── main.py              # Model training script
│   ├── predict.py           # Prediction functionality
│   └── data_helper.py       # Dataset utilities
├── static/                  # Web assets
│   ├── css/                 # Stylesheets
│   └── js/                  # JavaScript files
├── templates/               # HTML templates
├── run_flask.py             # Flask web application
├── run.py                   # Main runner script with CLI
└── README.md                # Project documentation

Installation

# Clone the repository
git clone https://github.com/BaverYldz/SpamHandler-DL.git
cd SpamHandler-DL

# Install Git LFS (if not installed)
# For Windows: https://git-lfs.github.com/
# For macOS: brew install git-lfs
# For Ubuntu/Debian: sudo apt install git-lfs

# Set up Git LFS
git lfs install

# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Dataset

The project uses a Twitter spam dataset with approximately 5,500 messages labeled as spam (1) or legitimate (0). Each record contains:

- Message text
- Class label (spam/legitimate)
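
As a quick sanity check on a dataset of this shape, the sketch below reads (text, label) pairs from a CSV and counts the classes. The column names `text` and `label` are assumptions; this README does not list the actual headers of `twitter_spam_data.csv`, so adjust them to match the file.

```python
import csv
import io
from collections import Counter


def read_messages(handle, text_column="text", label_column="label"):
    """Read (text, label) pairs from an open CSV file or file-like object.

    The column names are hypothetical defaults, not confirmed by the project.
    """
    reader = csv.DictReader(handle)
    return [(row[text_column], int(row[label_column])) for row in reader]


def class_balance(messages):
    """Count how many messages fall into each class (1 = spam, 0 = legitimate)."""
    return Counter(label for _, label in messages)


# Tiny in-memory example standing in for the real CSV file.
sample = io.StringIO(
    "text,label\n"
    "Win a free phone now,1\n"
    "See you at lunch,0\n"
)
messages = read_messages(sample)
balance = class_balance(messages)
```

For the real file, pass an open handle instead, for example `open("data/twitter_spam_data.csv", newline="", encoding="utf-8")`. Checking the class balance first is useful because a heavily skewed dataset makes accuracy alone a misleading metric.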

Usage

1. Data Preparation

python src/data_helper.py
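
The internals of `data_helper.py` and `preprocessing.py` are not shown in this README. As a rough illustration of the kind of cleaning listed under Features, the sketch below lowercases the text, strips URLs, emoji, and special characters, and drops a few stop words. Treating "handling" as removal, the regular expressions, and the tiny stop word list are all assumptions. Lemmatization is omitted because it needs an NLP library such as NLTK or spaCy.

```python
import re

# Matches http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a lowercase ASCII letter or whitespace:
# this removes emoji, digits, and punctuation in one pass.
NON_LETTER_RE = re.compile(r"[^a-z\s]")
# Deliberately tiny, illustrative stop word list.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}
)


def clean_text(text):
    """Return a normalized, space-separated token string."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop URLs before stripping punctuation
    text = NON_LETTER_RE.sub(" ", text)  # drop emoji, digits, special characters
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)
```

URLs are removed before the character filter runs; otherwise the filter would break a link into stray word fragments such as `http` and `com` that survive as tokens.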

2. Training Models

python src/main.py

3. Running the Web Application

python run_flask.py

Then open your browser and navigate to: http://localhost:5000

4. Using the CLI Runner

python run.py

Choose from the interactive menu options to prepare data, train models, or launch the web app.

Model Performance

Model         Accuracy   Precision   Recall   F1-Score
LSTM          87%        84%         81%      82.5%
TF-IDF + LR   85%        82%         80%      81%
Ensemble      88%        85%         82%      83.5%
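
F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The short check below does that for the figures in the table; the helper name is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "LSTM": (0.84, 0.81),
    "TF-IDF + LR": (0.82, 0.80),
    "Ensemble": (0.85, 0.82),
}

# F1 as a percentage, rounded to one decimal place.
f1_by_model = {
    name: round(100 * f1_from_precision_recall(p, r), 1)
    for name, (p, r) in reported.items()
}
```

The recomputed values (82.5, 81.0, and 83.5) match the F1-Score column, so the table is internally consistent.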

Future Improvements

- Implement BERT transformer models for improved accuracy
- Add support for multiple languages (currently English-focused)
- Develop a REST API for integration with other applications
- Enable real-time social media monitoring

License

This project is licensed under the MIT License - see the LICENSE file for details.
