
A simple text classifier in Python that uses the Naive Bayes model to classify e-mails as spam or ham.


iiakshat/spam-mail-detection


    ---
    title: Spam Email Detection
    emoji: 💌
    colorFrom: pink
    colorTo: blue
    sdk: gradio
    sdk_version: 3.17.0
    app_file: app.py
    ---

Email Spam and Phishing URL Detection

This project utilizes Naive Bayes classification to detect whether an email is spam or not, and XGBoost classification to determine if a URL within an email is phishing or legitimate.

Getting Started

Project Overview

The project consists of two main components:

  1. Email Spam Detection: This component employs Naive Bayes classification to classify emails as either spam or not spam based on their content features.

  2. Phishing URL Detection: This component uses XGBoost classification to identify whether URLs within emails are associated with phishing attempts or legitimate websites.
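The first component rests on the multinomial Naive Bayes model. Each class (spam or ham) gets a prior probability and per-word probabilities estimated from training counts, and a message is assigned to the class with the highest log-posterior. The following minimal sketch illustrates that idea in plain Python with add-alpha (Laplace) smoothing. It is not the project's implementation (requirements.txt suggests the real models are built with scikit-learn), and the tokenizer, the `NaiveBayesSpamClassifier` class, and the toy training data are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamClassifier:
    """Illustrative multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter[str] = Counter()  # documents per class
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)  # word counts per class
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSpamClassifier":
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_posterior(self, tokens: list[str], label: str) -> float:
        total_docs = sum(self.class_counts.values())
        log_prob = math.log(self.class_counts[label] / total_docs)  # log prior
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            # Words never seen in training are skipped.
            if tok in self.vocab:
                log_prob += math.log((counts[tok] + self.alpha) / denom)
        return log_prob

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_posterior(tokens, label))


# Toy usage with made-up training data.
train_texts = [
    "Win a free prize now",
    "Claim your free money now",
    "Meeting at noon tomorrow",
    "Lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesSpamClassifier().fit(train_texts, train_labels)
print(model.predict("free prize money"))       # → spam
print(model.predict("team meeting tomorrow"))  # → ham
```

With scikit-learn, the same model is usually assembled from `CountVectorizer` and `MultinomialNB`. The hand-written version simply makes the counting and smoothing explicit.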

Prerequisites

Make sure you have Python 3.10 installed on your system. You can download it from https://www.python.org/downloads/.

Requirements

Ensure you have the following dependencies installed. You can install them using pip install -r requirements.txt.

  • gunicorn==22.0.0
  • python-dateutil==2.8.2
  • gradio==4.32.1
  • gradio_client==0.17.0
  • requests==2.31.0
  • beautifulsoup4==4.12.3
  • googlesearch_python==1.2.4
  • urlextract==1.9.0
  • numpy==1.26.3
  • pandas==2.2.0
  • scikit-learn==1.5.0
  • urllib3==2.1.0
  • python-whois==0.9.4
  • xgboost==2.0.3
  • lxml==5.2.2

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/iiakshat/spam-mail-detection.git
    cd spam-mail-detection
    
  2. Install dependencies:

    pip install -r requirements.txt
    

Usage

  1. Data Preparation:

    • Ensure the datasets spam.csv and urldata.csv are available in the data/ directory.
  2. Model Training:

    • If needed, edit and run the Jupyter notebook notebook.ipynb to train or fine-tune the machine learning models.
    • Trained models will be saved in the models/ directory.
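  3. Run the Application:

    python app.py

    • This starts the Gradio app defined in app.py (the app_file in the Space configuration above).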
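Before an XGBoost model can score a URL, whether during training on urldata.csv or when the application runs, the URL has to be converted into numeric features. The README does not say which features the project computes. The following is therefore only a hypothetical sketch of common lexical URL features (raw IP address as host, presence of `@`, HTTPS, dots and hyphens in the host, path depth, an extra `//` after the scheme), using only the standard library. `extract_url_features` and `to_feature_row` are illustrative names, not the project's API.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict[str, int]:
    """Hypothetical lexical features for phishing detection (illustrative only)."""
    # urlparse needs a scheme to recognise the host; assume http:// if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address instead of a domain name is a classic phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    # Look for a second "//" after the scheme separator (possible redirect trick).
    scheme_end = url.find("://")
    search_from = scheme_end + 3 if scheme_end != -1 else 0
    has_redirect = int("//" in url[search_from:])

    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "has_double_slash_redirect": has_redirect,
    }


def to_feature_row(url: str) -> list[int]:
    # Fixed (sorted) column order so training and inference rows line up.
    features = extract_url_features(url)
    return [features[name] for name in sorted(features)]


print(extract_url_features("http://paypal.com@192.168.0.1/login/verify"))
print(to_feature_row("https://www.example.com/"))
```

Rows built this way could be passed to `xgboost.XGBClassifier.fit` and `predict`, as long as the column order matches between training and inference. To find the URLs in an email body in the first place, the `urlextract` package listed in requirements.txt provides `URLExtract().find_urls(text)`.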

Acknowledgements

License

This project is licensed under the MIT License - see the LICENSE file for details.
