Neural Network-Based Phishing Detection

Overview

This repository contains a project that explores how well neural networks, specifically Long Short-Term Memory (LSTM) networks and Bidirectional Encoder Representations from Transformers (BERT), detect phishing attempts in emails. It compares these models with a traditional machine learning method, logistic regression, to assess their effectiveness in cybersecurity.

Project Summary

Phishing attacks are a significant threat in the cybersecurity domain, targeting individuals and organizations by tricking them into revealing sensitive information. As phishing tactics evolve, traditional detection methods are becoming less effective. This project explores whether neural networks can offer a better solution for detecting phishing emails.

Key Components:

Data Source:
- Kaggle's "Phishing Email Detection" dataset, containing 18,650 emails categorized as "Safe" or "Phishing."
- Dataset Link
Models Used:
- Long Short-Term Memory (LSTM): A recurrent neural network (RNN) architecture designed for sequence data such as text.
- Bidirectional Encoder Representations from Transformers (BERT): A state-of-the-art language model based on transformer architecture.
- Logistic Regression: A simpler, traditional machine learning method used for baseline comparison.
Evaluation Metrics:
- The models were evaluated on accuracy. The LSTM model reached a validation accuracy of 96.15%; the BERT and logistic regression models both scored slightly higher, at 96.32% (a difference of 0.17 percentage points).
Visual Analysis:
- Word clouds were generated to visualize common words in phishing and safe emails, providing insights into the language patterns used in each category.
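
A word cloud is a visual rendering of per-class word counts, so the underlying numbers can be inspected directly. The following is a minimal sketch using only the Python standard library; the toy emails and the stop-word list are hypothetical stand-ins for the Kaggle data, not taken from the project notebook.

```python
import re
from collections import Counter

# Hypothetical toy emails; the real project uses the Kaggle dataset.
EMAILS = [
    ("Verify your account now to avoid suspension", "Phishing"),
    ("Your account password expires today, verify now", "Phishing"),
    ("Meeting notes attached for the project review", "Safe"),
    ("Lunch on Friday? The project team is going", "Safe"),
]

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "the", "to", "for", "on", "is", "your"}


def word_frequencies(emails, label):
    """Count non-stop-word tokens across all emails carrying the given label."""
    counts = Counter()
    for text, email_label in emails:
        if email_label != label:
            continue
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


phishing_counts = word_frequencies(EMAILS, "Phishing")
safe_counts = word_frequencies(EMAILS, "Safe")

# The most frequent words are the ones a word cloud would draw largest.
print("Phishing:", phishing_counts.most_common(3))
print("Safe:", safe_counts.most_common(3))
```

Counts like these could then be passed to a plotting library of your choice to draw the cloud itself.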

Methodology

The project followed these key steps:

1. Data Preparation: The dataset was balanced by downsampling, and the text was preprocessed with tokenization, stop-word removal, vectorization, and sequence padding.
2. Model Training: LSTM, BERT, and logistic regression models were trained and validated on the prepared dataset.
3. Performance Analysis: Each model's accuracy and loss were monitored over multiple epochs, with adjustments made as needed to improve performance.
4. Results & Conclusion: The neural networks achieved high accuracy in phishing detection, but their margin over the traditional method was small, which suggests that model selection should be guided by specific organizational needs and merits further exploration.
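
The data preparation step can be sketched with the standard library alone. This is an illustrative outline under stated assumptions, not the notebook's code: the toy rows, the stop-word list, the reserved ids for padding and unknown tokens, and the choice to pad at the end of each sequence are all hypothetical.

```python
import random
import re

# Illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "your"}


def downsample(rows, seed=0):
    """Balance classes by randomly sampling each class down to the smallest class size."""
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocab(token_lists):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown tokens."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Turn tokens into ids, truncate to max_len, and pad the end with zeros."""
    ids = [vocab.get(t, 1) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical toy rows: one phishing email and three safe ones.
rows = [
    ("Verify your account to claim the prize", "Phishing"),
    ("Agenda for the Monday meeting", "Safe"),
    ("Quarterly report is attached", "Safe"),
    ("See you at the team lunch", "Safe"),
]

balanced = downsample(rows)
token_lists = [tokenize(text) for text, _ in balanced]
vocab = build_vocab(token_lists)
padded = [encode_and_pad(tokens, vocab, max_len=6) for tokens in token_lists]
print(padded)
```

Fixed-length integer sequences of this kind are what a sequence model such as an LSTM typically consumes; BERT normally relies on its own tokenizer instead.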

Conclusion

The project highlights the potential of neural networks in enhancing phishing detection but also indicates that simpler machine learning models may still offer competitive performance depending on the application context. Continuous innovation in cybersecurity techniques remains essential as cyber threats evolve.
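
To make the "simpler machine learning models" concrete, here is a minimal sketch of a bag-of-words logistic regression trained with stochastic gradient descent, written in plain Python. It is not the notebook's implementation: the toy emails, learning rate, and epoch count are assumptions chosen only for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    present = set(tokenize(text))
    return [1.0 if word in present else 0.0 for word in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, epochs=200, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(text, vocab, weights, bias):
    """Return True when the predicted phishing probability is at least 0.5."""
    x = featurize(text, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


# Hypothetical toy data: label 1 = phishing, 0 = safe.
data = [
    ("verify your account now", 1),
    ("claim your prize now", 1),
    ("meeting agenda attached", 0),
    ("team lunch on friday", 0),
]

vocab = sorted({token for text, _ in data for token in tokenize(text)})
features = [featurize(text, vocab) for text, _ in data]
labels = [label for _, label in data]
weights, bias = train(features, labels)

correct = sum(predict(text, vocab, weights, bias) == bool(label) for text, label in data)
accuracy = correct / len(data)
print(f"Training accuracy on the toy data: {accuracy:.2f}")
```

A baseline of this size trains in milliseconds and exposes one weight per word, which is part of why a simple model can remain attractive when its accuracy is close to that of a neural network.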

How to Use

Clone the repository:

git clone https://github.com/Engmhabib/Neural-Network-Based-Phishing-Detection.git

Repository files:
- LICENSE
- Mohamed Habib Agrebi Code.ipynb
- README.md
- test.csv
