ananyapattaje/Web_page_phishing_detection: Machine Learning-based Phishing Detection using Python (Jupyter Notebook). This project analyzes URLs and website features to classify them as phishing or legitimate. The dataset is from Kaggle.

This project focuses on detecting phishing websites using machine learning models. The dataset is sourced from Kaggle's Web Page Phishing Detection Dataset. The analysis, model building, and evaluation were conducted in a Jupyter Notebook.

🪼 Dataset

Source: Web Page Phishing Detection Dataset on Kaggle
Rows: 11,430
Columns: 89
Target Variable: status (Labels: phishing, legitimate)
Balanced Dataset: 5,715 phishing and 5,715 legitimate samples
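
As a minimal sketch, the shape and class balance listed above could be checked with pandas as follows. The helper `summarize_dataset`, the column `length_url`, and the tiny in-memory frame are illustrative assumptions; the real data would be loaded from the Kaggle CSV instead.

```python
import pandas as pd

def summarize_dataset(df: pd.DataFrame, target: str = "status") -> dict:
    """Return row/column counts and the number of samples per class."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "class_counts": df[target].value_counts().to_dict(),
    }

# Tiny stand-in frame (assumption); replace with pd.read_csv(<path to the Kaggle CSV>).
demo = pd.DataFrame({
    "url": ["http://a.example", "http://b.example",
            "http://c.example", "http://d.example"],
    "length_url": [16, 42, 23, 88],
    "status": ["legitimate", "phishing", "legitimate", "phishing"],
})

summary = summarize_dataset(demo)
print(summary)
```

On the full dataset the same call would be expected to report 11,430 rows, 89 columns, and 5,715 samples per class.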

🪼 Exploratory Data Analysis (EDA)

Analyzed categorical and numerical features
Plotted histograms and correlation matrices
Found no missing values
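
The EDA steps above can be sketched roughly as follows. The column names and the random stand-in data are assumptions, and plotting is left as comments so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Random stand-in data (assumption); the notebook would use the Kaggle dataset.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.integers(0, 5, size=100),
    "status": rng.choice(["phishing", "legitimate"], size=100),
})

# Split columns into numerical and categorical groups.
numeric_cols = demo.select_dtypes(include="number").columns
categorical_cols = demo.select_dtypes(exclude="number").columns

# Total number of missing cells across the whole frame.
missing_total = int(demo.isna().sum().sum())

# Pairwise Pearson correlations between the numerical features.
corr = demo[numeric_cols].corr()

# Plotting (not executed here): demo[numeric_cols].hist() for histograms,
# seaborn.heatmap(corr) for the correlation matrix.
print(missing_total)
print(corr)
```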

🪼 Feature Selection (PCA)

Dropped non-relevant columns like url
Standardized data with StandardScaler
Applied PCA and retained the top 10 principal components
Final feature set reduced to 35 columns
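
A minimal sketch of the scaling and PCA steps with scikit-learn is shown below. The random input matrix and its width of 40 columns are arbitrary assumptions; only `StandardScaler` and the 10 components come from the description above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in for the numeric feature matrix (assumption: 200 rows, 40 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))

# Standardize each column, then project onto the top 10 principal components.
reducer = make_pipeline(StandardScaler(), PCA(n_components=10))
X_reduced = reducer.fit_transform(X)

# Fraction of total variance captured by each retained component.
explained = reducer.named_steps["pca"].explained_variance_ratio_
print(X_reduced.shape, explained.sum())
```

Wrapping both steps in one pipeline means the scaler and PCA are fitted together, which makes it easier to fit on training data only and reuse the same transform on test data.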

🪼 Models & Evaluation

🐳 Models Trained

Model	Accuracy
Random Forest	95.5%
Logistic Regression	91.4%
SVM (RBF Kernel)	93.6%
KNN (k=5)	92.9%
Neural Network	94.3%
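
The four scikit-learn models in the table could be trained and scored along these lines. This is a sketch on synthetic data from `make_classification`, so its accuracies will not match the table. The split ratio, `n_estimators`, `max_iter`, and random seeds are assumptions; the Keras network is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data as a stand-in (assumption).
X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (RBF Kernel)": SVC(kernel="rbf"),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its accuracy on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```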

🐳 Metrics Used

Accuracy
Precision
Recall
F1-Score
Confusion Matrix
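
As a small worked example, the metrics above can be computed with `sklearn.metrics`. The label vectors are made up for illustration, and treating `1` as the phishing class is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = phishing (assumed positive class), 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = confusion_matrix(y_true, y_pred)
print(accuracy, precision, recall, f1)
print(cm)
```

Here there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75.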

🐳 K-Fold Cross-Validation (5-Fold)

Random Forest consistently performed the best with ~95% accuracy.
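
A 5-fold evaluation of the Random Forest could look like the sketch below. The synthetic data, the use of stratified and shuffled folds, and the hyperparameters are assumptions; the README only states that 5 folds were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Five folds that preserve the class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```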

🐳 Neural Network Architecture (Keras)

Dense(64, ReLU) → Dropout(0.3)
Dense(32, ReLU) → Dropout(0.2)
Output: Dense(1, Sigmoid)
Optimizer: Adam, Loss: Binary Crossentropy
Best Accuracy: ~94.6% (Validation), 94.3% (Test)
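
The architecture listed above could be expressed in Keras roughly as follows. The helper `build_model` and the input width of 10 features are assumptions, as is tracking accuracy as the compile metric.

```python
import keras
from keras import layers

def build_model(n_features: int) -> keras.Model:
    """Two hidden Dense layers with dropout and a sigmoid output unit."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Assumed input width of 10 features.
model = build_model(n_features=10)
model.summary()
```

Training would then call `model.fit` on the scaled features with a validation split. The number of epochs and the batch size are not stated in this README.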

🪼 Outlier Handling & Retraining

Method	Records After	Best Accuracy
IQR	2,970	Slight drop
Z-Score	9,195	~94.4% (RF)
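
The two outlier filters can be sketched as follows. The 1.5 × IQR fence and the |z| < 3 cutoff are conventional defaults assumed here, not values confirmed by the README. The helper names and toy data are illustrative.

```python
import pandas as pd

def filter_iqr(df: pd.DataFrame, cols: list, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    within = (df[cols] >= q1 - k * iqr) & (df[cols] <= q3 + k * iqr)
    return df[within.all(axis=1)]

def filter_zscore(df: pd.DataFrame, cols: list, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose absolute z-score is below the threshold for every column."""
    std = df[cols].std(ddof=0).replace(0, 1.0)  # constant columns get z = 0
    z = (df[cols] - df[cols].mean()) / std
    return df[(z.abs() < threshold).all(axis=1)]

# Toy data: values 1..20 plus a mild outlier (40) and an extreme one (1000).
demo = pd.DataFrame({"x": list(range(1, 21)) + [40, 1000]})

iqr_kept = filter_iqr(demo, ["x"])
z_kept = filter_zscore(demo, ["x"])
print(len(demo), len(iqr_kept), len(z_kept))
```

On this toy column the IQR rule drops both 40 and 1000, while the z-score rule drops only 1000. This is consistent with the table above, where IQR filtering leaves far fewer records than Z-score filtering.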

🪼 Interactive Dashboard (Dash + Plotly)

Features:

Histograms, Box Plots, Heatmaps
Class-based Feature Comparison
Model Performance Visualizations
Confusion Matrices
Neural Network Training Curves

🪼 Key Takeaways

PCA effectively reduced dimensionality.
Random Forest (95.5%) and the Neural Network (94.3%) achieved the highest test accuracy.
Dash-based dashboard enhances interpretability and presentation.
Retraining after Z-score outlier removal gave similar accuracy (~94.4% with Random Forest), which suggests the models are robust to outliers.

🪼 How to Run

Install libraries:

pip install pandas numpy matplotlib seaborn scikit-learn keras xgboost dash

Run the notebook PhishingDetection.ipynb.

🪼 Dashboard Overview

🪼 Author

Ananya P S

