Malicious URL Detection Model Neural Network Optimized by Genetic Algorithms made by @thechiranjeevvyas
SafeURL is a web application that implements a Multilayer Perceptron neural network optimized using genetic algorithms. Enter a URL or domain name to check whether it is malicious. For instance,
https://www.google.com -> SAFE
yourbittorrent.com/?q=anthony-hamilton-soulife -> MALICIOUS
The model sequence defined within genetic_algorithm_implementation.py
is as follows:
- Integrate CSV Dataset and Remove Unnecessary Columns
- Use SMOTE to Balance out Class Distribution in Dataset
- Split Dataset into Training and Testing Sets using 80:20 Ratio
- Initialize Multilayer Perceptron
- Utilize Adam Optimization and Binary Cross Entropy Loss Function
- Initialize Model Callback to Stop Training Once Validation Loss Reaches 0.1
- Train Model with 10 Epochs and Batch Size of 256
- Verify Model Results using 10 Examples
- Run Each Model Iteration through a Genetic Algorithm
- Evaluate Fitness of Each Model by Referencing Accuracy
- Determine Best Model within Population
- Save the Best Model into a .h5 File Output
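The genetic algorithm steps in the list above (evaluate fitness, select the best candidates, produce the next population) can be illustrated with a minimal, standard-library-only sketch. This is not the code in genetic_algorithm_implementation.py: the search space, the function names, and the `toy_fitness` function are all assumptions made for illustration. In the real project, fitness would presumably come from training the network and reading its accuracy.

```python
import random

# Hypothetical search space: each gene is one MLP hyperparameter.
SEARCH_SPACE = {
    "hidden_units": [16, 32, 64, 128],
    "hidden_layers": [1, 2, 3],
    "learning_rate": [0.0001, 0.001, 0.01],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {gene: rng.choice(values) for gene, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {gene: rng.choice([parent_a[gene], parent_b[gene]]) for gene in SEARCH_SPACE}


def mutate(individual, rng, rate=0.1):
    """With probability `rate`, resample a gene from the search space."""
    return {
        gene: rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value
        for gene, value in individual.items()
    }


def evolve(fitness, population_size=10, generations=5, seed=0):
    """Return the fittest individual found and its fitness score."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Rank by fitness and keep the top half unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = []
        while len(parents) + len(children) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(individual):
    """Illustrative stand-in for 'train the model and return its accuracy'."""
    return -abs(individual["hidden_units"] - 64) - abs(individual["hidden_layers"] - 2)


best_config, best_score = evolve(toy_fitness)
print(best_config, best_score)
```

Because the top half of each generation is carried over unchanged, the best score never decreases from one generation to the next. When each fitness call means training a network, caching scores for configurations that have already been evaluated would avoid repeated training.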
To build from source, you will need Python 3 and pip installed.
cd webapp
pip install -r requirements.txt
streamlit run app.py
Visit localhost:8501
in a browser to see the web application.
The Research_Notebooks
folder contains the Jupyter research notebooks for this project. Each notebook explores a unique aspect of the dataset.
Feature_Extraction_Notebook.ipynb
extracts pertinent information out of the malicious and benign URLs Kaggle dataset
Data_Visualization_Notebook.ipynb
provides relevant data visualizations of the features extracted in the feature extraction notebook
Training_Models_Notebook.ipynb
tests a couple of models to determine which one is best suited for detecting malicious domains
Genetic_Algorithm_Notebook.ipynb
experiments with genetic algorithms and applies them to a neural network
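As a rough idea of what feature extraction from a URL can look like, here is a minimal sketch using only the Python standard library. The function name and the specific features are assumptions chosen for illustration. They are not necessarily the features computed in Feature_Extraction_Notebook.ipynb.

```python
from urllib.parse import urlparse


def extract_features(url):
    """Return a few simple lexical features of a URL as a dict (illustrative only)."""
    # urlparse only recognizes the hostname when a scheme or a leading "//"
    # is present, so scheme-less inputs get "//" prepended for parsing.
    has_scheme = "://" in url
    parsed = urlparse(url if has_scheme else "//" + url)
    hostname = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(hostname),
        "dot_count": url.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
        "uses_https": int(parsed.scheme == "https"),
        "has_query": int(bool(parsed.query)),
    }


for example in (
    "https://www.google.com",
    "yourbittorrent.com/?q=anthony-hamilton-soulife",
):
    print(example, extract_features(example))
```

Numeric features of this kind can be stored as columns of a CSV file and fed to a classifier. The scheme check here is deliberately simple and would misread a scheme-less URL that contains "://" in its query string.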
The webapp
folder contains all the files necessary to set up the web server for the application.
Once you execute streamlit run app.py
visit localhost:8501
in a browser to see the application.
The app.py
file contains all the relevant Streamlit web application code.
The model_generation.py
file contains the code to generate the classification NN without GA optimization.
The genetic_algorithm_implementation.py
file contains the code to generate the classification NN with GA optimization.
SafeURL is open to any contributions. Please fork the repository and make a pull request with the features or fixes you want to be implemented.