Dissertation project exploring cyberattack detection. Implements Artificial Neural Networks on the HIKARI-2022 dataset, with extensive hyperparameter tuning, evaluation metrics, and variable importance analysis using DeepSHAP and PFI.

rox-spi/B.Sc-Dissertation-Identifying-Cyberattacks-using-ML-Techniques

Identifying Cyberattacks using Machine Learning Techniques

This repository contains the code and supporting material for the undergraduate dissertation "Identifying Cyberattacks using Machine Learning Techniques". The project applies Artificial Neural Networks (ANNs) and variable importance techniques to the labelled HIKARI-2022 cyberattack dataset, with a focus on classification performance and interpretability.


Project Structure

├── Logistic Regression/
│   ├── Logistic Regression Model.py
│   ├── Classification Reports
│   ├── Confusion Matrices
│   └── Metrics Summary
├── Parameter Optimisation/
│   ├── 0. Raw Data/
│   ├── 1. Preprocessing/
│   ├── 2-7. Hyperparameter Tuning Experiments/
│   ├── 8. Baseline vs Optimised Model/
│   └── optimised_model.keras
└── Variable Importance/
    ├── Variable Importance.py
    ├── SHAP & PFI Graphs
    └── Results Tables

External Data Files

Due to GitHub file size limitations, the following data files are hosted externally:


Dataset

This project uses the HIKARI-2022 dataset for evaluating cyberattack detection models. The dataset was proposed by:

Ferriyan, A., Thamrin, A. H., Takeda, K., & Murai, J. (2021).
Generating Network Intrusion Detection Dataset Based on Real and Encrypted Synthetic Attack Traffic.
Applied Sciences, 11(17), 7868. https://doi.org/10.3390/app11177868

The dataset is publicly available for download via Zenodo.


Models and Methods

  • Artificial Neural Networks (ANNs)

    • Built with Keras and TensorFlow
    • Extensive hyperparameter tuning (learning rate, layers, dropout, etc.) with outputs for each stage
    • Final model saved as optimised_model.keras
  • Logistic Regression

    • Used as a baseline model
    • Includes classification reports, confusion matrices, and metrics
  • Variable Importance

    • Analysed using:
      • Permutation Feature Importance (PFI)
      • SHAP values using DeepSHAP explainer
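
To make the idea behind Permutation Feature Importance concrete, the following is a minimal NumPy sketch: it shuffles one feature column at a time and records how much accuracy drops. It is not the code in Variable Importance.py. The function name `permutation_importance`, the `toy_predict` stand-in classifier, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the label.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

# Synthetic stand-in data (NOT HIKARI-2022): only feature 0 determines the label.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

def toy_predict(X):
    # Hypothetical stand-in for a trained classifier's predict method.
    return (X[:, 0] > 0).astype(int)

pfi_scores = permutation_importance(toy_predict, X_demo, y_demo)
print(pfi_scores)
```

In this toy setup, shuffling feature 0 should cost a large share of accuracy, while shuffling features 1 and 2 costs nothing because the stand-in classifier ignores them. With a real network, `predict` would wrap the model's thresholded output instead.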

Output Highlights

  • Performance comparisons between baseline and optimised models
  • Evaluation metrics: F1-score, Precision, Accuracy, ROC-AUC, PR-AUC
  • Clear visualisations of feature contributions using DeepSHAP and PFI
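
As a rough illustration of how some of the listed metrics are defined, the sketch below computes accuracy, precision, F1-score, and ROC-AUC for a binary classifier using NumPy only. The helper name `binary_metrics`, the 0.5 threshold, and the six example scores are assumptions, not values from the dissertation. PR-AUC is left out for brevity, and the pairwise ROC-AUC calculation is only practical for small samples in which both classes are present.

```python
import numpy as np

def binary_metrics(y_true, y_score, threshold=0.5):
    """Threshold-based metrics plus ROC-AUC for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)

    # ROC-AUC as the probability that a random positive is scored above
    # a random negative, with ties counted as one half.
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    roc_auc = float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

    return {"accuracy": accuracy, "precision": precision,
            "f1": f1, "roc_auc": roc_auc}

# Illustrative labels and scores only.
metrics = binary_metrics([0, 0, 1, 1, 1, 0],
                         [0.1, 0.6, 0.8, 0.4, 0.9, 0.2])
print(metrics)
```

Since scikit-learn is among the required packages, its `sklearn.metrics` functions would be the more usual choice in practice; the manual version is shown only to expose the definitions.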

Environment Setup

This project was developed with Python 3.11 in the Spyder IDE. Required packages:

  • numpy, pandas, scikit-learn, tensorflow, keras, shap, imbalanced-learn, matplotlib, tqdm, h5py
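
One possible way to check whether these packages are present before running the scripts is sketched below, using only the standard library. The list mirrors the package names above; the helper name `installed_versions` is an assumption, and no specific package versions are implied because the repository does not state any.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in this README.
REQUIRED = [
    "numpy", "pandas", "scikit-learn", "tensorflow", "keras",
    "shap", "imbalanced-learn", "matplotlib", "tqdm", "h5py",
]

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

status = installed_versions(REQUIRED)
for name, ver in status.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```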
