An ML/DL solution that uses explainable AI (XAI) to detect and classify network intrusions in real time from packet flow data.
- Introduction
- Features
- Dataset
- Model Architecture
- Technologies Used
- Setup Instructions
- Challenges Faced
- Project Workflow
- Results
- Streamlit UI
- LICENSE
- Contributors
Introduction
Network Intrusion Detection Systems (NIDS) are essential for monitoring a network and identifying malicious activity within it. This project uses machine learning (ML) and deep learning (DL) models to classify network traffic, both as benign or malicious and by specific attack type.
Features
- Binary Classification: Detects whether network traffic is benign or malicious.
- Multi-Class Classification: Classifies specific types of attacks.
- Explainable Decision Tree Classification: Provides interpretability for multi-class results.
- Real-Time Detection: Processes uploaded packet flow data in CSV format.
- Visualization: Generates confusion matrices, ROC-AUC curves, and spectrograms for insights.
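The binary and multi-class tasks differ mainly in how the raw flow label is turned into a target. A minimal sketch, assuming the label column uses the CIC-IDS-2017 value "BENIGN" for normal traffic; make_targets is a hypothetical helper, not a function from this repository:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def make_targets(labels: pd.Series, benign_label: str = "BENIGN"):
    """Derive binary and multi-class targets from a raw label column.

    Hypothetical helper; assumes benign flows are labeled "BENIGN".
    """
    # Binary task: 0 = benign, 1 = any attack type.
    y_binary = (labels != benign_label).astype(int).to_numpy()

    # Multi-class task: one integer per distinct label (benign and each attack).
    encoder = LabelEncoder()
    y_multi = encoder.fit_transform(labels)
    return y_binary, y_multi, encoder


labels = pd.Series(["BENIGN", "PortScan", "BENIGN", "Web Attack XSS"])
y_bin, y_multi, enc = make_targets(labels)
print(y_bin)                 # [0 1 0 1]
print(enc.classes_.tolist())  # ['BENIGN', 'PortScan', 'Web Attack XSS']
```

Keeping the fitted encoder around lets integer predictions be mapped back to attack names with enc.inverse_transform.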
Dataset
This project uses the CIC-IDS-2017 dataset. The subset used for this analysis includes network traffic from:
- Friday Working Hours: Afternoon DDoS traffic.
- Thursday Working Hours: Morning Web Attacks.
Traffic classes include:
- Benign
- PortScan (Attack)
- Web Attacks (Brute Force, XSS, SQL Injection)
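A sketch of stacking the selected capture files into one table with pandas. It assumes the CSV headers carry stray leading spaces (as the public CIC-IDS-2017 CSVs do, e.g. " Label"), so it strips them; load_flows and the added Source column are hypothetical names used for illustration.

```python
import io

import pandas as pd


def load_flows(sources):
    """Read several flow CSVs and stack them into one DataFrame.

    `sources` maps a short name (hypothetical, e.g. "friday_ddos") to a
    path or file-like object.
    """
    frames = []
    for name, src in sources.items():
        df = pd.read_csv(src)
        # Strip stray whitespace from headers, e.g. " Label" -> "Label".
        df.columns = df.columns.str.strip()
        df["Source"] = name  # remember which capture each flow came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Tiny in-memory stand-ins for the real CSV files.
friday = io.StringIO(" Flow Duration, Label\n10,BENIGN\n20,DDoS\n")
thursday = io.StringIO(" Flow Duration, Label\n30,Web Attack XSS\n")
flows = load_flows({"friday_ddos": friday, "thursday_web": thursday})
print(flows.columns.tolist())  # ['Flow Duration', 'Label', 'Source']
print(len(flows))              # 3
```

With the real data you would pass file paths instead of StringIO objects; the exact file names depend on which CIC-IDS-2017 archive was downloaded.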
Model Architecture
Machine learning models:
- Logistic Regression
- Random Forest
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Naive Bayes
- Gradient Boosting
Deep learning models:
- 1D CNN
- LSTM
- ANN
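The six classical models can be compared in one loop with scikit-learn. This sketch uses library defaults (GaussianNB is assumed for Naive Bayes) and synthetic data, so it does not reproduce the project's tuned settings or reported results; build_models and compare_models are hypothetical helpers, and the DL models are left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_models(seed=0):
    """Return fresh, untuned instances of the six classical models."""
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }


def compare_models(X, y, seed=42):
    """Fit every model on a stratified 80-20 split; return test accuracy per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_models().items():
        model.fit(X_tr, y_tr)
        scores[name] = accuracy_score(y_te, model.predict(X_te))
    return scores


# Synthetic stand-in for scaled flow features (binary benign/attack task).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {acc:.3f}")
```

build_models returns new estimators on every call so repeated comparisons never reuse an already-fitted model.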
Technologies Used
- Programming Language: Python
- Frameworks: TensorFlow, Keras, Scikit-learn
- Libraries: Matplotlib, Seaborn, Librosa, Joblib
- Deployment Platform: Streamlit
- Cloud Services: AWS S3 for data and model storage; Amazon SageMaker for model deployment
Setup Instructions
- Clone the repository:
git clone https://github.com/veydantkatyal/network-intrusion-detection-system-nus.git
cd network-intrusion-detection-system-nus
- Add the pre-trained models and the dataset:
  - Place the .pkl files for the ML models in the models/ directory.
  - Place the .h5 files for the DL models in the models/ directory.
  - Place the CSV files in the data/ directory.
- Start the Streamlit app by running the following command:
streamlit run app.py
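Once the files are in place, the app has to load them. A minimal sketch of one way to do that, assuming the .pkl files were saved with joblib.dump and the .h5 files with Keras; load_models is a hypothetical helper, and the actual app.py may load models differently.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression


def load_models(model_dir="models"):
    """Load every saved model in `model_dir`, keyed by file name without extension.

    Assumes .pkl files were written with joblib.dump and .h5 files with Keras.
    """
    models = {}
    for path in sorted(Path(model_dir).iterdir()):
        if path.suffix == ".pkl":
            models[path.stem] = joblib.load(str(path))
        elif path.suffix == ".h5":
            # Deferred import: TensorFlow is only needed if a DL model is present.
            from tensorflow import keras
            models[path.stem] = keras.models.load_model(str(path))
    return models


# Round-trip check with a throwaway model in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(LogisticRegression().fit([[0], [1]], [0, 1]), str(Path(tmp) / "logreg.pkl"))
    loaded = load_models(tmp)
print(sorted(loaded))  # ['logreg']
```

Deferring the TensorFlow import keeps startup fast when only the ML models are deployed.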
Challenges Faced
- Handling Imbalanced Datasets
  - The initial datasets were heavily imbalanced between benign and attack samples.
  - Applied stratified sampling and downsampling to balance the classes.
- Cleaning Large Datasets
  - The datasets contained duplicate rows, null values, and infinite values.
  - Built a preprocessing step that removes these rows before training.
- Feature Selection
  - Identifying the relevant features among a large number of columns was complex.
  - Used correlation analysis and feature importance techniques to select key features.
- Training Deep Learning Models
  - Tuning hyperparameters for good performance required extensive experimentation.
  - Limited computational resources made training large models challenging.
- Model Deployment
  - Integrating both ML and DL models into a single Streamlit app required careful design.
  - The app had to handle all three classification modes (binary, multi-class, and decision tree) smoothly.
- Explaining Model Predictions
  - Providing feature-level interpretability for the decision tree was time-consuming but essential for actionable insights.
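The cleaning and balancing challenges above can be sketched with pandas. Here "downsampling" is interpreted as random undersampling of every class to the size of the rarest class, which is an assumption; clean_flows and downsample_to_minority are hypothetical names.

```python
import numpy as np
import pandas as pd


def clean_flows(df):
    """Remove duplicate rows, then any row containing NaN or +/-inf."""
    df = df.drop_duplicates()
    df = df.replace([np.inf, -np.inf], np.nan)
    return df.dropna().reset_index(drop=True)


def downsample_to_minority(df, label_col="Label", seed=0):
    """Randomly undersample every class to the size of the rarest class."""
    n_min = df[label_col].value_counts().min()
    balanced = df.groupby(label_col).sample(n=n_min, random_state=seed)
    return balanced.reset_index(drop=True)


raw = pd.DataFrame({
    "Flow Bytes/s": [1.0, 1.0, np.inf, np.nan, 5.0, 6.0, 7.0, 8.0],
    "Label": ["BENIGN"] * 6 + ["PortScan"] * 2,
})
cleaned = clean_flows(raw)
balanced = downsample_to_minority(cleaned)
print(len(cleaned))                                             # 5
print(balanced["Label"].value_counts().sort_index().to_dict())  # {'BENIGN': 2, 'PortScan': 2}
```

Undersampling discards benign rows, so on the full dataset it trades raw volume for a classifier that is not dominated by the majority class.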
Project Workflow
- Removed duplicates, null values, and infinite values.
- Scaled features using StandardScaler.
- Encoded labels for binary and multi-class classification.
- Performed correlation analysis for feature selection.
- Applied stratified sampling for a balanced class distribution.
- Trained ML models using stratified 80-20 train-test splits.
- Trained DL models on reshaped data, with hyperparameter tuning.
- Evaluated models using:
  - Accuracy
  - Confusion matrix
  - Classification report
  - ROC-AUC curves
- Integrated the models into a Streamlit app with:
  - Model selection options.
  - Real-time detection capabilities.
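The feature-selection, splitting, and reshaping steps of the workflow can be sketched as follows. This assumes "correlation analysis" means dropping one feature from each highly correlated pair (the 0.95 threshold is an arbitrary choice), and that DL inputs are reshaped to (samples, features, 1) so each feature acts as one time step for a 1D CNN or LSTM. drop_correlated and prepare_splits are hypothetical helpers.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def drop_correlated(X, threshold=0.95):
    """Drop one feature from every pair whose |Pearson r| exceeds `threshold`."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


def prepare_splits(X, y, seed=42):
    """Stratified 80-20 split, scaled with statistics from the training split only."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te, scaler


rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + 0.01 * rng.normal(size=200),  # near-duplicate of "a"
    "b": rng.normal(size=200),
})
y = (a > 0).astype(int)

X_sel, dropped = drop_correlated(X)
print(dropped)  # ['a_copy']
X_tr, X_te, y_tr, y_te, scaler = prepare_splits(X_sel, y)
# Shape expected by Keras Conv1D / LSTM layers: (samples, timesteps, channels).
X_tr_seq = X_tr.reshape(-1, X_tr.shape[1], 1)
print(X_tr_seq.shape)  # (160, 2, 1)
```

Fitting the scaler on the training split only keeps test-set statistics from leaking into training.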
Results
- Best ML Model: Random Forest (96.7%)
- Best DL Model: LSTM (98.52%)
- Best ML Model: Random Forest (97.56%)
- Best DL Model: LSTM (92.26%)
- Best ML Model: Random Forest (98.78%)
- Best DL Model: LSTM (97.18%)
- Key Insights: Provides feature-level interpretability.
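For the decision tree, scikit-learn can print the learned rules and rank features by impurity-based importance, which is one way to obtain feature-level insights. A minimal sketch on synthetic data (the labeling rule is invented for the example); explain_tree is a hypothetical helper, and the project may use different tooling.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text


def explain_tree(X, y, max_depth=3, top_k=5):
    """Fit a shallow decision tree; return it, its text rules, and top importances."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    rules = export_text(tree, feature_names=list(X.columns))
    importances = (
        pd.Series(tree.feature_importances_, index=X.columns)
        .sort_values(ascending=False)
        .head(top_k)
    )
    return tree, rules, importances


# Synthetic data: the invented rule labels flows with a high destination port as attacks.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Destination Port": rng.integers(0, 65536, size=300),
    "Flow Duration": rng.integers(0, 10_000, size=300),
})
y = (X["Destination Port"] > 50_000).astype(int)

tree, rules, importances = explain_tree(X, y)
print(rules)        # human-readable if/else thresholds
print(importances)  # features ranked by total impurity reduction
```

A shallow max_depth keeps the printed rules short enough for an analyst to read.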
License
This project is licensed under the MIT License; please review it before using this resource.