DDoS Detection with Decision Trees

Overview

This project trains a RandomForestClassifier (an ensemble of decision trees) to detect Distributed Denial of Service (DDoS) attacks in network traffic data. The model separates DDoS from Benign traffic using a subset of features from a large dataset.

The notebook (Intrusion_Detection_Binary.ipynb) processes a combined dataset from public sources (CIC DoS, CIC-IDS 2017, and CSE-CIC-IDS 2018) to classify network traffic as either DDoS or Benign.

Dataset

Source: DDoS Balanced & Unbalanced Datasets on Kaggle.
Description: The dataset contains 84 features and over 12 million records, combining data from CIC DoS, CIC-IDS 2017, and CSE-CIC-IDS 2018. It focuses on binary classification (DDoS vs. Benign) rather than specific attack types.
Selected Features:
- Fwd Pkt Len Max
- TotLen Fwd Pkts
- ACK Flag Cnt
- SYN Flag Cnt
- Flow Duration
- Tot Fwd Pkts
- Bwd Pkt Len Max
- Tot Bwd Pkts
- RST Flag Cnt
- Fwd Pkts/s
- Init Fwd Win Byts
- Label (DDoS or Benign)
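
Loading only these columns keeps memory use manageable on a dataset of this size. The sketch below shows one way to do that with the `usecols` parameter of `pandas.read_csv`. It assumes the CSV header uses exactly the column names listed above; the helper name `load_selected`, the extra column, and the two demo rows are made up for illustration and are not taken from the notebook.

```python
import io

import pandas as pd

# Column names as listed in this README (assumed to match the CSV header exactly).
SELECTED_COLUMNS = [
    "Fwd Pkt Len Max", "TotLen Fwd Pkts", "ACK Flag Cnt", "SYN Flag Cnt",
    "Flow Duration", "Tot Fwd Pkts", "Bwd Pkt Len Max", "Tot Bwd Pkts",
    "RST Flag Cnt", "Fwd Pkts/s", "Init Fwd Win Byts", "Label",
]


def load_selected(source):
    """Read only the selected columns; `source` is a path or a file-like object."""
    return pd.read_csv(source, usecols=SELECTED_COLUMNS)


# Tiny in-memory stand-in for the real file (values are invented for the demo).
# "Extra Col" plays the role of the dataset columns that are not selected.
header = ",".join(SELECTED_COLUMNS + ["Extra Col"])
rows = [
    "1,2,0,1,100,3,4,5,0,0.5,8192,Benign,99",
    "9,8,1,0,200,7,6,5,1,1.5,1024,DDoS,99",
]
demo = load_selected(io.StringIO("\n".join([header] + rows)))
print(demo.shape)  # two rows, twelve selected columns
```

With the real data, the call would be `load_selected("final_dataset.csv")`.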

Prerequisites

To run the notebook, you need the following Python libraries:

pandas
scikit-learn (for RandomForestClassifier)
matplotlib (for visualization)

You can install them using:

pip install pandas scikit-learn matplotlib

Usage

Download the Dataset:
- Obtain the dataset from Kaggle (see the Dataset section) and make final_dataset.csv available to the notebook.
Run the Notebook:
- Open Intrusion_Detection_Binary.ipynb in Jupyter Notebook or any compatible environment.
- Execute the cells sequentially to:
  - Load and preprocess the dataset (selecting relevant features and removing duplicates).
  - Train a RandomForestClassifier model.
  - Evaluate the model using a confusion matrix, classification report, and accuracy scores.
  - Analyze feature importance.
Key Steps in the Notebook:
- Data Loading: Reads the dataset with selected features to manage its large size (>12M records).
- Preprocessing: Removes duplicate entries (5,676,719 duplicates removed) and converts the Label column to numerical values (0 for Benign, 1 for DDoS).
- Model Training: Uses RandomForestClassifier for binary classification.
- Evaluation: Displays a confusion matrix, classification report (precision, recall, F1-score), and train/test accuracy.
- Feature Importance: Lists the importance of each feature in the model.
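
The key steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's code: the function names, the 25% test split, `n_estimators=50`, the random seed, and the synthetic two-feature data are all assumptions chosen so the example runs quickly on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def preprocess(df):
    """Drop duplicate rows and encode Label as 0 (Benign) / 1 (DDoS)."""
    df = df.drop_duplicates().copy()
    df["Label"] = df["Label"].map({"Benign": 0, "DDoS": 1})
    return df


def train_and_evaluate(df, seed=0):
    """Fit a random forest and collect the metrics described above."""
    X = df.drop(columns="Label")
    y = df["Label"]
    # Split size and stratification are assumptions, not the notebook's settings.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "classification_report": classification_report(y_test, y_pred, digits=5),
        "feature_importance": pd.Series(
            model.feature_importances_, index=X.columns
        ).sort_values(ascending=False),
    }
    return model, report


# Synthetic stand-in data: one informative feature, one noise feature.
rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)
synthetic = pd.DataFrame({
    "Fwd Pkt Len Max": rng.normal(100, 10, n) + 200 * labels,
    "Tot Fwd Pkts": rng.integers(1, 50, n),
    "Label": np.where(labels == 1, "DDoS", "Benign"),
})
# Append 20 repeated rows so drop_duplicates has something to remove.
synthetic = pd.concat([synthetic, synthetic.head(20)], ignore_index=True)

clean = preprocess(synthetic)
model, report = train_and_evaluate(clean)
print(report["classification_report"])
print(report["feature_importance"])
```

On the real dataset, `preprocess` would be applied to the DataFrame loaded from final_dataset.csv; the label strings in the mapping may need adjusting to match the file's actual spelling.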

Results

Model Performance:
- Train Accuracy: ~99.63%
- Test Accuracy: ~99.63%
- Classification Report:
  - Precision: 0.99920 (Benign), 0.99290 (DDoS)
  - Recall: 0.99394 (Benign), 0.99906 (DDoS)
  - F1-Score: 0.99656 (Benign), 0.99597 (DDoS)
  - Overall Accuracy: 0.99629
- Confusion Matrix:
  - The confusion matrix evaluates the model's performance on the test set (1,423,582 samples):
    - Benign: 770,333 test samples, classified with a recall of 0.99394 (the correctly classified ones are the true negatives).
    - DDoS: 653,249 test samples, classified with a recall of 0.99906 (the correctly classified ones are the true positives).
    - False Positives/Negatives: Misclassifications are rare. Exact counts are not detailed in the notebook, but they are reflected in the high precision and recall. The two sample counts above sum to the full test set, so they are per-class totals rather than counts of correct predictions.
  - The matrix is visualized using a heatmap with matplotlib, highlighting the model's ability to correctly classify both classes with minimal errors.
- Key Features: The most influential features include Fwd Pkt Len Max (21.86%), Tot Fwd Pkts (15.14%), and TotLen Fwd Pkts (13.88%).
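
The reported figures can be cross-checked with a little arithmetic. The two class counts given for the confusion matrix (770,333 and 653,249) add up to the whole test set, so the sketch below treats them as per-class totals and derives the approximate cell counts from the reported recalls. The function name is hypothetical, and because the recalls are rounded to five decimals, the resulting counts are estimates rather than values read from the notebook.

```python
def implied_confusion_matrix(support_neg, support_pos, recall_neg, recall_pos):
    """Estimate (tn, fp, fn, tp) from per-class totals and per-class recall."""
    tn = round(support_neg * recall_neg)  # Benign predicted as Benign
    tp = round(support_pos * recall_pos)  # DDoS predicted as DDoS
    fp = support_neg - tn                 # Benign predicted as DDoS
    fn = support_pos - tp                 # DDoS predicted as Benign
    return tn, fp, fn, tp


# Per-class totals and recalls as reported in this README.
tn, fp, fn, tp = implied_confusion_matrix(770_333, 653_249, 0.99394, 0.99906)

accuracy = (tn + tp) / (tn + fp + fn + tp)
precision_benign = tn / (tn + fn)
precision_ddos = tp / (tp + fp)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.5f}")
print(f"precision: benign={precision_benign:.5f} ddos={precision_ddos:.5f}")
```

If the derived accuracy and precisions agree with the classification report to within rounding, the reading of the two counts as class totals is consistent with the rest of the results.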

File Structure

Intrusion_Detection_Binary.ipynb: The main Jupyter Notebook containing the code and analysis.
final_dataset.csv: The dataset file (not included in the repository; download from Kaggle).
README.md: This file, providing an overview and instructions.

How to Contribute

Fork the repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Dataset provided by Devendra416 on Kaggle.
