Pred-Sus-Act focuses on detecting network anomalies using machine learning techniques. It consists of three main parts:
- Exploratory Data Analysis (EDA): Loading and preparing the dataset for analysis
- Feature Selection: Using LASSO regression to identify the most important features for anomaly detection
- Model Training: Implementing and evaluating various machine learning models for classification
`LASSO_feature_selection.ipynb`:
- Performs feature selection using LASSO regression
- Identifies the most significant features for anomaly detection
- Saves selected features to `data/original/LASSO_selected_features.csv`
Key steps (illustrated in the sketch after this list):
- Data loading and preprocessing
- Feature engineering (numeric and categorical features)
- LASSO regression with alpha parameter tuning
- Feature importance visualization
- Saving selected features
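The notebook remains the source of truth for this step; the sketch below only illustrates the general shape of the LASSO selection. The input path, the 0/1-encoded `label` column, and the `alpha` value are assumptions, not the notebook's actual choices.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical input path and a 0/1-encoded "label" column.
df = pd.read_csv("data/original/dataset.csv")
X, y = df.drop(columns=["label"]), df["label"]

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns

# Scale numeric features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# alpha is illustrative; the notebook tunes it (see the alpha-sweep sketch below).
model = Pipeline([("prep", preprocess), ("lasso", Lasso(alpha=0.01, max_iter=10_000))])
model.fit(X, y)

# Keep features with non-zero coefficients and persist them.
feature_names = model.named_steps["prep"].get_feature_names_out()
coefs = pd.Series(model.named_steps["lasso"].coef_, index=feature_names)
selected = coefs[coefs != 0].sort_values(key=abs, ascending=False)
selected.to_csv("data/original/LASSO_selected_features.csv")
```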
`EDA.ipynb`:
- Loads the dataset and performs initial data exploration
- Uses ANOVA for feature selection
- Visualizes the distribution of features and target variable
- Handles unbalanced classes
- Applies dimensionality reduction techniques (PCA, t-SNE, UMAP) for visualization
- Applies `SMOTEN` as the best-performing technique for oversampling the minority class (see the sketch after this list)
- Visualizes the impact of different oversampling techniques on the dataset
- Saves the processed dataset to `data/resampled/ANOVA_selected_features.csv`
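A rough illustration of the ANOVA selection and SMOTEN oversampling described above. The input path, `label` column, and `k` are assumptions; the actual preprocessing and parameter choices live in `EDA.ipynb`, and SMOTEN treats features as nominal, so the notebook's handling of numeric columns may differ.

```python
import pandas as pd
from imblearn.over_sampling import SMOTEN
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical input path and "label" column; k is illustrative.
df = pd.read_csv("data/original/dataset.csv")
X = df.drop(columns=["label"]).select_dtypes(include="number")  # ANOVA F-test needs numeric inputs
y = df["label"]

# ANOVA F-test scores each feature against the class label.
selector = SelectKBest(score_func=f_classif, k=20)
X_selected = pd.DataFrame(
    selector.fit_transform(X, y), columns=selector.get_feature_names_out()
)

# SMOTEN oversamples the minority class (features are treated as nominal).
X_res, y_res = SMOTEN(random_state=42).fit_resample(X_selected, y)

resampled = X_res.copy()
resampled["label"] = y_res
resampled.to_csv("data/resampled/ANOVA_selected_features.csv", index=False)
```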
`Models.ipynb`:
- Implements and evaluates four machine learning models:
  - Support Vector Machine (SVM) Classifier
  - Decision Tree Classifier
  - Random Forest Classifier
  - Logistic Regression
- Includes an ensemble stacking classifier
- Generates performance metrics and saves reports to an SQLite database
Key components (a train/test-split and tuning sketch follows this list):
- Data loading and train-test split
- Model fine-tuning with hyperparameter optimization
- Performance metric generation and visualization
- Model comparison and ensemble learning
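As a hedged illustration of these components, the sketch below loads the resampled dataset (hypothetical `label` column), makes a stratified train/test split, and tunes one model with `GridSearchCV`; the grid and scoring are illustrative, not the notebook's actual settings.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical "label" column in the resampled dataset produced by EDA.ipynb.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative grid; the notebook's grids and scoring may differ per model.
grid = GridSearchCV(
    RandomForestClassifier(criterion="entropy", random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="f1_macro",
    cv=5,
)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
print(classification_report(y_test, grid.best_estimator_.predict(X_test)))
```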
Feature selection:
- Uses ANOVA for feature selection
- Uses LASSO regression for feature selection
- Evaluates feature importance based on coefficients
- Handles both numeric and categorical features
- Visualizes the impact of different alpha values on model performance (an alpha-sweep sketch follows)
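The alpha sweep might look roughly like the following; the path, `label` column, and alpha range are assumptions, and in practice the features would typically be scaled before fitting.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Hypothetical path and 0/1-encoded "label" column.
df = pd.read_csv("data/original/dataset.csv")
X = pd.get_dummies(df.drop(columns=["label"]))  # quick one-hot for categoricals
y = df["label"]

alphas = np.logspace(-4, 0, 20)
scores, n_selected = [], []
for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10_000)
    scores.append(cross_val_score(lasso, X, y, cv=5).mean())
    n_selected.append(np.count_nonzero(lasso.fit(X, y).coef_))

# Larger alphas shrink more coefficients to zero: fewer features, possibly worse fit.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.semilogx(alphas, scores)
ax1.set(xlabel="alpha", ylabel="mean CV R^2", title="Fit quality vs. alpha")
ax2.semilogx(alphas, n_selected)
ax2.set(xlabel="alpha", ylabel="non-zero coefficients", title="Sparsity vs. alpha")
plt.tight_layout()
plt.show()
```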
- SVM Classifier (sketch below)
  - Linear kernel implementation
  - C parameter tuning for optimal recall
  - Achieves high accuracy in anomaly detection
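A minimal sketch of such an SVM step, assuming the same hypothetical resampled CSV and `label` column as above; the C grid and recall-based scoring are illustrative.

```python
import pandas as pd
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same hypothetical resampled CSV and "label" column as above.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# Linear-kernel SVM with C swept using recall as the selection metric.
svm = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
    scoring="recall_macro",
    cv=5,
).fit(X_train, y_train)

print("best C:", svm.best_params_["C"])
print("test recall:", recall_score(y_test, svm.predict(X_test), average="macro"))
```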
- Decision Tree Classifier (sketch below)
  - Uses entropy as the splitting criterion
  - Visualizes the decision tree structure
  - Provides interpretable classification rules
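A minimal decision-tree sketch under the same data assumptions; `max_depth` is capped here only to keep the plotted tree readable.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

# Same hypothetical resampled CSV and "label" column as above.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# Entropy (information gain) as the splitting criterion; depth capped for readability.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=42)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))

# The fitted tree can be rendered graphically or as plain-text rules.
plot_tree(tree, feature_names=list(X_train.columns), filled=True)
plt.show()
print(export_text(tree, feature_names=list(X_train.columns)))
```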
- Random Forest Classifier (sketch below)
  - Ensemble of decision trees
  - Uses entropy for node splitting
  - Handles high-dimensional feature spaces effectively
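A minimal random-forest sketch under the same assumptions; `n_estimators` is illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same hypothetical resampled CSV and "label" column as above.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# An ensemble of entropy-split trees; n_estimators is illustrative.
forest = RandomForestClassifier(n_estimators=200, criterion="entropy", random_state=42)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))

# Per-feature importances show which selected features drive the predictions.
print(pd.Series(forest.feature_importances_, index=X_train.columns).nlargest(10))
```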
- Logistic Regression (sketch below)
  - Linear classification model
  - Regularization parameter (C) tuning
  - Efficient for binary classification tasks
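A minimal logistic-regression sketch under the same assumptions, with C tuned over an illustrative grid.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same hypothetical resampled CSV and "label" column as above.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# Scaling helps the solver converge; C is the inverse regularization strength.
logreg = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=5,
).fit(X_train, y_train)

print("best params:", logreg.best_params_)
print("test accuracy:", logreg.score(X_test, y_test))
```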
- Stacking Ensemble (sketch below)
  - Combines predictions from all base models
  - Uses Random Forest as the final estimator
  - Potentially improves overall performance
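A minimal stacking sketch under the same assumptions, wiring the base classifiers into a `StackingClassifier` with a Random Forest meta-learner; the base-model settings are illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same hypothetical resampled CSV and "label" column as above.
df = pd.read_csv("data/resampled/ANOVA_selected_features.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["label"]), df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# Base models feed their predictions to a Random Forest meta-learner.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(kernel="linear")),
        ("tree", DecisionTreeClassifier(criterion="entropy", random_state=42)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=RandomForestClassifier(criterion="entropy", random_state=42),
    cv=5,
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))
```

With `cv=5`, the meta-learner is trained on out-of-fold predictions of the base models, which limits leakage from the training data.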
- Implements comprehensive metric reporting (sketched after this list):
  - Precision, recall, and F1-score for each class
  - Accuracy, macro and weighted averages
  - Visualizations of model performance
- Uses an SQLite database (`models_reports.db`) to store:
  - Test metadata (model names, dataset versions)
  - Model parameters
  - Performance metrics
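A rough sketch of how a classification report could be written to `models_reports.db`; the table schema, labels, and parameter values here are placeholders, not the notebook's actual schema.

```python
import json
import sqlite3
from sklearn.metrics import classification_report

# Placeholder labels/predictions; in the notebook these come from the fitted models.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0]
report = classification_report(y_true, y_pred, output_dict=True)

# Hypothetical single-table schema; the real tables are defined in Models.ipynb.
conn = sqlite3.connect("models_reports.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS reports "
    "(model TEXT, dataset_version TEXT, params TEXT, metrics TEXT)"
)
conn.execute(
    "INSERT INTO reports VALUES (?, ?, ?, ?)",
    (
        "random_forest",
        "resampled-v1",
        json.dumps({"criterion": "entropy", "n_estimators": 200}),
        json.dumps(report),
    ),
)
conn.commit()
conn.close()
```

Serializing parameters and metrics as JSON keeps the schema flexible while still allowing SQL queries over model name and dataset version.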
To run the project:
- Clone the repository and navigate to the project directory
- Activate the Poetry environment
- Install dependencies using `poetry install`
- Run the Jupyter notebooks in order:
  1. `LASSO_feature_selection.ipynb`
  2. `EDA.ipynb`
  3. `Models.ipynb`
- See `pyproject.toml` for project dependencies
Future work:
- Complete hyperparameter tuning for Random Forest
- Experiment with additional models (e.g., neural networks)
- Implement more sophisticated feature engineering
- Add cross-validation for more robust evaluation
This project provides a comprehensive framework for network anomaly detection, from feature selection to model evaluation, with a focus on interpretability and performance.