This project implements a machine learning pipeline for detecting and classifying network security threats. The pipeline is built with reproducibility, versioning, and experiment tracking in mind, leveraging modern MLOps tools.
- End-to-End ML Pipeline: Automated data ingestion, validation, transformation, and model training
- Data Version Control: Track and version datasets using DVC
- Experiment Tracking: Monitor model metrics and parameters with MLflow
- Reproducibility: Ensure consistent results across different environments
- CI/CD Integration: Automated testing and deployment workflows
- Containerization: Docker support for consistent deployment
- REST API: FastAPI-based API for real-time predictions
- Text Classification: Support for text-based cyber threat intelligence data
- Multiple Training Approaches: Support for both MongoDB-based and direct file-based training
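To illustrate the text-classification feature, here is a minimal, hypothetical sketch (not the project's actual pipeline) of classifying threat-intelligence text with scikit-learn; the labels and example strings are made up for illustration:

```python
# Hypothetical sketch: TF-IDF + logistic regression on toy threat-intel text.
# Example strings and labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "ransomware encrypts files and demands payment",
    "phishing email asks users to reset their password",
    "routine system update completed successfully",
    "scheduled backup finished with no errors",
]
labels = ["threat", "threat", "benign", "benign"]

# A single pipeline keeps vectorizer and model versioned together.
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("model", LogisticRegression()),
])
clf.fit(texts, labels)

pred = clf.predict(["new ransomware strain encrypts user files"])[0]
print(pred)
```

The real pipeline adds validation, transformation, and tracking around this core fit/predict flow.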
- Python: Core programming language
- MongoDB: Database for storing network security data
- Scikit-learn & XGBoost: ML algorithms for classification
- DVC: Data version control
- MLflow: Experiment tracking and model registry
- DAGsHub: Collaborative MLOps platform
- Docker: Containerization
- Pytest: Testing framework
- FastAPI: High-performance API framework
- Uvicorn: ASGI server for FastAPI
- Python 3.8+ (Python 3.10 or 3.11 recommended for best compatibility)
- Git
- Docker (optional)
- MongoDB connection string
- DAGsHub account (for MLflow tracking)
1. Clone the repository:

   ```bash
   git clone https://github.com/austinLorenzMccoy/networkSecurity_project.git
   cd networkSecurity_project
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   pip install -e .
   ```

4. Set up environment variables:

   ```bash
   # Create a .env file with your MongoDB connection string and DAGsHub credentials
   cp .env.template .env
   # Edit the .env file with your credentials
   ```

5. Initialize DVC:

   ```bash
   dvc init
   ```

6. Connect to DAGsHub (optional):

   ```bash
   # Set up DAGsHub as a DVC remote
   dvc remote add origin https://dagshub.com/austinLorenzMccoy/networkSecurity_project.dvc
   ```
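For reference, a minimal `.env` file for this setup might contain entries like the following (placeholder values, to be replaced with your own credentials):

```bash
MONGODB_URI=your_mongodb_connection_string
MLFLOW_TRACKING_USERNAME=your_dagshub_username
MLFLOW_TRACKING_PASSWORD=your_dagshub_token
```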
The project uses DVC to define and run the ML pipeline stages:
```bash
# Run the entire pipeline
dvc repro

# Run a specific stage
dvc repro -s data_ingestion
dvc repro -s data_validation
dvc repro -s data_transformation
dvc repro -s model_training

# Run the direct training pipeline (using cyber threat intelligence data)
dvc repro -s direct_training

# View pipeline visualization
dvc dag
```
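The stage names above map to entries in `dvc.yaml`. A hypothetical stage definition might look like the following; the command, dependency, and output paths are illustrative, not the project's actual configuration:

```yaml
stages:
  data_ingestion:
    cmd: python main.py --stage data_ingestion   # illustrative command
    deps:
      - Network_Data/
      - networksecurity/components/
    outs:
      - artifact/data_ingestion/
```

DVC re-runs a stage only when its `deps` change, which is what makes `dvc repro` both incremental and reproducible.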
MLflow is used to track experiments, including parameters, metrics, and artifacts:
```bash
# Start the MLflow UI locally
mlflow ui

# Or view experiments on DAGsHub
# Visit: https://dagshub.com/austinLorenzMccoy/networkSecurity_project.mlflow
```
To enable MLflow tracking with DAGsHub:

1. Set your DAGsHub credentials in the `.env` file:

   ```bash
   MLFLOW_TRACKING_USERNAME=your_dagshub_username
   MLFLOW_TRACKING_PASSWORD=your_dagshub_token
   ```

2. Run the training pipeline with MLflow tracking:

   ```bash
   dvc repro direct_training
   ```

3. View your experiments on DAGsHub's MLflow interface.
The project includes unit tests using pytest:
```bash
# Run all tests
pytest

# Run tests with coverage report
pytest --cov=networksecurity
```
Build and run the project using Docker:
```bash
# Build the Docker image
docker build -t network-security-project .

# Run the container
docker run -p 8000:8000 -e MONGODB_URI=your_mongodb_connection_string network-security-project
```
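The repository's actual `Dockerfile` defines the image; a typical minimal Dockerfile for a FastAPI service like this might look as follows (illustrative sketch, not the project's actual file):

```dockerfile
# Illustrative Dockerfile sketch for a FastAPI service
FROM python:3.10-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8000
CMD ["python", "app.py"]
```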
The project includes a FastAPI application for serving predictions:
```bash
# Run the FastAPI application
python app.py

# Or use the convenience script
bash run_api.sh
```
- `GET /health`: Check if the model is loaded and ready
- `GET /model-info`: Get information about the trained model
- `POST /predict`: Make predictions using feature vectors
- `POST /predict/text`: Make predictions using raw text input
```bash
# Check health status
curl -X GET "http://localhost:8000/health"

# Get model information
curl -X GET "http://localhost:8000/model-info"

# Make a prediction with text
curl -X POST "http://localhost:8000/predict/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "A new ransomware attack has been detected that encrypts files."}'
```
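The same `/predict/text` call can be made from Python with only the standard library; a minimal sketch (assumes the API is running on `localhost:8000`, so the actual send is left commented out):

```python
# Build a POST request for the /predict/text endpoint using only the stdlib.
import json
import urllib.request

payload = json.dumps(
    {"text": "A new ransomware attack has been detected that encrypts files."}
).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/predict/text",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the API running, send the request and read the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
print(req.get_method())
```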
```
.
├── .dvc/                     # DVC configuration
├── .dagshub/                 # DAGsHub configuration
├── artifact/                 # Generated artifacts from pipeline
│   └── direct_training/      # Artifacts from direct training approach
├── data_schema/              # Data schema definitions
├── logs/                     # Application logs
├── Network_Data/             # Raw data (tracked by DVC)
├── networksecurity/          # Main package
│   ├── components/           # Pipeline components
│   ├── constants/            # Constants and configurations
│   ├── entity/               # Data entities and models
│   ├── exception/            # Custom exceptions
│   ├── logging/              # Logging utilities
│   ├── pipeline/             # Pipeline orchestration
│   └── utils/                # Utility functions
├── notebooks/                # Jupyter notebooks for exploration
├── reports/                  # Generated reports and metrics
├── tests/                    # Test cases
├── .env                      # Environment variables
├── .env.template             # Template for environment variables
├── .gitignore                # Git ignore file
├── app.py                    # FastAPI application
├── custom_model_trainer.py   # Custom model trainer implementation
├── dvc.yaml                  # DVC pipeline definition
├── Dockerfile                # Docker configuration
├── main.py                   # Main entry point
├── pytest.ini                # Pytest configuration
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── run_api.sh                # Script to run the FastAPI application
├── setup.py                  # Package setup file
└── train_with_components.py  # Direct training script using components
```
The project is set up with GitHub Actions for CI/CD:
- Continuous Integration: Automated testing on pull requests
- Continuous Deployment: Automatic model training and evaluation
- DVC and MLflow Integration: Track experiments and data versions
This project is licensed under the MIT License - see the LICENSE file for details.
- Augustine Chibueze - GitHub