Server Log Analysis Project

Overview

This project is a server log analysis system that uses Apache Spark, Kafka, and geolocation data to process, clean, and analyze server logs in real time.

Features

Real-time log ingestion from Kafka
Log data cleaning and validation
Geolocation enrichment
Processing with Apache Spark
Log Parsing and Cleaning
Server Log Analysis Dashboard
Web Log Anomaly Detection
- Detects suspicious status codes
- Identifies unusual HTTP methods
- Tracks high-traffic IP addresses
- Monitors suspicious user agents
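
The log parsing and cleaning step listed above could look roughly like the sketch below. It assumes the logs use the Apache/Nginx Combined Log Format, which this README does not confirm, and the names LOG_PATTERN and parse_log_line are hypothetical rather than taken from the project's scripts. Lines that do not match, or that carry an unparseable timestamp, are dropped.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: logs follow the Apache/Nginx Combined Log Format.
# The referrer and user-agent fields are optional so Common Log Format also matches.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<endpoint>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one log line into a dict, or return None if the line is invalid."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # cleaning step: discard lines that do not fit the format
    record = match.groupdict()
    try:
        parsed_time = datetime.strptime(record.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # cleaning step: discard lines with a malformed timestamp
    record["timestamp"] = parsed_time.isoformat()
    record["status_code"] = int(record.pop("status"))
    size = record.pop("size")
    record["size"] = 0 if size == "-" else int(size)
    return record
```

A parsed record carries the same field names as the anomaly response shown later in this README (ip, status_code, endpoint, timestamp), which keeps the later stages simple.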

Project Structure

scripts/: Core processing scripts
- main.py: Main entry point
- kafka_producer.py: Kafka log producer
- processing/: Data cleaning modules
- analysis/: Log analytics
- helper/: Utility functions
web/: Web interface components
data/: Processed log data storage
GeoLite_data/: Geolocation database

Key Technologies

Apache Spark
Apache Kafka
Python
Geolocation Analysis
Streaming Data Processing

Prerequisites

Python 3.8+
Java 8 or higher
Apache Kafka
Apache Spark
Apache Hadoop
GeoLite2 City database

Installation

1. Install Java

2. Install Apache Kafka

3. Install Apache Spark

4. Install Apache Hadoop

Clone the Repository

git clone https://github.com/yassirsalmi/server-log-analysis.git
cd server-log-analysis

1. Create Virtual Environment

python3 -m venv server-analysis-env
source server-analysis-env/bin/activate

2. Install Python Dependencies

pip install -r requirements.txt

3. Configure Geolocation Database

Download GeoLite2 City database from MaxMind
Place the database in GeoLite_data/GeoLite2-City.mmdb

Configuration

Modify the following parameters in scripts/main.py:

kafka_bootstrap_servers: Kafka broker address
kafka_topic: Kafka topic for log ingestion
hdfs_output_path: Output path for cleaned logs
geoip_db_path: Path to GeoLite2 City database
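
As an illustration only, the four parameters could be grouped as in the sketch below. How scripts/main.py actually stores them is not shown in this README, so the PipelineConfig class and the broker address, topic name, and HDFS path are placeholders to replace with your own values. Only the GeoLite2 path comes from the installation step above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the parameters listed above."""

    kafka_bootstrap_servers: str = "localhost:9092"  # placeholder broker address
    kafka_topic: str = "server-logs"  # placeholder topic name
    hdfs_output_path: str = "hdfs://localhost:9000/logs/cleaned"  # placeholder path
    geoip_db_path: str = "GeoLite_data/GeoLite2-City.mmdb"  # path from the install step

    def validate(self) -> None:
        """Raise ValueError if any parameter is empty or whitespace only."""
        for name, value in asdict(self).items():
            if not value.strip():
                raise ValueError(f"{name} must not be empty")


config = PipelineConfig()
config.validate()
```

Validating the values once at startup makes a missing setting fail early instead of partway through a streaming job.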

Running the Project

./startup.sh

This script will:

Start Kafka and Zookeeper services
Start Hadoop services
Launch Kafka producer
Start main log processing
Initialize web application

Anomaly Detection API

The /analysis/anomalies endpoint scans a log file and reports the anomalies it detects:

Endpoint: `/analysis/anomalies`

Query Parameters:

log_path (optional): Path to the log file. Defaults to the project's default log file.

Response Example:

{
  "total_anomalies": 5,
  "anomalies": [
    {
      "type": "Suspicious Status Code",
      "ip": "192.168.1.100",
      "status_code": 403,
      "endpoint": "/admin",
      "timestamp": "2024-01-15T10:30:45+00:00"
    },
    ...
  ]
}
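
A client might build the request URL and summarize a response of this shape as sketched below. The base URL is an assumption, since this README does not state the web application's host or port, and both function names are hypothetical.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlencode

# Assumption: the web application listens here; adjust to your deployment.
BASE_URL = "http://localhost:5000"


def build_anomalies_url(log_path: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Return the endpoint URL, adding log_path only when it is provided."""
    url = f"{base_url}/analysis/anomalies"
    if log_path:
        url += "?" + urlencode({"log_path": log_path})
    return url


def summarize_anomalies(payload: dict) -> Counter:
    """Count anomalies per type in a decoded JSON response."""
    return Counter(item["type"] for item in payload.get("anomalies", []))
```

The decoded response can come from any HTTP client, for example json.load(urllib.request.urlopen(url)), assuming the endpoint accepts GET requests.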

Anomaly Types

Suspicious Status Codes (401, 403, 500, etc.)
Unusual HTTP Methods (DELETE, PUT)
High Request Rate per IP
Suspicious User Agents
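
The four rules above can be sketched as simple checks over parsed log records. This is not the project's implementation. The request threshold and the user-agent markers are assumptions, and the request rate is simplified to a total count per IP rather than a count per time window. Only the "Suspicious Status Code" label appears in the response example; the other labels are made up for illustration.

```python
from collections import Counter

SUSPICIOUS_STATUS_CODES = {401, 403, 500}  # from the list above; "etc." is left open
UNUSUAL_METHODS = {"DELETE", "PUT"}  # from the list above
SUSPICIOUS_AGENT_MARKERS = ("sqlmap", "nikto")  # assumption: example scanner names
MAX_REQUESTS_PER_IP = 100  # assumption: arbitrary threshold


def detect_anomalies(records, max_requests_per_ip=MAX_REQUESTS_PER_IP):
    """Apply the four rules to an iterable of dict records and build a report."""
    anomalies = []
    requests_per_ip = Counter()

    def flag(kind, record):
        # Copy the fields used in the response example into the anomaly entry.
        anomalies.append({
            "type": kind,
            "ip": record["ip"],
            "status_code": record["status_code"],
            "endpoint": record["endpoint"],
            "timestamp": record["timestamp"],
        })

    for record in records:
        requests_per_ip[record["ip"]] += 1
        if record["status_code"] in SUSPICIOUS_STATUS_CODES:
            flag("Suspicious Status Code", record)
        if record["method"] in UNUSUAL_METHODS:
            flag("Unusual HTTP Method", record)
        agent = (record.get("user_agent") or "").lower()
        if any(marker in agent for marker in SUSPICIOUS_AGENT_MARKERS):
            flag("Suspicious User Agent", record)

    # Simplification: total requests per IP, not a rate over a time window.
    for ip, count in requests_per_ip.items():
        if count > max_requests_per_ip:
            anomalies.append({"type": "High Request Rate", "ip": ip, "request_count": count})

    return {"total_anomalies": len(anomalies), "anomalies": anomalies}
```

A single record can trigger several rules, so total_anomalies may exceed the number of log lines examined.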
