A comprehensive Security Information and Event Management (SIEM) solution enhanced with AI/ML-powered log analysis, offering advanced anomaly detection and natural language processing capabilities.
This project combines the powerful Elastic Stack (Elasticsearch, Kibana, Filebeat) with a custom Python-based Security Log Analyzer to create a complete security monitoring solution. The system collects, stores, and visualizes security logs while leveraging machine learning and NLP techniques to uncover hidden patterns, detect anomalies, and extract actionable insights.
- Log Collection: Capture logs from multiple sources with Filebeat
- Centralized Storage: Index and store security data in Elasticsearch
- Real-time Visualization: Monitor security events through Kibana dashboards
- Time-series Analysis: Track security patterns over extended periods
- Docker-based Deployment: Easy setup with containerized components
- ML-based Anomaly Detection: Identify suspicious patterns and outliers
- NLP Analysis: Extract insights from log message content
- Automated Security Reporting: Generate comprehensive security reports
- Bi-directional Integration: Read from and write enriched data back to Elasticsearch
- Custom Dashboards: Visualize ML/NLP findings in Kibana
```
log-analyzer/
├── config/                        # Configuration files
│   ├── filebeat/                  # Filebeat configuration
│   │   └── filebeat.yml           # Filebeat settings
│   └── logstash/                  # Logstash configuration (optional)
│       └── pipeline/              # Logstash processing pipelines
│           └── main.conf          # Main pipeline configuration
├── data/                          # Log data directory
│   └── sample.log                 # Sample security logs for testing
├── scripts/                       # Deployment and setup scripts
│   └── setup/                     # Setup scripts
│       ├── init.sh                # Initialization script
│       ├── install_linux.sh       # Linux installation script
│       ├── install_macos.sh       # macOS installation script
│       └── install_windows.bat    # Windows installation script
├── src/                           # Python source code
│   ├── anomaly_detector.py        # ML-based anomaly detection
│   ├── elasticsearch_connector.py # Elasticsearch integration
│   ├── log_parser.py              # Log parsing utilities
│   ├── main.py                    # Main Python application
│   ├── nlp_analyzer.py            # Natural language processing
│   ├── platform_utils.py          # Platform-specific utilities
│   └── visualizer.py              # Visualization and reporting
├── .env                           # Environment variables (not committed)
├── docker-compose.core.yml        # Docker Compose configuration
├── README.md                      # Project documentation
├── requirements.txt               # Python dependencies
└── shell.nix                      # Nix environment configuration
```
- Docker and Docker Compose
- Python 3.8 or higher
- 4GB RAM minimum (8GB recommended)
- 20GB free disk space
- Clone the repository:

  ```shell
  git clone https://github.com/yourusername/log-analyzer.git
  cd log-analyzer
  ```

- Create a `.env` file (optional):

  ```shell
  cp .env.example .env  # Edit .env with your settings
  ```

- Start the core SIEM stack:

  ```shell
  docker-compose -f docker-compose.core.yml up -d
  ```

- Access Kibana at http://localhost:5601
  - Default credentials: `elastic` / `changeme`

- Create a Python virtual environment:

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Set up Elasticsearch credentials by creating or editing the `.env` file:

  ```
  ES_USERNAME=elastic
  ES_PASSWORD=changeme
  ES_HOSTS=http://localhost:9200
  ```
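The analyzer's Elasticsearch connector presumably reads these variables at startup. A minimal sketch of that step (the helper name `load_es_settings` and the fallback defaults are illustrative, not taken from the repository):

```python
import os

def load_es_settings():
    """Read Elasticsearch connection settings from the environment,
    falling back to the development defaults shown above."""
    return {
        "hosts": os.environ.get("ES_HOSTS", "http://localhost:9200").split(","),
        "basic_auth": (
            os.environ.get("ES_USERNAME", "elastic"),
            os.environ.get("ES_PASSWORD", "changeme"),
        ),
    }

settings = load_es_settings()
# These settings map directly onto the elasticsearch-py client, e.g.:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(settings["hosts"], basic_auth=settings["basic_auth"])
print(settings["hosts"])
```

Splitting `ES_HOSTS` on commas lets the same variable hold one node or a cluster list.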
- Ingest Logs: Place log files in the `data/` directory
- Create a Data View in Kibana:
  - Go to Stack Management > Data Views
  - Create a view with the pattern `filebeat-*`
  - Set the timestamp field to `@timestamp`
- View Logs in Discover:
  - Navigate to Discover
  - Select your data view
  - Adjust the time range to see your data
- Create Dashboards:
  - Go to Dashboard > Create dashboard
  - Add visualizations for security events
```shell
# Analyze a local log file and open the HTML report
python src/main.py --log-file data/sample.log --output ./reports --open

# Analyze last 7 days of logs
python src/main.py --elasticsearch --es-index "filebeat-*" --time-from "now-7d"

# Analyze a specific date range
python src/main.py --elasticsearch --es-index "filebeat-*" --time-from "2023-01-01" --time-to "2025-03-31"
```
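A plausible shape for the flag handling in `src/main.py`, sketched with `argparse`; the actual option set and defaults in the repository may differ:

```python
import argparse

def build_parser():
    """Command-line options mirroring the invocations shown above."""
    p = argparse.ArgumentParser(description="Security Log Analyzer")
    p.add_argument("--log-file", help="Analyze a local log file")
    p.add_argument("--output", default="./reports", help="Report output directory")
    p.add_argument("--open", action="store_true", help="Open the report when done")
    p.add_argument("--elasticsearch", action="store_true",
                   help="Read logs from Elasticsearch instead of a file")
    p.add_argument("--es-index", default="filebeat-*", help="Index pattern to query")
    p.add_argument("--time-from", help="Start of range (ES date math or ISO date)")
    p.add_argument("--time-to", default="now", help="End of range")
    return p

# Parse the "last 7 days" invocation from the examples above.
args = build_parser().parse_args(
    ["--elasticsearch", "--es-index", "filebeat-*", "--time-from", "now-7d"]
)
print(args.es_index, args.time_from)
```

Values like `now-7d` can be passed straight through to Elasticsearch, which accepts date math in range queries.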
After running the analyzer with the `--elasticsearch` flag:

- In Kibana, create a new data view with the pattern `security-analysis*`
- Set the timestamp field to `analysis_timestamp`
- Create visualizations for:
  - Anomaly scores over time
  - Security event distribution
  - NLP sentiment analysis
  - High-risk events table
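The enriched documents written back under `security-analysis*` presumably carry an `analysis_timestamp` plus the ML/NLP fields above. A sketch of assembling one such document (field names other than `analysis_timestamp`, and the `enrich` helper itself, are illustrative):

```python
from datetime import datetime, timezone

def enrich(raw_event, anomaly_score, tags):
    """Wrap a raw log event with analysis results for the
    security-analysis* index pattern."""
    return {
        **raw_event,
        "analysis_timestamp": datetime.now(timezone.utc).isoformat(),
        "anomaly_score": anomaly_score,
        "security_tags": tags,
        "high_risk": anomaly_score >= 0.8,  # threshold chosen for illustration
    }

doc = enrich({"message": "Failed password for root"}, 0.93, ["auth_failure"])
# With elasticsearch-py, such a document could be indexed via:
#   es.index(index="security-analysis-2025", document=doc)
print(doc["high_risk"])
```

Keeping the raw event fields alongside the analysis fields lets Kibana correlate both in a single data view.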
Filebeat is configured to collect logs from the `data/` directory and send them to Elasticsearch. The configuration is in `config/filebeat/filebeat.yml`.
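A minimal shape for that file, as a sketch only; the repository's actual `filebeat.yml` likely sets additional options, and both paths below are assumptions:

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /usr/share/filebeat/data/*.log   # the mounted data/ directory (assumed mount point)

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]   # assumed service name inside the Docker network
```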
Elasticsearch stores and indexes all security events and analysis results. It runs with security features disabled in development mode.
Kibana provides the visualization interface for both raw security data and enriched analysis results.
The Python analyzer enhances the SIEM with:
- Anomaly Detection: Identifies unusual patterns in log data
- NLP Analysis: Extracts insights from log messages
- Elasticsearch Integration: Reads from and writes to your SIEM
- Reporting: Generates HTML reports with interactive visualizations
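As a deliberately simplified stand-in for the ML detector in `src/anomaly_detector.py` (whose actual algorithm isn't shown here), a z-score outlier check over an hourly event-count series illustrates the idea of flagging statistical outliers:

```python
from statistics import mean, stdev

def zscore_outliers(counts, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations
    from the mean -- a toy stand-in for the analyzer's ML-based detector."""
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hourly failed-login counts; the spike at index 5 stands out.
hourly_failures = [3, 4, 2, 5, 3, 120, 4, 3, 2, 4, 3, 5]
print(zscore_outliers(hourly_failures))  # → [5]
```

Real detectors (e.g. isolation forests) handle multi-dimensional features and seasonality, which a plain z-score does not.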
- Modify `config/filebeat/filebeat.yml` to include additional log sources
- Update the Python log parser in `src/log_parser.py` to handle new log formats
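A sketch of the kind of pattern `src/log_parser.py` might need for a new format; the sshd-style line and the field names are illustrative, not the repository's actual schema:

```python
import re

# Example line: "Jan 12 03:14:07 host1 sshd[1234]: Failed password for root from 10.0.0.5 port 22 ssh2"
AUTH_FAILURE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) sshd\[\d+\]: Failed password for (?P<user>\S+) "
    r"from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = AUTH_FAILURE.search(line)
    return m.groupdict() if m else None

event = parse_line("Jan 12 03:14:07 host1 sshd[1234]: "
                   "Failed password for root from 10.0.0.5 port 22 ssh2")
print(event["src_ip"])  # → 10.0.0.5
```

Named groups keep the parser's output keys aligned with the field names used downstream in Elasticsearch.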
- Modify `src/anomaly_detector.py` to implement custom detection algorithms
- Update `src/nlp_analyzer.py` to extract additional security insights
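One hedged illustration of pulling extra insight out of message text; the keyword list, weights, and function name below are invented for this sketch, not taken from `src/nlp_analyzer.py`:

```python
import re

# Toy severity weights for security-relevant terms (illustrative values).
SEVERITY_KEYWORDS = {
    "failed": 2, "denied": 2, "unauthorized": 3,
    "root": 1, "malware": 5, "injection": 5,
}
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def analyze_message(message):
    """Score a log message by keyword hits and pull out any IPv4 addresses."""
    words = re.findall(r"[a-z]+", message.lower())
    score = sum(SEVERITY_KEYWORDS.get(w, 0) for w in words)
    return {"risk_score": score, "ips": IP_RE.findall(message)}

result = analyze_message("Unauthorized access denied for root from 10.0.0.5")
print(result)  # → {'risk_score': 6, 'ips': ['10.0.0.5']}
```

A production analyzer would add tokenization-aware NLP (entities, sentiment) on top of such keyword heuristics.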
- Add additional Elastic Stack components like Logstash for complex data transformation
- Integrate with additional data sources using Beats family collectors
This project is configured for development and testing. For production deployment:
- Enable Elasticsearch security features
- Configure proper authentication and TLS encryption
- Implement proper data retention policies
- Consider network segmentation for the SIEM components
- The Elastic Stack team for Elasticsearch, Kibana, and Filebeat
- The open-source security community for tools and techniques
- Contributors to the machine learning and NLP libraries used in this project
Latteflo, with assistance from Claude AI (Anthropic)