A Python library for anomaly detection that covers classical methods, deep learning approaches, time series methods, and application-specific algorithms. It exposes a unified interface across these algorithms, so you can compare approaches and choose one that fits your use case.
- Unified API: Consistent interface across all algorithms
- Comprehensive Coverage: From classical to state-of-the-art methods
- Easy Comparison: Built-in benchmarking and comparison tools
- Extensible: Easy to add new algorithms
- Well Documented: Examples and tutorials for each method
- Production Ready: Optimized implementations with proper error handling
- Active Development: Regular updates and new algorithm implementations
# Install the core package
pip install awesome-anomaly-detection
# Install with deep learning support (quotes keep shells such as zsh from expanding the brackets)
pip install "awesome-anomaly-detection[deep]"
# Install with full dependencies
pip install "awesome-anomaly-detection[full]"
# Install from source (recommended for development)
git clone https://github.com/chenxingqiang/awesome-anomaly-detection.git
cd awesome-anomaly-detection
pip install -e .
from anomaly_detection.time_series import RULSIFDetector
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.randn(1000, 3)
# Use RULSIF for time series change detection
rulsif = RULSIFDetector(alpha=0.5, n_kernels=50)
rulsif.fit(data)
scores = rulsif.score_samples(data)
change_points = rulsif.detect_changes(data)
print(f"Detected {np.sum(change_points)} change points")
- RULSIF Detector: Fully functional time series change point detection
- Core Infrastructure: Base classes, metrics, visualization tools
- Kernel System: Gaussian, Linear, Polynomial, Laplacian, Sigmoid kernels
- Evaluation Framework: Comprehensive metrics and comparison tools
- Classical Methods: Isolation Forest, LOF, One-Class SVM
- Deep Learning Methods: Autoencoders, GANs, LSTM-based methods
- Application Methods: KPI monitoring, log analysis, driving data
- Algorithm Comparison: Built-in benchmarking tools
- Hyperparameter Optimization: Automated parameter tuning
- Real-time Detection: Streaming anomaly detection
- Interpretability Tools: Explainable anomaly detection
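The kernel families named in the list above (Gaussian, Linear, Polynomial, Laplacian, Sigmoid) have standard textbook definitions. The following is a minimal NumPy sketch of those definitions. The function names, argument names, and default values are illustrative assumptions and are not taken from the library's kernel module.

```python
import numpy as np


def _sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    d2 = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-_sq_dists(X, Y) / (2.0 * sigma**2))


def laplacian_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||_1)
    l1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)
    return np.exp(-gamma * l1)


def linear_kernel(X, Y):
    # k(x, y) = <x, y>
    return X @ Y.T


def polynomial_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    # k(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * (X @ Y.T) + coef0) ** degree


def sigmoid_kernel(X, Y, gamma=1.0, coef0=0.0):
    # k(x, y) = tanh(gamma * <x, y> + coef0)
    return np.tanh(gamma * (X @ Y.T) + coef0)
```

Each function takes two 2-D arrays of shape (n, d) and (m, d) and returns an (n, m) kernel matrix.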
- Paper: Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation
- Description: RULSIF algorithm for change-point detection in time series
- Use Case: Change point detection, distribution shifts, time series analysis
- Implementation: anomaly_detection.time_series.RULSIFDetector
- Status: ✅ Production Ready
Example Usage:
from anomaly_detection.time_series import RULSIFDetector
# Initialize detector
detector = RULSIFDetector(
    alpha=0.5,        # Relative density ratio parameter
    n_kernels=50,     # Number of kernel basis functions
    n_folds=5,        # Cross-validation folds
    random_state=42
)
# Fit with reference and test periods
detector.fit(data, reference_data=ref_data, test_data=test_data)
# Detect changes
changes = detector.detect_changes(new_data)
scores = detector.score_samples(new_data)
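To make the roles of alpha and n_kernels more concrete, here is a minimal NumPy sketch of the estimator described in the RULSIF literature: the relative density ratio p_num(x) / (alpha * p_num(x) + (1 - alpha) * p_den(x)) is modelled as a weighted sum of Gaussian kernels, fitted by regularized least squares, and plugged into the alpha-relative Pearson divergence. This is an independent illustration, not the code behind RULSIFDetector. The function names are hypothetical, and sigma and lam are fixed by hand here rather than chosen by cross-validation.

```python
import numpy as np


def gaussian_design(X, centers, sigma):
    """Gaussian kernel values between every sample and every kernel centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))


def rulsif_divergence(X_num, X_den, alpha=0.5, sigma=1.0, lam=0.1,
                      n_kernels=50, seed=0):
    """Estimate the alpha-relative Pearson divergence between two samples."""
    rng = np.random.default_rng(seed)
    n_centers = min(n_kernels, len(X_num))
    # Kernel centres are drawn from the numerator sample.
    centers = X_num[rng.choice(len(X_num), n_centers, replace=False)]
    phi_num = gaussian_design(X_num, centers, sigma)
    phi_den = gaussian_design(X_den, centers, sigma)
    # Regularized least-squares fit of the relative density ratio.
    H = (alpha * phi_num.T @ phi_num / len(X_num)
         + (1.0 - alpha) * phi_den.T @ phi_den / len(X_den))
    h = phi_num.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(n_centers), h)
    r_num = phi_num @ theta  # estimated ratio at numerator samples
    r_den = phi_den @ theta  # estimated ratio at denominator samples
    # Plug-in estimate of the divergence.
    return (-alpha * np.mean(r_num**2) / 2.0
            - (1.0 - alpha) * np.mean(r_den**2) / 2.0
            + np.mean(r_num) - 0.5)


rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=(200, 1))
same = rng.normal(0.0, 1.0, size=(200, 1))      # no distribution change
shifted = rng.normal(3.0, 1.0, size=(200, 1))   # mean shift

pe_same = rulsif_divergence(same, reference)
pe_shift = rulsif_divergence(shifted, reference)
print(f"no change: {pe_same:.3f}, mean shift: {pe_shift:.3f}")
```

The divergence estimate stays close to zero when both samples come from the same distribution and grows when they differ, which is the quantity a change-point score is built on. With alpha > 0 the ratio is bounded above by 1 / alpha, which is what makes the relative variant more stable than the plain density ratio.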
- Paper: Isolation Forest
- Description: An efficient anomaly detection algorithm based on the principle that anomalies are easier to isolate than normal points
- Use Case: High-dimensional data, fast detection
- Implementation: anomaly_detection.classical.IsolationForest
- Status: 🚧 In Development
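Since the library's IsolationForest is still in development, the following self-contained sketch shows the isolation principle itself. Points are scored by how few random axis-aligned splits are needed to separate them from a subsample. All function names and defaults here are hypothetical, and the sketch assumes at least two input samples. It is not the planned anomaly_detection.classical.IsolationForest API.

```python
import numpy as np


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build_tree(X, depth, max_depth, rng):
    n = len(X)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    feat = rng.integers(X.shape[1])
    lo, hi = X[:, feat].min(), X[:, feat].max()
    if lo == hi:  # cannot split on a constant feature
        return {"size": n}
    split = rng.uniform(lo, hi)
    mask = X[:, feat] < split
    return {"feat": feat, "split": split,
            "left": _build_tree(X[mask], depth + 1, max_depth, rng),
            "right": _build_tree(X[~mask], depth + 1, max_depth, rng)}


def _path_length(x, node):
    depth = 0
    while "feat" in node:
        node = node["left"] if x[node["feat"]] < node["split"] else node["right"]
        depth += 1
    # Leaves that still hold several points get the expected extra depth.
    return depth + _c(node["size"])


def isolation_scores(X, n_trees=100, subsample=64, seed=0):
    """Anomaly scores in (0, 1]; higher means easier to isolate."""
    rng = np.random.default_rng(seed)
    psi = min(subsample, len(X))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [_build_tree(X[rng.choice(len(X), psi, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_depth = np.array([np.mean([_path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_depth / _c(psi))


rng = np.random.default_rng(1)
cloud = rng.normal(size=(200, 2))
X = np.vstack([cloud, [[8.0, 8.0]]])  # last row is an obvious outlier
scores = isolation_scores(X)
print(f"outlier score: {scores[-1]:.3f}, median inlier score: {np.median(scores[:-1]):.3f}")
```

The outlier needs fewer splits to isolate than a typical point, so its score is higher than the median inlier score.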
- Paper: LOF: Identifying Density-Based Local Outliers
- Description: Density-based local outlier detection using k-nearest neighbors
- Use Case: Local density variations, clustering-based detection
- Implementation: anomaly_detection.classical.LOF
- Status: 🚧 In Development
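As an illustration of the density-based idea, here is a compact NumPy sketch of the Local Outlier Factor computation (k-distance, reachability distance, local reachability density, and their ratio). The function name lof_scores is hypothetical, ties at the k-th neighbour are ignored, and the full pairwise distance matrix is built in memory, so it is only suitable for small inputs. It is not the planned anomaly_detection.classical.LOF implementation.

```python
import numpy as np


def lof_scores(X, k=10):
    """Local Outlier Factor per sample; values well above 1 indicate outliers."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff**2).sum(axis=2))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    nn_idx = np.argsort(D, axis=1)[:, :k]                 # k nearest neighbours
    nn_dist = np.take_along_axis(D, nn_idx, axis=1)
    k_dist = nn_dist[:, -1]                               # distance to the k-th neighbour
    # reach_dist(p, o) = max(k_dist(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[nn_idx], nn_dist)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)              # local reachability density
    return lrd[nn_idx].mean(axis=1) / lrd


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [[8.0, 8.0]]])  # last row is an outlier
lof = lof_scores(X, k=10)
print(f"index of largest LOF: {lof.argmax()}, value: {lof.max():.2f}")
```

Points inside the cluster have a density similar to that of their neighbours, so their LOF stays near 1, while the isolated point has a much lower density than its neighbours and therefore a much larger LOF.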
- Paper: Support Vector Method for Novelty Detection
- Description: One-class SVM for novelty detection
- Use Case: Novelty detection, high-dimensional spaces
- Implementation: anomaly_detection.classical.OneClassSVM
- Status: 🚧 In Development
- Variational Autoencoder: VAE-based anomaly detection using reconstruction probability
- Robust Autoencoder: Robust deep autoencoders for anomaly detection
- Deep Autoencoding Gaussian Mixture Model: DAGMM for unsupervised anomaly detection
- Unsupervised Anomaly Detection with GANs: GAN-based anomaly detection for marker discovery
- Efficient-GAN-Based Anomaly Detection: Efficient GAN implementations
- LSTM for Anomaly Detection: Long short-term memory networks for sequential data
- LSTM Encoder-Decoder: Multi-sensor anomaly detection
- Stochastic Online Anomaly Analysis: Online anomaly analysis for streaming time series
- Real-time Change Detection: Continuous monitoring and detection
- Generalized Student-t: Mixed-type anomaly detection
- Periodic Pattern Detection: Multiple periods and periodic patterns
- Unsupervised KPI Anomaly Detection: VAE-based anomaly detection for seasonal KPIs
- Web Application Monitoring: Real-time KPI monitoring
- DeepLog: Deep learning for system log anomaly detection
- Invariant Mining: Mining invariants from logs for system problem detection
- Aggressive Driving Detection: Anomaly detection for driving behavior analysis
- Vehicle Telemetry: Real-time vehicle data monitoring
- Python 3.7 or higher
- pip package manager
- Git (for source installation)
# Core package
pip install awesome-anomaly-detection
# With deep learning support (quotes keep shells such as zsh from expanding the brackets)
pip install "awesome-anomaly-detection[deep]"
# With full dependencies
pip install "awesome-anomaly-detection[full]"
# Install from source
git clone https://github.com/chenxingqiang/awesome-anomaly-detection.git
cd awesome-anomaly-detection
pip install -e .
# Development install (editable, with the dev extras)
git clone https://github.com/chenxingqiang/awesome-anomaly-detection.git
cd awesome-anomaly-detection
pip install -e ".[dev]"
- numpy>=1.19.0: Numerical computing
- scipy>=1.5.0: Scientific computing
- scikit-learn>=0.24.0: Machine learning utilities
- pandas>=1.1.0: Data manipulation
- matplotlib>=3.3.0: Plotting
- seaborn>=0.11.0: Statistical visualization
- torch>=1.8.0: PyTorch for deep learning
- tensorflow>=2.4.0: TensorFlow for deep learning
- pytest>=6.0: Testing framework
- black>=21.0: Code formatting
- flake8>=3.8: Linting
- mypy>=0.800: Type checking
import numpy as np
from anomaly_detection.time_series import RULSIFDetector
# Generate synthetic time series with known anomalies
np.random.seed(42)
n_samples = 1000
n_features = 3
# Create normal data with trend and seasonality
t = np.linspace(0, 10, n_samples)
base_signal = 2 * np.sin(2 * np.pi * t) + 0.5 * t
noise = np.random.normal(0, 0.3, (n_samples, n_features))
time_series = np.zeros((n_samples, n_features))
for i in range(n_features):
    time_series[:, i] = base_signal + noise[:, i]
# Introduce anomalies
anomaly_start, anomaly_end = 400, 450
time_series[anomaly_start:anomaly_end, 0] += 3.0 # Large spike
# Split into reference and test periods
split_point = n_samples // 2
reference_data = time_series[:split_point]
test_data = time_series[split_point:]
# Initialize and fit RULSIF detector
detector = RULSIFDetector(
    alpha=0.5,
    n_kernels=50,
    n_folds=5,
    random_state=42
)
detector.fit(time_series, reference_data=reference_data, test_data=test_data)
# Detect anomalies
anomaly_scores = detector.score_samples(time_series)
detected_changes = detector.detect_changes(time_series)
print(f"Optimal sigma: {detector.sigma_:.4f}")
print(f"Optimal lambda: {detector.lambda_:.4f}")
print(f"Detected {np.sum(detected_changes)} change points")
from anomaly_detection.core.visualization import AnomalyVisualizer
from anomaly_detection.core.metrics import AnomalyMetrics
# Create visualizer
visualizer = AnomalyVisualizer()
# Plot time series with detected anomalies
fig = visualizer.plot_time_series_with_anomalies(
    time_series=time_series,
    anomaly_labels=detected_changes,
    anomaly_scores=anomaly_scores,
    title="Time Series with RULSIF Detected Anomalies"
)
# Plot anomaly score distribution
fig2 = visualizer.plot_anomaly_scores_distribution(
    scores=anomaly_scores,
    title="RULSIF Anomaly Scores Distribution"
)
# Compute and display metrics
metrics = AnomalyMetrics.compute_basic_metrics(
    y_true=np.zeros(len(time_series)),  # Placeholder: all-normal labels (no ground truth assumed)
    y_pred=detected_changes,
    y_scores=anomaly_scores
)
print("Detection Statistics:")
print(f" Total samples: {len(time_series)}")
print(f" Detected anomalies: {np.sum(detected_changes)}")
print(f" Detection rate: {np.sum(detected_changes) / len(time_series):.2%}")
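When ground truth is available, as it is for the injected window (samples 400 to 449) in the synthetic series above, detection quality can be summarized with precision, recall, and F1. The sketch below computes them with plain NumPy. The helper binary_detection_metrics and the example prediction vector are hypothetical and do not reflect what AnomalyMetrics.compute_basic_metrics returns.

```python
import numpy as np


def binary_detection_metrics(y_true, y_pred):
    """Precision, recall and F1 for 0/1 anomaly labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # anomalies correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # normal samples flagged by mistake
    fn = int(np.sum(y_true & ~y_pred))   # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Ground truth: the injected anomaly window.
y_true = np.zeros(1000, dtype=int)
y_true[400:450] = 1

# Made-up detector output that flags samples 420-459.
y_pred = np.zeros(1000, dtype=int)
y_pred[420:460] = 1

result = binary_detection_metrics(y_true, y_pred)
print(result)  # precision 0.75, recall 0.60, f1 about 0.667
```

In this made-up case 30 of the 40 flagged samples are true anomalies (precision 0.75) and 30 of the 50 true anomalies are found (recall 0.60).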
# Test different alpha values
alpha_values = [0.1, 0.3, 0.5, 0.7, 0.9]
results = {}
for alpha in alpha_values:
    detector = RULSIFDetector(
        alpha=alpha,
        n_kernels=30,
        n_folds=3,
        random_state=42
    )
    detector.fit(time_series, reference_data=reference_data, test_data=test_data)
    scores = detector.score_samples(time_series)
    # Store results for comparison
    results[f"alpha_{alpha}"] = {
        'sigma': detector.sigma_,
        'lambda': detector.lambda_,
        'mean_score': np.mean(scores),
        'std_score': np.std(scores)
    }
# Compare results
for alpha, result in results.items():
    print(f"{alpha}: σ={result['sigma']:.4f}, λ={result['lambda']:.4f}")
# Run all tests
pytest
# Run specific test file
pytest tests/test_rulsif.py
# Run with coverage
pytest --cov=anomaly_detection
# Run with verbose output
pytest -v
tests/
├── __init__.py
├── test_rulsif.py          # RULSIF detector tests
├── test_classical/         # Classical methods tests
├── test_deep_learning/     # Deep learning tests
├── test_time_series/       # Time series tests
└── test_applications/      # Application tests
# Clone repository
git clone https://github.com/chenxingqiang/awesome-anomaly-detection.git
cd awesome-anomaly-detection
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Format code
black anomaly_detection/ tests/ examples/
# Lint code
flake8 anomaly_detection/ tests/ examples/
# Type checking
mypy anomaly_detection/
# Run all quality checks
pre-commit run --all-files
To add a new anomaly detection algorithm:
- Create the algorithm class in the appropriate module
- Inherit from the correct base class:
  - UnsupervisedAnomalyDetector for unsupervised methods
  - SupervisedAnomalyDetector for supervised methods
- Implement required methods:
  - _fit(): Internal fitting logic
  - predict(): Binary predictions (0/1)
  - score_samples(): Anomaly scores
- Add tests in the corresponding test directory
- Update imports in the main __init__.py file
- Add documentation and examples
Example Structure:
from anomaly_detection.core.base import UnsupervisedAnomalyDetector
class MyAlgorithm(UnsupervisedAnomalyDetector):
    def __init__(self, param1=1.0, **kwargs):
        super().__init__(**kwargs)
        self.param1 = param1

    def _fit(self, X, **kwargs):
        # Implementation here
        pass

    def predict(self, X):
        # Return binary predictions
        pass

    def score_samples(self, X):
        # Return anomaly scores
        pass
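To show what a filled-in version of this skeleton might look like, here is a small z-score detector. So that the snippet runs on its own, it defines a stand-in base class whose fit() simply delegates to _fit(). That behaviour is an assumption about anomaly_detection.core.base.UnsupervisedAnomalyDetector, not a description of it, and ZScoreDetector is a hypothetical example rather than an algorithm shipped with the library.

```python
import numpy as np


class UnsupervisedAnomalyDetector:
    """Stand-in for the library base class (assumed behaviour, for illustration only)."""

    def __init__(self, **kwargs):
        self.is_fitted_ = False

    def fit(self, X, **kwargs):
        self._fit(np.asarray(X, dtype=float), **kwargs)
        self.is_fitted_ = True
        return self


class ZScoreDetector(UnsupervisedAnomalyDetector):
    """Flags samples whose largest per-feature |z-score| exceeds a threshold."""

    def __init__(self, threshold=3.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def _fit(self, X, **kwargs):
        # Internal fitting logic: per-feature mean and standard deviation.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12  # avoid division by zero

    def score_samples(self, X):
        # Anomaly score: largest absolute z-score across features.
        z = np.abs((np.asarray(X, dtype=float) - self.mean_) / self.std_)
        return z.max(axis=1)

    def predict(self, X):
        # Binary predictions: 1 = anomaly, 0 = normal.
        return (self.score_samples(X) > self.threshold).astype(int)


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 2))
detector = ZScoreDetector(threshold=3.0).fit(train)
labels = detector.predict(np.array([[0.0, 0.0], [10.0, 0.0]]))
print(labels)  # the second point lies far outside the training distribution
```

Keeping the statistics in _fit() and deriving predict() from score_samples() means the threshold can be changed after fitting without refitting.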
The RULSIF detector has been tested on various synthetic datasets:
- Small datasets (< 1K samples): ~0.1-1 second
- Medium datasets (1K-10K samples): ~1-10 seconds
- Large datasets (> 10K samples): ~10-100 seconds
Performance depends on:
- Number of kernel basis functions (n_kernels)
- Cross-validation folds (n_folds)
- Data dimensionality
- Hardware specifications
- Peak memory: ~2-5x the input data size
- Kernel matrices: O(n_kernels × n_samples) storage
- Optimization: Efficient memory management for large datasets
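The O(n_kernels × n_samples) term can be turned into a rough size estimate. The sketch below assumes one dense float64 kernel matrix (8 bytes per entry). The helper name kernel_matrix_bytes is hypothetical, and the figure covers only that single matrix, not the detector's total peak memory.

```python
def kernel_matrix_bytes(n_samples, n_kernels, dtype_bytes=8):
    """Bytes needed for one dense n_samples x n_kernels matrix."""
    return n_samples * n_kernels * dtype_bytes


for n in (1_000, 10_000, 100_000):
    mib = kernel_matrix_bytes(n, n_kernels=50) / 2**20
    print(f"{n:>7} samples x 50 kernels: {mib:.2f} MiB")
```

With 50 kernels, 10,000 samples need 4,000,000 bytes (about 3.8 MiB) for the matrix, and the cost grows linearly in both n_samples and n_kernels.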
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes following the coding standards
- Add tests for new functionality
- Update documentation as needed
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Algorithm Implementations: Add new anomaly detection methods
- Performance Optimization: Improve existing algorithms
- Documentation: Enhance examples and tutorials
- Testing: Expand test coverage
- Bug Fixes: Report and fix issues
- Python 3.7+ compatibility
- Type hints for all public methods
- Docstrings following Google style
- Black code formatting
- Flake8 linting compliance
- Test coverage > 80%
This project is licensed under the MIT License - see the LICENSE file for details.
- Original RULSIF Implementation: Based on the work by Liu et al. (2012)
- Research Community: All the researchers whose papers are referenced
- Open Source Contributors: Community members who contribute to the project
- Scientific Python Ecosystem: NumPy, SciPy, scikit-learn, and other libraries
- Author: Xingqiang Chen
- Email: joy6677@qq.com
- GitHub: @chenxingqiang
- Project: https://github.com/chenxingqiang/awesome-anomaly-detection
- Issues: GitHub Issues
- Discussions: GitHub Discussions
For academic use, please cite the relevant papers for the algorithms you use. Each algorithm implementation includes references to the original papers.
- RULSIF: Liu S, Yamada M, Collier N, et al. Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation. 2012.
- Isolation Forest: Liu FT, Ting KM, Zhou ZH. Isolation Forest. ICDM 2008.
- LOF: Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: Identifying Density-Based Local Outliers. SIGMOD 2000.
- One-Class SVM: Schölkopf B, Williamson RC, Smola AJ, et al. Support Vector Method for Novelty Detection. NIPS 2000.
- Classical anomaly detection methods
- Basic deep learning implementations
- Enhanced documentation and tutorials
- Performance benchmarks
- Advanced deep learning methods
- Real-time streaming detection
- Algorithm comparison tools
- Web dashboard
- Production-ready implementations
- Comprehensive test coverage
- Performance optimization
- Enterprise features
Star this repository if you find it useful! ⭐
Join our community and help make anomaly detection accessible to everyone!