Anomaly Detection Thesis

This repository contains the code implementing anomaly detection techniques on a given dataset, developed as part of my Bachelor's thesis in 2024. A relevant paper discussing the topic is Deep Learning for Time Series Anomaly Detection: A Survey.

Table of Contents

  • Introduction
  • What is anomaly detection
  • Code
  • Tools used
  • Usage
  • Dataset Structure
  • Comments
  • Discussion
  • Other useful resources

Introduction

The thesis focuses on studying the general problem of anomaly detection, implementing anomaly detection algorithms, and applying them to real-world data provided by a private company. The results of the various algorithms are then compared and discussed.

What is anomaly detection

Given an industrial machine, it is possible to detect deviations in the collected data which could indicate machine degradation or imminent malfunction. The goal of anomaly detection algorithms is to learn exactly when such deviations happen and to detect them in real time.

These techniques can be a useful tool for improving the resilience of industrial processes, for example by enabling the prediction of necessary maintenance interventions. They not only allow for the precise identification of such deviations, but also enable dynamic responses to previously unseen anomalies.

The field has notably expanded its application prospects in the domain of time series, where data can follow particularly complex patterns.
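As a toy illustration only (not taken from the thesis code), a detector such as scikit-learn's IsolationForest can flag injected spikes in a synthetic series; the data and parameters below are purely illustrative:

# Toy illustration (not from the thesis code): flagging injected spikes
# in a synthetic univariate series with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500)
series[[100, 250, 400]] += 3.0                      # inject three obvious anomalies

model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(series.reshape(-1, 1))   # -1 = anomaly, 1 = normal
print(np.where(labels == -1)[0])                    # indices flagged as anomalous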

Code

The time_series.ipynb file provides the code used for the implementation in the first section of the thesis, and the results on different types of imbalanced datasets for the following sections.

Tools used

Programming Language: Python 3.13+

Libraries used for implementing the algorithms: scikit-learn, TensorFlow, DeepOD, kneed, lof_autotuner.
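Assuming these are available as PyPI packages (lof_autotuner in particular may instead need to be installed from its source repository), the environment could be set up with:

pip install scikit-learn tensorflow deepod kneed
# lof_autotuner: install from its repository if it is not available on PyPI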

Usage

The code provided in the time_series.ipynb file can be run on Colab, in Anaconda, or locally as a .py file:

python time_series.py <FILEPATH> <FILENAME> <FILENAME_TRAIN> <NUM_COMPONENTS>

Where:

Parameter          Description
<FILEPATH>         Path to the directory containing the datasets
<FILENAME>         Name of the test file (retrieved as <FILEPATH>/Noise/<FILENAME>.csv)
<FILENAME_TRAIN>   Name of the training file (retrieved as <FILEPATH>/Training e Test/<FILENAME_TRAIN>.csv)
<NUM_COMPONENTS>   Number of features in the dataset

Example

python time_series.py ./data machine_test machine_train 10
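If only the notebook is available, an equivalent .py script can be generated with Jupyter's nbconvert before invoking the command above:

jupyter nbconvert --to script time_series.ipynb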

Dataset Structure

The datasets should be structured as follows:

Required Format:

  • First NUM_COMPONENTS - 1 columns: Feature data
  • Last column: Ground truth labels (0 = normal, 1 = anomaly)

Directory Structure:

<FILEPATH>/
├── Noise/
│   └── <FILENAME>.csv          # Test dataset
└── Training and Test/
    └── <FILENAME_TRAIN>.csv    # Training dataset

Example CSV Structure:

feature_1,feature_2,feature_3,...,feature_n,label
1.2,0.5,2.1,...,0.8,0
0.9,1.3,1.7,...,1.2,1
...

The ground truth labels in both datasets are used to evaluate the performance of the anomaly detection algorithms, as sketched below.
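As a rough sketch of that evaluation (assuming the CSV format above; the file path follows the earlier example, and the predictions are only a placeholder for the output of one of the detectors in time_series.ipynb):

# Sketch: load a test CSV in the format above and score predictions
# against the ground-truth labels stored in the last column.
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score

df = pd.read_csv("./data/Noise/machine_test.csv")
X = df.iloc[:, :-1].to_numpy()       # all columns except the last: features
y_true = df.iloc[:, -1].to_numpy()   # last column: 0 = normal, 1 = anomaly

y_pred = y_true.copy()               # placeholder for a detector's predictions

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))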

Comments

Some comments are in Italian since the thesis is written for an Italian university.

Discussion

The content of the thesis will be adapted into a dedicated blog post as an educational contribution.

Other useful resources

Below is a list of papers and other resources that were studied before implementing the code. They span different topics, such as the curse of dimensionality and dimensionality reduction techniques (PCA, t-SNE, UMAP), as well as ML and DL algorithms for anomaly detection.

Curse of Dimensionality and Dimensionality Reduction

Machine Learning Algorithms for Anomaly Detection

Deep Learning Algorithms for Anomaly Detection

Others
