MATLAB implementation of the cSAX time-series symbolic representation. Tested with MATLAB versions R2018b and R2019a.
The cSAX (Clustering SAX) [1] method is an extension of the well-known SAX [2] (Symbolic Aggregate Approximation) for time-series dimensionality reduction. The main contribution of the method is a SAX-based representation oriented for high-level mining tasks that employs the Mean-Shift clustering method for the descritization and is shown to enhance anomaly detection in the lower-dimensional space. The accuracy has been measured for anomaly detection and discord discovery.
This project consists of the following components:
- cSAX, cSAX_overlap: Main functions, use whichever suits your application. The cSAX.m transforms the dataset with non-overlapping windows, whereas cSAX_overlap.m transforms every possible subsequence (even overlapping) separately.
- tsPAA: (c) 2003, E. Keogh, J. Lin, S. Lonardi, P. Patel, L. Wei. Time-series to PAA approximation. Original file with minor modifications.
- timeseries2symbol: (c) 2003, E. Keogh, J. Lin, S. Lonardi, P. Patel, L. Wei. Computes SAX representation of the data. Original file with minor modifications.
- HGMeanShiftCluster: The Mean-Shift clustering algorithm. This is a modified version of:
Mean Shift Clustering (https://www.mathworks.com/matlabcentral/fileexchange/10161-mean-shift-clustering), (c) 2006 Bart Finkston,
meanshift_matlab (https://github.com/hangong/meanshift_matlab), (c) 2015 Han Gong, University of East Anglia.
A large collection of labeled datasets for anomaly detection is available at the NUMENTA library https://github.com/numenta/NAB and at the Hexagon ML/UCR Anomaly Detection archive https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (refer to https://forums.hexagon-ml.com/t/multi-dataset-time-series-anomaly-detection/591/136 to find the labels).
- Download the project's source files.
- Export as they are to a single folder.
- Call either cSAX.m or cSAX_overlap.m with the appropriate inputs.
[1] K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2022.3174630.
[2] J. Lin et al., “Experiencing SAX: A novel symbolic representation of time series”, Data Min. Knowl. Disc., vol. 15, no. 2, pp. 107–144, 2007, doi: 10.1007/s10618-007-0064-z
This code is released under GPL v.3.0. If you use this code for academic works, please cite the following publication:
K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2022.3174630.