MATLAB implementation of the pSAX time-series symbolic representation. Tested with MATLAB versions R2018b and R2019a.
The pSAX (Kernel-based Probabilistic SAX) method [1], [2] is an extension of the well-known SAX (Symbolic Aggregate approXimation) [3] for time-series dimensionality reduction. Its main contribution is a SAX-based representation that adapts directly to the underlying probability distribution of the time-series data instead of assuming it is Gaussian, and thus provides a more accurate symbolic approximation. To this end, the alphabet breakpoints are derived from a kernel density estimate of the data through Lloyd-Max quantization. Its accuracy has been measured against conventional SAX with the Tightness of Lower Bound (TLB) metric, which is significant for database performance, and with the Mean Squared Error (MSE).
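For readers new to SAX, the Python sketch below (an illustration, not part of this MATLAB project) shows the conventional pipeline that pSAX builds on. It covers z-normalization, PAA, symbolization with equiprobable N(0, 1) breakpoints, the MINDIST lower bound from [3], and the TLB ratio used for evaluation. All function names are hypothetical. For simplicity, the sketch assumes the series length is divisible by the number of segments. pSAX keeps this overall structure but replaces the fixed Gaussian breakpoints with breakpoints learned from the data.

```python
import math
from statistics import NormalDist, mean, pstdev


def znorm(x):
    """Z-normalize a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(x), pstdev(x)
    if sigma == 0:
        return [0.0] * len(x)
    return [(v - mu) / sigma for v in x]


def paa(x, w):
    """PAA: mean of each of w equal-length segments (assumes len(x) % w == 0)."""
    seg = len(x) // w
    return [mean(x[i * seg:(i + 1) * seg]) for i in range(w)]


def gaussian_breakpoints(a):
    """Conventional SAX breakpoints: a equiprobable regions under N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def symbolize(coeffs, breakpoints):
    """Map each PAA coefficient to its region index 0..a-1 (number of breakpoints below it)."""
    return [sum(c > b for b in breakpoints) for c in coeffs]


def mindist(s1, s2, breakpoints, n):
    """SAX MINDIST between two equal-length words built from series of length n."""
    w = len(s1)
    total = 0.0
    for r, c in zip(s1, s2):
        # Identical or adjacent symbols contribute 0; otherwise, the gap between their regions.
        if abs(r - c) > 1:
            total += (breakpoints[max(r, c) - 1] - breakpoints[min(r, c)]) ** 2
    return math.sqrt(n / w) * math.sqrt(total)


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def tlb(x, y, w, a):
    """Tightness of Lower Bound: MINDIST / Euclidean distance of the z-normalized series."""
    xz, yz = znorm(x), znorm(y)
    bp = gaussian_breakpoints(a)
    sx = symbolize(paa(xz, w), bp)
    sy = symbolize(paa(yz, w), bp)
    return mindist(sx, sy, bp, len(x)) / euclidean(xz, yz)


# Usage: two phase-shifted sinusoids of length 64, 8 segments, alphabet size 4.
x = [math.sin(2 * math.pi * t / 32) for t in range(64)]
y = [math.cos(2 * math.pi * t / 32) for t in range(64)]
print(symbolize(paa(znorm(x), 8), gaussian_breakpoints(4)))
print(tlb(x, y, w=8, a=4))
```

Because MINDIST never exceeds the true Euclidean distance, the TLB lies in [0, 1]. Values closer to 1 indicate a tighter bound, which lets a database discard more candidates without computing exact distances.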
This project consists of the following components:
- pSAX, pSAX_overlap: Main functions; use whichever suits your application. pSAX.m transforms the dataset using non-overlapping windows, whereas pSAX_overlap.m transforms every possible subsequence (including overlapping ones) separately.
- tsPAA: (c) 2003, E. Keogh, J. Lin, S. Lonardi, P. Patel, L. Wei. Computes the PAA (Piecewise Aggregate Approximation) of a time series. Original file with minor modifications.
- timeseries2symbol: (c) 2003, E. Keogh, J. Lin, S. Lonardi, P. Patel, L. Wei. Computes the SAX representation of the data. Original file with minor modifications.
- mvksdensity, statskcompute, statskernelinfo: (c) 2015-2016 The MathWorks, Inc. These are MATLAB source files called by the built-in function 'ksdensity'. We modified them to i) allow estimating the density at an arbitrarily large number of points (previously limited to 100) and ii) fix the estimation of the optimal smoothing parameter (bandwidth) for the Epanechnikov kernel, which was previously computed as for the Gaussian kernel only. See https://www.mathworks.com/help/stats/ksdensity.html for more info.
- lloydmax: Lloyd-Max quantizer. Designs a quantizer that minimizes the mean squared error for a given probability density function.
- k-means++: The k-means++ seeding algorithm for initializing k-means. Taken from the k-means file by Laurent S. (https://www.mathworks.com/matlabcentral/fileexchange/28804-k-means), version 1.7.0.0.
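The idea that ties ksdensity and lloydmax together, quantizing a kernel density estimate with a Lloyd-Max quantizer, can be sketched in Python as follows. This is a simplified illustration, not a port of lloydmax.m or ksdensity. It assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth and evaluates the density only on a uniform grid spanning the data range. It also initializes the reconstruction levels uniformly rather than with k-means++. The function names gaussian_kde and lloyd_max are hypothetical.

```python
import math
from statistics import pstdev


def gaussian_kde(data, grid):
    """Gaussian KDE evaluated on grid points; bandwidth from Silverman's rule of thumb."""
    n = len(data)
    h = 1.06 * pstdev(data) * n ** (-1 / 5)
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data) for g in grid]


def lloyd_max(grid, pdf, levels, iters=200, tol=1e-9):
    """Design a Lloyd-Max quantizer for a density sampled on a uniform grid.

    Returns (boundaries, reconstruction_levels), both sorted ascending.
    The grid spacing is constant, so it cancels out of the centroid ratios.
    """
    lo, hi = grid[0], grid[-1]
    # Simplifying assumption: start from uniformly spaced levels instead of k-means++ seeding.
    recon = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        # Nearest-neighbor condition: boundaries halfway between consecutive levels.
        bounds = [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)]
        # Centroid condition: each level moves to the mean of the density mass in its cell.
        num = [0.0] * levels
        den = [0.0] * levels
        for g, p in zip(grid, pdf):
            k = sum(g > b for b in bounds)
            num[k] += g * p
            den[k] += p
        new = [num[k] / den[k] if den[k] > 0 else recon[k] for k in range(levels)]
        shift = max(abs(u - v) for u, v in zip(new, recon))
        recon = new
        if shift < tol:
            break
    bounds = [(recon[k] + recon[k + 1]) / 2 for k in range(levels - 1)]
    return bounds, recon


# Usage: a deterministic, clearly non-Gaussian signal quantized with 4 levels.
data = [math.sin(0.3 * t) + 0.1 * math.cos(1.7 * t) for t in range(500)]
lo, hi = min(data), max(data)
grid = [lo + (hi - lo) * i / 399 for i in range(400)]
pdf = gaussian_kde(data, grid)
breakpoints, recon_levels = lloyd_max(grid, pdf, levels=4)
print(breakpoints, recon_levels)
```

Each iteration alternates the two Lloyd-Max optimality conditions until the levels stop moving. In this sketch, the returned boundaries would take the place of the fixed Gaussian breakpoints used by conventional SAX, so the symbol regions follow the actual shape of the data's distribution.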
A large collection of datasets is available at https://www.cs.ucr.edu/~eamonn/iSAX/iSAX.html
- Download the project's source files.
- Extract them, unmodified, into a single folder.
- Add that folder to the MATLAB path (or make it the current folder).
- Call either pSAX.m or pSAX_overlap.m with the appropriate inputs.
[1] K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Data-driven Kernel-based Probabilistic SAX for Time Series Dimensionality Reduction," 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, 2021, pp. 2343-2347, doi: 10.23919/Eusipco47968.2020.9287311.
[2] K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2022.3174630.
[3] J. Lin et al., "Experiencing SAX: A novel symbolic representation of time series," Data Min. Knowl. Disc., vol. 15, no. 2, pp. 107-144, 2007, doi: 10.1007/s10618-007-0064-z.
This code is released under GPL v3.0. If you use this code in academic work, please cite at least one of the following publications:
K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Data-driven Kernel-based Probabilistic SAX for Time Series Dimensionality Reduction," 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, 2021, pp. 2343-2347, doi: 10.23919/Eusipco47968.2020.9287311.
K. Bountrogiannis, G. Tzagkarakis and P. Tsakalides, "Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection," in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2022.3174630.