
02 MSSA


MSSA. Background.

Construction of Trajectory Matrix

The first step in MSSA is the same as for SSA: one starts with the construction of the trajectory matrix. Although the construction procedure is quite similar, one should be aware of the size of the trajectory matrix. Let $s$ denote the number of given time series, $N$ the length of each of the $s$ time series, $L$ the window size with $2 \leq L \leq N/2$ and $L \in \mathbb{N}$, and $K=N-L+1$. The trajectory matrix for MSSA is an $L \times Ks$ matrix $X$: we simply stack the trajectory matrices $X^i$ of the $s$ time series. The columns of the trajectory matrix $X^i$ for the $i$-th time series comprise lagged versions of the $i$-th time series, just like in SSA.

The $X^i$ matrices can be stacked vertically or horizontally; however, for performance reasons in the SVD step, horizontal stacking is often preferred.

Thus, each matrix $X^i$ for $i=0,1,2,...,s$ can be represented as follows:

$$X^i = \begin{bmatrix} f^i_0 & f^i_1 &f^i_2 & f^i_3 &\dots & f^i_{N-L}\\\\ f^i_1 & f^i_2 &f^i_3 & f^i_4 &\dots & f^i_{N-L+1}\\\\ f^i_2 & f^i_3 &f^i_4 & f^i_5 &\dots & f^i_{N-L+2}\\\\ \vdots & \vdots &\vdots & \vdots &\vdots & \vdots\\\\ f^i_{L-1}& f^i_{L} &f^i_{L+1} & f^i_{L+2} &\dots & f^i_{N-1} \end{bmatrix}$$

It is clear that the elements of the anti-diagonals are equal; this is a Hankel matrix.

Finally, we obtain the resulting trajectory matrix $X$ as

$$X = \begin{bmatrix} X^0 & X^1 & X^2 & ... & X^s \end{bmatrix}$$
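
For concreteness, here is a minimal NumPy sketch of this stacking step (illustrative only, not the py-ssa-lib API; `trajectory_matrix` and `mssa_trajectory_matrix` are hypothetical helper names):

```python
import numpy as np

def trajectory_matrix(f, L):
    """Hankel (trajectory) matrix of a single series f for window size L."""
    N = len(f)
    K = N - L + 1
    # column k holds the lagged window f[k : k + L]
    return np.column_stack([f[k:k + L] for k in range(K)])

def mssa_trajectory_matrix(series, L):
    """Horizontally stack the trajectory matrices of s series -> shape (L, K*s)."""
    return np.hstack([trajectory_matrix(f, L) for f in series])

# toy example: s = 2 series of length N = 10, window L = 4, so K = 7
rng = np.random.default_rng(0)
series = [rng.standard_normal(10) for _ in range(2)]
X = mssa_trajectory_matrix(series, L=4)
print(X.shape)  # (4, 14), i.e. (L, K*s)
```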

Decomposition of the Trajectory Matrix X

The second step is to decompose $X$. There exist several options; we focus on the SVD decomposition and the eigendecomposition.

  1. SVD Decomposition

This option is used primarily when the time series are not too long, because SVD is a computationally intensive algorithm. This becomes critical especially for MSSA, where the trajectory matrix becomes extremely high-dimensional because of the stacking of the trajectory matrices of each time series. The SVD is defined as follows:

$$X=U \Sigma V^T$$

where:

  • $U$ is an $L \times L$ unitary matrix containing the orthonormal set of left singular vectors of $X$ as columns
  • $\Sigma$ is an $L \times Ks$ rectangular diagonal matrix containing the singular values of $X$ in descending order
  • $V$ is a $Ks \times Ks$ unitary matrix containing the orthonormal set of right singular vectors of $X$ as columns.

The SVD of the trajectory matrix can be also formulated as follows:

$$X = \sum^{d-1}_{i=0}\sigma_iU_iV_i^T = \sum^{d-1}_{i=0}{X_i}$$

where:

  • $(\sigma_i, U_i, V_i)$ is the $i^{th}$ eigentriple of the SVD

  • $\sigma_i$ is the $i^{th}$ singular value, a scaling factor that determines the relative importance of the eigentriple

  • $U_i$ is a vector representing the $i^{th}$ column of $U$, which spans the column space of $X$

  • $V_i$ is a vector representing the $i^{th}$ column of $V$, which spans the row space of $X$. There are $Ks$ such vectors for the MSSA.

  • $d$, such that $d \leq L$, is the rank of the trajectory matrix $X$. It can be regarded as the intrinsic dimensionality of the time series' trajectory space

  • $X_i=\sigma_iU_iV_i^T$ is the $i^{th}$ elementary matrix

Note that the difference arises in the dimensionality of the elementary matrices: while for SSA we obtain an elementary matrix $X_i$ of size $L \times K$, for MSSA we get an elementary matrix $X_i$ of size $L \times Ks$!
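
As an illustration, here is a small NumPy sketch of the decomposition into elementary matrices (assuming `X` is the stacked $L \times Ks$ trajectory matrix from above; not the py-ssa-lib implementation):

```python
import numpy as np

# X is the stacked (L x K*s) MSSA trajectory matrix, e.g. from the sketch above
U, sigma, VT = np.linalg.svd(X, full_matrices=False)  # U: (L, L), sigma: (L,), VT: (L, K*s)
d = np.linalg.matrix_rank(X)                          # rank of X, d <= L

# i-th elementary matrix X_i = sigma_i * U_i * V_i^T, of shape (L, K*s)
elementary = [sigma[i] * np.outer(U[:, i], VT[i, :]) for i in range(d)]

# sanity check: the elementary matrices sum back to X
assert np.allclose(sum(elementary), X)
```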

  2. Randomized SVD

    This technique approximates the classic SVD and is primarily used for SSA with large $N$ and for MSSA, where $X$ is high-dimensional because of the stacking of the trajectory matrices of each considered time series. The main purpose of this technique is to compute only the leading components, reducing the dimensionality of $U$ and $V$.
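
One possible sketch of this step, assuming scikit-learn is available (py-ssa-lib may use a different implementation):

```python
from sklearn.utils.extmath import randomized_svd

# keep only the leading eigentriples of the large (L x K*s) matrix X
n_components = 10
U_r, sigma_r, VT_r = randomized_svd(X, n_components=n_components, random_state=0)

# truncated reconstruction from the approximate leading eigentriples
X_approx = (U_r * sigma_r) @ VT_r
print(U_r.shape, sigma_r.shape, VT_r.shape)  # (L, 10) (10,) (10, K*s)
```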

Eigentriple Grouping and Elementary Matrix Separation

In this step we perform eigentriple grouping to decompose the time series into several components. We make use of one or several of the techniques listed below:

  1. Visual Inspection of Elementary Matrices for Patterns

    Matrices without repeating patterns are referred to as the trend component, while seasonal and noise components are associated with repeating patterns. The most frequent alterations of the pattern indicate the noise component. Note that in the MSSA case this can be quite inconvenient due to the dimensionality of the elementary matrices.

  2. Eigenvalue Contribution and Shape of Eigenvectors $U$

    This combines considering the shape of the eigenvectors $U$ and the relative contribution of the eigenvalues. The most important $\sigma_i$-s are associated with the trend component. The related eigenvectors show no periodicity and a clear trend, while seasonal and noise components have a much smaller contribution and show periodicity.

  3. Weighted Correlation between Components

    This approach is frequently used for automatic eigentriple grouping. It is obvious that the most correlated components $X_i$ have more in common than decorrelated or negatively correlated ones; a sketch of the weighted-correlation computation is given after this list.
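
A compact sketch of the weighted-correlation criterion, assuming the reconstructed components are available as rows of a 2-D array (illustrative only, with the standard SSA weights; `w_correlation` is a hypothetical helper, not the py-ssa-lib API):

```python
import numpy as np

def w_correlation(components, L):
    """Weighted correlation matrix between reconstructed 1-D components
    (rows of `components`), using the standard SSA weights."""
    n_comp, N = components.shape
    K = N - L + 1
    # w[k] counts how often element k appears on the anti-diagonals of the trajectory matrix
    w = np.array([min(k + 1, L, K, N - k) for k in range(N)])
    wcov = components @ (w * components).T            # weighted inner products
    norms = np.sqrt(np.diag(wcov))
    return wcov / np.outer(norms, norms)

# components whose |w-correlation| is close to 1 are placed in the same group
```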

Time Series Reconstruction

The time series reconstruction is very similar to the one for SSA. However, one should be careful with the dimensionality of the elementary matrices $X_i$, as their second dimension is now $Ks$, i.e. a subset formed by $K$ consecutive columns of such a matrix corresponds to an elementary sub-matrix $X^j_i$ for the time series $j$, with $j=0,...,s$. We then perform diagonal averaging separately on each such sub-matrix of every elementary matrix $X_i$, like we do in SSA. Every entry of the sub-matrix $X^j_i$ is calculated according to the following formula:

$$\widetilde{x}^j_{m, n}= \begin{cases} \frac{1}{g+1}\sum^{g}_{l=0}x^j_{l,g-l}, & 0\leq g\leq L - 1\\\\ \frac{1}{L}\sum^{L-1}_{l=0}x^j_{l,g-l}, & L \leq g\leq K - 1\\\\ \frac{1}{K+L-g-1}\sum^{L-1}_{l=g-K+1}x^j_{l,g-l}, & K\leq g\leq K+L - 2 \end{cases}$$

Here, for the sake of readability, we denote $g = m+n$, and $x^j_{l,g-l}$ denotes the entries of the sub-matrix $X^j_i$.

The result is an $s \times N$ matrix containing the values of component $i$ for all $s$ time series.
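
A minimal sketch of this reconstruction step, assuming each elementary matrix is split into $s$ sub-matrices of width $K$ before diagonal averaging (hypothetical helpers, not the py-ssa-lib API):

```python
import numpy as np

def diagonal_average(X_sub):
    """Average the anti-diagonals of an (L x K) sub-matrix to recover a series of length N = L + K - 1."""
    L, K = X_sub.shape
    N = L + K - 1
    values = np.zeros(N)
    counts = np.zeros(N)
    for l in range(L):
        for m in range(K):
            values[l + m] += X_sub[l, m]
            counts[l + m] += 1
    return values / counts

def reconstruct_mssa_component(X_i, s):
    """Split an (L x K*s) elementary matrix into s sub-matrices of width K
    and diagonally average each one -> array of shape (s, N)."""
    return np.vstack([diagonal_average(sub) for sub in np.hsplit(X_i, s)])
```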

Forecasting

There are several options for forecasting future values in SSA. In py-ssa-lib, K-Forecasting and L-Forecasting are implemented.

L-Forecasting

This forecasting method relies on the signal subspace spanned by the vectors $\{U_j\}_{j=0}^{r}$ obtained via SVD for a given time series, where $r\leq d-1$.

To begin with, we denote by $Y$ the matrix whose rows consist of the last $L-1$ values of each reconstructed time series:

$$Y = \begin{bmatrix} f^0_{N-L+1} & \dots & f^0_{N-1} \\\\ f^1_{N-L+1} & \dots & f^1_{N-1} \\\\ f^2_{N-L+1} & \dots & f^2_{N-1} \\\\ \vdots & \vdots & \vdots \\\\ f^s_{N-L+1} & \dots & f^s_{N-1} \\\\ \end{bmatrix}$$

Then we need the signal subspace composed of the vectors $\underline{U_j}$, where $\underline{U_j}$ consists of the first $L-1$ coordinates of the eigenvector $U_j$, and its last ($L$-th) coordinate is denoted by $\pi_j$; together the $\pi_j$ form the vector $\vec{\pi}$. It is important that the sum of squares of the last coordinates (denoted by $\nu^2$) is strictly less than one. That is,

$$\nu^2=\sum^r_{j=0}\pi_j^2<1$$

Subsequently, the definitions above lead us to the cornerstone of L-Forecasting, the Linear Recurrence Relation (LRR) coefficients, denoted by $R_L \in \mathbb{R}^{L-1}$:

$$R_L=\frac{1}{1 -\nu^2 }\sum^r_{j=0}\pi_j\underline{U_j}$$

Finally, one can make a one-step forecast using the formula below:

$$\vec{Y}_{N+1} = Y R_L$$

On top of that, one can also predict $M$ future values iteratively. In order to forecast $M$ steps ahead, one slides a window along each reconstructed series, including the recently predicted values $N+1, \dots, N+M$ in $Y$ and leaving out the oldest values. This technique is implemented in py-ssa-lib as well.
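
The following sketch illustrates the recursive L-Forecast under the definitions above (assuming `recon` holds the reconstructed series as rows and `U` the left singular vectors; a hypothetical helper, not the py-ssa-lib API):

```python
import numpy as np

def l_forecast(recon, U, r, M):
    """Iterative M-step L-Forecast.
    recon : (s, N) reconstructed series (rows), U : left singular vectors of shape (L, .),
    r     : index of the last eigenvector kept (columns 0..r of U are used)."""
    pi = U[-1, :r + 1]                      # last coordinates of the leading eigenvectors
    U_under = U[:-1, :r + 1]                # their first L-1 coordinates
    nu2 = np.sum(pi ** 2)                   # must be strictly less than 1
    R_L = (U_under @ pi) / (1.0 - nu2)      # LRR coefficients, shape (L-1,)

    series = recon.copy()
    for _ in range(M):
        Y = series[:, -len(R_L):]           # last L-1 values of every series
        y_next = Y @ R_L                    # one-step forecast, one value per series
        series = np.hstack([series, y_next[:, None]])
    return series[:, -M:]                   # the M forecast values for each series
```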

K-Forecasting

The key difference of this forecasting method compared to L-Forecasting is that it makes use of the eigenvectors $\{V_j\}_{j=0}^{r}$. Let $\underline{V_j} \in \mathbb{R}^{s(K-1)}$ be the vector consisting of all coordinates of $V_{j}$ excluding the coordinates with indices $Kp$, where $p=0,...,s$. Then we introduce vectors $Z^i$, each comprising the last $K-1$ values of a given time series.

$$Z^i = \begin{bmatrix} f^i_{N-K+1} & \dots & f^i_{N-1} \\\\ \end{bmatrix}^T$$

The flattened vectors $Z^i$ form a vertically stacked vector $Z$ of $s$ time series

$$Z = \begin{bmatrix} Z^0 \\\\ Z^1\\\\ Z^2 \\\\ \vdots \\\\ Z^s \\\\ \end{bmatrix}$$

Also, denote the chosen signal space as a matrix

$$Q=(\underline{V_0}, \underline{V_1}, ..., \underline{V_r})$$

and collect the coordinates $\pi^p_i$ of the eigenvectors $V_i$ at the indices $Kp$, where $p=0,...,s$ and $i=0,...,r$, in

$$W = \begin{bmatrix} \pi^0_0 & \pi^0_1 &\dots &\pi^0_r \\\\ \pi^1_0 & \pi^1_1 &\dots &\pi^1_r \\\\ \vdots & \vdots & \ddots & \vdots \\\\ \pi^s_0 & \pi^s_1 &\dots &\pi^s_r \\\\ \end{bmatrix}$$

In the multidimensional case (MSSA) it is necessary to ensure prior to forecasting that $(I_{ss} - WW^T)$ is invertible and that $r\leq K-1$, where $W$ is a matrix of size $s \times r$ and $I_{ss}$ is an $s \times s$ identity matrix, with $s$ denoting the number of time series in $Z$. If these conditions are satisfied, the K-Forecasting formula holds:

$$\vec{Z}_{N+1} = (I_{ss} - WW^T)^{-1}WQ^TZ$$

The resulting vector $\vec{Z}_{N+1}$ contains $s$ values, one for each time series.

Forecasting $M$ steps ahead is very similar to L-Forecasting: one slides a window along the horizontal axis of $Z$, including the recently predicted values $N+1, \dots, N+M$ in $Z$ and leaving out the old values.
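
A sketch of the one-step K-Forecast under the definitions above; it assumes the coordinate removed from each $K$-block of $V_j$ is the last one of that block, and that `V` holds the right singular vectors as columns (i.e. `V = VT.T` from NumPy's SVD). This is a hypothetical helper, not the py-ssa-lib API:

```python
import numpy as np

def k_forecast_one_step(recon, V, r, K):
    """One-step K-Forecast.
    recon : (s, N) reconstructed series (rows), V : (K*s, .) right singular vectors as columns,
    r     : index of the last eigenvector kept (columns 0..r of V are used)."""
    s = recon.shape[0]
    Vr = V[:, :r + 1]                                # leading right singular vectors
    # assumption: the coordinate removed from each K-block is the last one of that block
    last_idx = np.array([(p + 1) * K - 1 for p in range(s)])
    mask = np.ones(K * s, dtype=bool)
    mask[last_idx] = False

    Q = Vr[mask, :]                                  # shape (s*(K-1), r+1)
    W = Vr[last_idx, :]                              # shape (s, r+1)
    Z = recon[:, -(K - 1):].reshape(-1)              # stacked last K-1 values of each series

    lhs = np.eye(s) - W @ W.T                        # must be invertible
    return np.linalg.solve(lhs, W @ (Q.T @ Z))       # one forecast value per series
```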