Time Series Extension of "Patches Are All You Need"
Implementations of some prototypical time series mixers based on Conv, MLP, and ViT architectures, modified for the probabilistic multivariate forecasting use case. The emission head is currently an "independent same-family" distribution, e.g., a diagonal Student-T.
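As a rough illustration of what an "independent same-family" emission head can look like, here is a minimal PyTorch sketch of a diagonal Student-T head. The class name `DiagonalStudentTHead`, the `hidden_dim` and `pred_length` parameters, and the choice to treat both the variate and prediction-length dimensions as event dimensions are assumptions made for this example, not the repository's actual code.

```python
import torch
from torch import nn
from torch.distributions import Independent, StudentT


class DiagonalStudentTHead(nn.Module):
    """Hypothetical head: maps hidden features to per-variate Student-T parameters."""

    def __init__(self, hidden_dim, pred_length):
        super().__init__()
        # One linear map produces df, loc, and scale for every prediction step.
        self.proj = nn.Linear(hidden_dim, 3 * pred_length)

    def forward(self, h):
        # h: [B, V, hidden_dim] -> three tensors of shape [B, V, pred_length]
        df, loc, scale = self.proj(h).chunk(3, dim=-1)
        df = 2.0 + nn.functional.softplus(df)  # keep df > 2 so the variance is finite
        scale = nn.functional.softplus(scale) + 1e-6  # keep scale strictly positive
        # Every entry is an independent Student-T; the last two dims form one event.
        return Independent(StudentT(df, loc, scale), 2)


head = DiagonalStudentTHead(hidden_dim=16, pred_length=12)
h = torch.randn(4, 8, 16)        # [B, V, hidden_dim]
target = torch.randn(4, 8, 12)   # [B, V, Pred-length]
dist = head(h)
nll = -dist.log_prob(target).mean()  # scalar training loss
```

Because the distribution factorizes over variates, the log-likelihood is a plain sum of univariate terms, which keeps the loss cheap even for many variates.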
In everything that follows, the inputs are typically 4-D tensors of shape [Batch, Variate-dim, Context-length, 1+Features]. During training, the subsequent prediction-window values are also given, as a tensor of shape [B, Variate-dim, Pred-length]. The inputs are embedded via a 2D convolution to obtain patch embeddings.
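The patch-embedding step can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: the `PatchEmbedding` class, its parameter names, and the decision to treat the 1+Features axis as convolution channels while patching over the (variate, time) plane are illustrative choices and may differ from the actual implementation.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Hypothetical patch embedding: a strided 2D conv over the (variate, time) plane."""

    def __init__(self, num_channels, dim, patch_size):
        super().__init__()
        # kernel_size == stride gives non-overlapping patches.
        self.proj = nn.Conv2d(num_channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: [B, V, L, C] -> [B, C, V, L] so the feature axis becomes conv channels
        x = x.permute(0, 3, 1, 2)
        # -> [B, dim, V // p_v, L // p_t]
        return self.proj(x)


batch, variates, context_length, channels = 2, 8, 32, 5  # channels = 1 + Features
x = torch.randn(batch, variates, context_length, channels)
embed = PatchEmbedding(num_channels=channels, dim=16, patch_size=(2, 4))
patches = embed(x)
print(patches.shape)  # torch.Size([2, 16, 4, 8])
```

With a patch size of (2, 4), the 8 variates and 32 context steps are reduced to a 4 x 8 grid of patches, each embedded into 16 dimensions.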




