Motivations
Prior Transformer-based models have tried various self-attention mechanisms to capture long-range dependencies. However, point-wise attention limits the model's ability to capture correlations within a time series (analogously, Vision Transformer encodes patches rather than individual pixels). Intuitively, sub-series at the same phase position across periods often present similar temporal processes. Thus, attention or other correlation measures among sub-series may be more reliable. Autoformer introduces an auto-correlation mechanism in place of self-attention to discover dependencies among sub-series.
To mitigate the distribution shift caused by the trend part of a series, Autoformer also disentangles the original time series into a trend part and a more stationary seasonal part as follows.
\[x(t)=\text{Trend}(t)+\text{Season}(t)+\text{Noise}(t)\]
Autoformer
The overview of Autoformer's architecture is as follows, which is similar to Informer's. Notice that Autoformer removes the positional embedding in the original DataEmbedding and replaces ProbSparseAttention with AutoCorrelation. The decomposition module SeriesDecomp extracts the trend part from the series through a moving average. The encoder focuses on modeling the seasonal part, which is then used as cross information to help the decoder refine its predictions. The final prediction of Autoformer is the sum of the seasonal part and the trend part.
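Below is a minimal sketch of the decomposition block described above, assuming a PyTorch implementation. The module name SeriesDecomp follows the text, but the kernel size and the boundary-padding strategy are illustrative assumptions rather than the authors' exact code.

```python
import torch
import torch.nn as nn


class SeriesDecomp(nn.Module):
    """Split a series into seasonal and trend parts via a moving average.

    A minimal sketch: kernel_size and the padding scheme are assumptions,
    not necessarily the original Autoformer implementation.
    """

    def __init__(self, kernel_size: int = 25):
        super().__init__()
        self.kernel_size = kernel_size
        # AvgPool1d over the time dimension acts as the moving average.
        self.avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1)

    def forward(self, x: torch.Tensor):
        # x: [batch, length, channels]
        # Repeat boundary values so the moving average keeps the input length.
        front = x[:, :1, :].repeat(1, (self.kernel_size - 1) // 2, 1)
        end = x[:, -1:, :].repeat(1, self.kernel_size // 2, 1)
        padded = torch.cat([front, x, end], dim=1)
        trend = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
        seasonal = x - trend  # residual after removing the trend
        return seasonal, trend


# Usage: decompose a toy batch of series.
x = torch.randn(8, 96, 7)            # [batch, length, channels]
seasonal, trend = SeriesDecomp(25)(x)
print(seasonal.shape, trend.shape)   # torch.Size([8, 96, 7]) twice
```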
To capture periodic dependencies among similar subsequences, Autoformer utilizes the autocorrelation coefficient $R(\tau)$ to describe the time-delay similarity between the original time series $x_t$ and its $\tau$-lagged series $x_{t-\tau}$.
\[R(\tau)=\lim_{L\rightarrow\infty}\frac{1}{L}\sum_{t=1}^{L}x_t x_{t-\tau}=\mathcal{F}^{-1}\left(\mathcal{F}(x_t)\overline{\mathcal{F}(x_t)}\right)\]
In essence, $R(\tau)$ reveals the most likely periods $\tau$ of the input series $x_t$. Autoformer introduces the time-delay aggregation module to fuse the top-$k$ most probable periodic information. The code below shows how to implement periodic aggregation using torch.roll and softmax.
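A minimal sketch of the two steps above, assuming single-head tensors of shape [batch, length, channels]: $R(\tau)$ is estimated via the FFT (Wiener-Khinchin theorem), and the series rolled by the top-$k$ delays is aggregated with softmax-normalized weights. The function names and the top-$k$ rule ($k = c\cdot\log L$) are assumptions for illustration, not the authors' exact code.

```python
import math
import torch


def autocorrelation(values: torch.Tensor) -> torch.Tensor:
    """Estimate R(tau) for every lag tau via FFT (Wiener-Khinchin theorem)."""
    # values: [batch, length, channels]
    freq = torch.fft.rfft(values, dim=1)
    power = freq * torch.conj(freq)            # F(x) * conj(F(x))
    corr = torch.fft.irfft(power, n=values.size(1), dim=1)
    return corr                                # index along dim 1 is the lag tau


def time_delay_agg(values: torch.Tensor, corr: torch.Tensor, factor: int = 1) -> torch.Tensor:
    """Fuse the series rolled by the top-k most correlated delays."""
    batch, length, channels = values.shape
    top_k = int(factor * math.log(length))                # assumed k = c * log(L)
    # Average correlation over batch and channels to pick shared delays.
    mean_corr = corr.mean(dim=(0, 2))                      # [length]
    weights, delays = torch.topk(mean_corr, top_k)         # top-k lags and their scores
    weights = torch.softmax(weights, dim=0)                # normalize into aggregation weights
    out = torch.zeros_like(values)
    for w, tau in zip(weights, delays):
        # Roll so that the sub-series lagged by tau aligns with the current position.
        out = out + w * torch.roll(values, shifts=-int(tau), dims=1)
    return out


# Usage: fuse periodic information from a toy batch of series.
x = torch.randn(8, 96, 7)
agg = time_delay_agg(x, autocorrelation(x))
print(agg.shape)   # torch.Size([8, 96, 7])
```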
Some prediction results are as follows. Autoformer successfully separates the trend and seasonal parts of the historical series. However, both autocorrelation and point-wise self-attention are mainly conducive to capturing seasonality. We have to stress that they follow the same paradigm: similar-pattern matching, which is sensitive to the trend part of the series. The distribution shift of the trend part deserves further consideration in future work.