SCINet

PAPER: Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction

Motivations

For time series forecasting tasks, we have applied many kinds of deep neural networks used for sequence modeling such as RNNs, Transformer and TCN, which are illustrated as below. Different from other types of sequence data, the information density of time series is lower than words or patches. Thus, we have to learn intrinsic features of time series like trend and seasonality to capture more useful information and enhance the explainability of the model. However, the sampling rate is also an important factor in signal processing. The downsampling of time series data often preserves most of the information. This paper proposes a sample convolution and interaction network, named SCINet, to utilize different downsamplings of time series data and iteratively extract hierarchical temporal information.

SCINet

SCINet is a hierarchical framework that enhances the predictability of the original time series by capturing temporal dependencies at multiple temporal resolutions, which is shown as follows.

The SCI-Block is the basic module of the SCINet, which splits the input time series $x$ into two sub-series $x_{\text{odd}}$ and $x_{\text{even}}$ and extracts both homogeneous and heterogeneous information through interactive learning with a set of distinct convolutional filters. Compared to the dilated convolutions used in the TCN architecture, SCI-Block achieves a larger receptive field and aggregates more essential information extracted from four distinct filters.

\[x_{\text{odd}}^s=x_{\text{odd}}\odot\exp(\phi(x_\text{even}))\qquad x_{\text{even}}^s=x_{\text{even}}\odot\exp(\psi(x_\text{odd}))\] \[\acute{x}_{\text{odd}}=x_{\text{odd}}^s+\rho(x_{\text{even}}^s)\qquad \acute{x}_{\text{even}}=x_{\text{even}}^s+\eta(x_{\text{odd}}^s)\]

class SCIBlock(nn.Module):
    def __init__(self, in_planes, kernel_size=3, dilation=1, dropout=0.5, hidden_size=64):
        super(SCIBlock, self).__init__()        
        pad_l = dilation * (kernel_size - 1) // 2 + 1 if kernel_size % 2 != 0 else dilation * (kernel_size - 2) // 2 + 1
        pad_r = dilation * (kernel_size - 1) // 2 + 1 if kernel_size % 2 != 0 else dilation * (kernel_size) // 2 + 1

        self.phi = nn.Sequential(
            nn.ReplicationPad1d((pad_l, pad_r)),
            nn.Conv1d(in_planes, hidden_size, kernel_size=kernel_size, dilation=dilation),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.Dropout(dropout),
            nn.Conv1d(hidden_size, in_planes, kernel_size=3),
            nn.Tanh()
        )
        self.psi = nn.Sequential(
            nn.ReplicationPad1d((pad_l, pad_r)),
            nn.Conv1d(in_planes, hidden_size, kernel_size=kernel_size, dilation=dilation),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.Dropout(dropout),
            nn.Conv1d(hidden_size, in_planes, kernel_size=3),
            nn.Tanh()
        )
        self.rho = nn.Sequential(
            nn.ReplicationPad1d((pad_l, pad_r)),
            nn.Conv1d(in_planes, hidden_size, kernel_size=kernel_size, dilation=dilation),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.Dropout(dropout),
            nn.Conv1d(hidden_size, in_planes, kernel_size=3),
            nn.Tanh()
        )
        self.eta = nn.Sequential(
            nn.ReplicationPad1d((pad_l, pad_r)),
            nn.Conv1d(in_planes, hidden_size, kernel_size=kernel_size, dilation=dilation),
            nn.LeakyReLU(negative_slope=0.01, inplace=True),
            nn.Dropout(dropout),
            nn.Conv1d(hidden_size, in_planes, kernel_size=3),
            nn.Tanh()
        )

    def forward(self, x):
        x_even = x[:, ::2, :].transpose(1, 2)
        x_odd = x[:, 1::2, :].transpose(1, 2)

        x_odd_s = x_odd.mul(torch.exp(self.phi(x_even)))
        x_even_s = x_even.mul(torch.exp(self.psi(x_odd)))

        x_even_update = x_even_s + self.eta(x_odd_s)
        x_odd_update = x_odd_s + self.rho(x_even_s)

        return x_even_update.transpose(1, 2), x_odd_update.transpose(1, 2)

SCINet is constructed by arranging multiple SCI-Blocks hierarchically and presents as a binary tree structure. Stacking more layers of SCINets could achieve better prediction accuracy. Experiments show that SCINet outperforms on several datasets compared to other CNN-based methods.

class SCITree(nn.Module):
    def __init__(self, level, in_planes, kernel_size=3, dilation=1, dropout=0.5, hidden_size=64):
        super(SCITree, self).__init__()
        self.level = level
        self.block = SCIBlock(
            in_planes=in_planes,
            kernel_size=kernel_size,
            dropout=dropout,
            dilation=dilation,
            hidden_size=hidden_size,
        )
        if level != 0:
            self.SCINet_odd = SCITree(level - 1, in_planes, kernel_size, dilation, dropout, hidden_size)
            self.SCINet_even = SCITree(level - 1, in_planes, kernel_size, dilation, dropout, hidden_size)
    
    def zip_up_the_pants(self, even, odd):
        assert even.shape[1] == odd.shape[1]

        even = even.transpose(0, 1)
        odd = odd.transpose(0, 1)
        merge = []

        for i in range(even.shape[0]):
            merge.append(even[i].unsqueeze(0))
            merge.append(odd[i].unsqueeze(0))

        return torch.cat(merge, 0).transpose(0, 1) # [B, L, D]
        
    def forward(self, x):
        # [B, L, D]
        x_even_update, x_odd_update = self.block(x)

        if self.level == 0:
            return self.zip_up_the_pants(x_even_update, x_odd_update)
        else:
            return self.zip_up_the_pants(self.SCINet_even(x_even_update), self.SCINet_odd(x_odd_update))


class SCINet(nn.Module):
    def __init__(self, input_len, output_len, level, in_planes, kernel_size=3, dilation=1, dropout=0.5, hidden_size=64):
        super(SCINet, self).__init__()
        self.encoder = SCITree(level, in_planes, kernel_size, dilation, dropout, hidden_size)
        self.projection = nn.Conv1d(input_len, output_len, kernel_size=1, stride=1, bias=False)
    
    def forward(self, x):
        res = x
        x = self.encoder(x)
        x += res
        x = self.projection(x)

        return x

Similar as dilated convolutions used in the TCN and other multi-scale or omni-scale 1D-CNN, the size of receptive field plays a vital role in time series modeling. SCINet enlarges its receptive field through multi-resolution downsampling and aggregates these temporal features, which is also a “non-local” strategy. We have to continue research other global modeling structure like Transformer, MLP-Mixer and etc.