ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling

Authors: Yuqi Chen, Kan Ren, Yansen Wang, Yuchen Fang, Weiwei Sun, Dongsheng Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A wide range of experiments on both synthetic and real-world datasets have illustrated the superior modeling capacity and prediction performance of ContiFormer on irregular time series data. In this section, we evaluate ContiFormer on three types of tasks on irregular time series data, i.e., interpolation and extrapolation, classification, and event prediction and forecasting.
Researcher Affiliation | Collaboration | Yuqi Chen (1,2), Kan Ren (2), Yansen Wang (2), Yuchen Fang (2,3), Weiwei Sun (1), Dongsheng Li (2). (1) School of Computer Science & Shanghai Key Laboratory of Data Science, Fudan University; (2) Microsoft Research Asia; (3) Shanghai Jiao Tong University.
Pseudocode | Yes | The generation process is shown in Algorithm 1.
Open Source Code | Yes | The project link is https://seqml.github.io/contiformer/.
Open Datasets | Yes | We select 20 datasets from the UEA Time Series Classification Archive [3] with diverse characteristics... We use one synthetic dataset and five real-world datasets, namely Synthetic, Neonate [50], Traffic [32], MIMIC [16], Book Order [16] and Stack Overflow [33], to evaluate our model.
Dataset Splits | Yes | We generate 300 spirals; 200/100 spirals are used for training/testing, respectively. We use the 4-fold cross-validation scheme for the Synthetic, Neonate, and Traffic datasets following [17], and the 5-fold cross-validation scheme for the other three datasets following [39, 64].
Hardware Specification | Yes | All the experiments were carried out on a single 16GB NVIDIA Tesla V100 GPU.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch) are provided; only the algorithm name, the Runge-Kutta-4 [44] (RK4) algorithm, is mentioned.
Experiment Setup | Yes | By default, we use the natural cubic spline to construct the continuous-time query function. The vector field in the ODE is defined as f(t, x) = Actfn(LN(Linear_{d,d}(Linear_{d,d}(x) + Linear_{1,d}(t)))), where Actfn(·) is either the tanh or sigmoid activation function, Linear_{a,b}(·): R^a → R^b is a linear transformation from dimension a to dimension b, and LN denotes layer normalization. We adopt the Gauss-Legendre quadrature approximation to implement Eq. (9). In the experiment, we choose the fourth-order Runge-Kutta [44] (RK4) algorithm to solve the ODE with a fixed step size of 0.1. For both our model and the baseline models, we adopted a fixed learning rate of 10^-2 and a batch size of 64. The training process for all models lasted for 1000 epochs.
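The vector field and solver described in the experiment setup can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the weight matrices (W1, Wt, W2), the hidden dimension d, and the random initialization are all hypothetical placeholders; only the functional form f(t, x) = tanh(LN(Linear(Linear(x) + Linear(t)))) and the fixed-step RK4 scheme with step size 0.1 follow the quoted setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # hypothetical hidden dimension (the paper's value is not quoted here)

# Hypothetical weights for Linear_{d,d}, Linear_{1,d}, and the outer Linear_{d,d}
W1, b1 = rng.standard_normal((d, d)) * 0.1, np.zeros(d)
Wt, bt = rng.standard_normal((d, 1)) * 0.1, np.zeros(d)
W2, b2 = rng.standard_normal((d, d)) * 0.1, np.zeros(d)

def layer_norm(h, eps=1e-5):
    # LN over the feature dimension of a single state vector
    return (h - h.mean()) / np.sqrt(h.var() + eps)

def f(t, x):
    # f(t, x) = Actfn(LN(Linear_{d,d}(Linear_{d,d}(x) + Linear_{1,d}(t)))),
    # with Actfn = tanh (one of the two options named in the setup)
    h = (W1 @ x + b1) + (Wt @ np.array([t]) + bt)
    return np.tanh(layer_norm(W2 @ h + b2))

def rk4_solve(f, x0, t0, t1, step=0.1):
    # Classic fourth-order Runge-Kutta with a fixed step size (0.1 per the setup)
    x, t = x0, t0
    while t < t1 - 1e-9:
        h = min(step, t1 - t)  # shrink the last step to land exactly on t1
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

x1 = rk4_solve(f, rng.standard_normal(d), 0.0, 1.0)
```

With step size 0.1 and the interval [0, 1], the solver takes ten RK4 steps; because tanh bounds the vector field, each step moves the state by a bounded amount, which keeps the fixed-step integration stable.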