Effectively Modeling Time Series with Simple Discrete State Spaces
Authors: Michael Zhang, Khaled Kamal Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, our contributions lead to state-of-the-art results on extensive and diverse benchmarks, with best or second-best AUROC on 6 / 7 ECG and speech time series classification, and best MSE on 14 / 16 Informer forecasting tasks. Furthermore, we find SPACETIME (1) fits AR(p) processes that prior deep SSMs fail on, (2) forecasts notably more accurately on longer horizons than prior state-of-the-art, and (3) speeds up training on real-world ETTh1 data by 73% and 80% relative wall-clock time over Transformers and LSTMs. |
| Researcher Affiliation | Academia | Michael Zhang, Khaled Saab, Michael Poli, Tri Dao, Karan Goel & Christopher Ré, Stanford University. mzhang@cs.stanford.edu, ksaab@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Efficient Output Filter F_y Computation |
| Open Source Code | Yes | We include code to reproduce our main results in Table 1 in the supplementary material. |
| Open Datasets | Yes | For forecasting, we evaluate SPACETIME on 40 forecasting tasks from the popular Informer (Zhou et al., 2021) and Monash (Godahewa et al., 2021) benchmarks, testing on horizons 8 to 960 time-steps long. For classification, we evaluate SPACETIME on seven medical ECG or speech audio classification tasks, which test on sequences up to 16,000 time-steps long. ...We use the publicly available PTB-XL dataset (Wagner et al., 2020a;b; Goldberger et al., 2000)... |
| Dataset Splits | Yes | We train SPACETIME on all datasets for 50 epochs using AdamW optimizer (Loshchilov and Hutter, 2017), cosine scheduling, and early stopping based on best validation standardized MSE. ...We pick the model based on best validation RMSE performance. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA Tesla P100 GPU. ... All experiments were run on a single NVIDIA GeForce RTX 3090 GPU. ... For both ECG and Speech Commands, all experiments were run on a single NVIDIA Tesla A100 Ampere 40 GB GPU. |
| Software Dependencies | No | The paper mentions the use of the "AdamW optimizer" and "Adam optimizer" but does not specify version numbers for these or for any other software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | We train SPACETIME on all datasets for 50 epochs using AdamW optimizer (Loshchilov and Hutter, 2017), cosine scheduling, and early stopping based on best validation standardized MSE. We performed a grid search over number of SSMs {64, 128} and weight decay {0, 0.0001}. ... We train with learning rate 0.01, weight decay 0.0001, batch size 32, and dropout 0.25. ... We optimize SPACETIME on all datasets using Adam optimizer for 40 epochs with a linear learning rate warmup phase of 20 epochs and cosine decay. We initialize learning rate at 0.001, reach 0.004 after warmup, and decay to 0.0001. We do not use weight decay or dropout. We perform a grid search over number of layers {3, 4, 5, 6}, number of SSMs per layer {8, 16, 32, 64, 128} and number of channels (width of the model) {1, 4, 8, 16}. |
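
To make the quoted forecasting setup concrete, the following is a minimal training-loop sketch assuming a PyTorch implementation: AdamW with a cosine schedule and checkpoint selection by best validation MSE, per the Experiment Setup row. The `model`, `train_loader`, and `val_loader` names are hypothetical placeholders; the authors' released code defines the actual SpaceTime model and data pipeline.

```python
# Minimal sketch of the forecasting training loop described above, assuming a
# PyTorch implementation. `model`, `train_loader`, and `val_loader` are
# placeholders; the authors' released code defines the actual SpaceTime model
# and data pipeline.
import torch

def train_forecasting(model, train_loader, val_loader,
                      epochs=50, lr=0.01, weight_decay=1e-4):
    # AdamW optimizer with a cosine learning-rate schedule, as quoted above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = torch.nn.MSELoss()

    best_val_mse, best_state = float("inf"), None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()

        # "Early stopping" here means keeping the checkpoint with the best
        # validation (standardized) MSE.
        model.eval()
        with torch.no_grad():
            val_mse = sum(criterion(model(x), y).item()
                          for x, y in val_loader) / len(val_loader)
        if val_mse < best_val_mse:
            best_val_mse = val_mse
            best_state = {k: v.clone() for k, v in model.state_dict().items()}

    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_val_mse
```

The quoted forecasting grid search (number of SSMs in {64, 128} crossed with weight decay in {0, 0.0001}) would simply wrap repeated calls to this routine and retain the configuration with the lowest validation MSE.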
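
The classification schedule quoted above (linear warmup from 0.001 to 0.004 over 20 of 40 epochs, then cosine decay to 0.0001) can be written as a small per-epoch learning-rate function. This is a sketch of the described schedule, not the authors' exact implementation.

```python
# Hedged sketch of the classification learning-rate schedule quoted above:
# linear warmup from 0.001 to 0.004 over the first 20 epochs, then cosine
# decay to 0.0001 over the remaining epochs of a 40-epoch run. The authors'
# exact schedule implementation may differ.
import math

def classification_lr(epoch, total_epochs=40, warmup_epochs=20,
                      lr_init=1e-3, lr_peak=4e-3, lr_final=1e-4):
    if epoch < warmup_epochs:
        # Linear warmup: 0.001 -> 0.004.
        return lr_init + (lr_peak - lr_init) * epoch / warmup_epochs
    # Cosine decay: 0.004 -> 0.0001.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_final + 0.5 * (lr_peak - lr_final) * (1 + math.cos(math.pi * progress))

# Example learning rates at a few epochs.
for e in (0, 10, 20, 30, 39):
    print(e, round(classification_lr(e), 5))
```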
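
Finally, the classification hyperparameter grid quoted above (layers {3, 4, 5, 6}, SSMs per layer {8, 16, 32, 64, 128}, channels {1, 4, 8, 16}) spans 80 configurations; a minimal enumeration with `itertools.product` is shown below. How each configuration is trained and scored is left to the authors' pipeline.

```python
# Minimal enumeration of the classification hyperparameter grid quoted above
# (number of layers, SSMs per layer, and channels), using itertools.product.
from itertools import product

grid = {
    "n_layers": [3, 4, 5, 6],
    "n_ssm_per_layer": [8, 16, 32, 64, 128],
    "n_channels": [1, 4, 8, 16],
}

configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs), "configurations")  # 4 * 5 * 4 = 80
```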