Convolutional State Space Models for Long-Range Spatiotemporal Modeling
Authors: Jimmy Smith, Shalini De Mello, Jan Kautz, Scott Linderman, Wonmin Byeon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5.1, we present a long-horizon Moving-MNIST experiment to compare ConvRNNs, Transformers, and ConvS5 directly. In Section 5.2, we evaluate ConvS5 on the challenging 3D environment benchmarks proposed in Yan et al. [13]. Finally, in Section 5.3, we discuss ablations that highlight the importance of ConvS5's parameterization. |
| Researcher Affiliation | Collaboration | Jimmy T.H. Smith*²,⁴, Shalini De Mello¹, Jan Kautz¹, Scott W. Linderman³,⁴, Wonmin Byeon¹. ¹NVIDIA; ²Institute for Computational and Mathematical Engineering, Stanford University; ³Department of Statistics, Stanford University; ⁴Wu Tsai Neurosciences Institute, Stanford University. *Work performed during internship at NVIDIA. {jsmith14,scott.linderman}@stanford.edu, {shalinig,jkautz,wbyeon}@nvidia.com |
| Pseudocode | Yes | Listing 1: JAX implementation of core code to apply a single ConvS5 layer to a batch of spatiotemporal input sequences. (A hedged sketch of such a layer appears after this table.) |
| Open Source Code | Yes | Implementation available at: https://github.com/NVlabs/ConvSSM |
| Open Datasets | Yes | We develop a long-horizon Moving-MNIST [54] prediction task... The DMLab long-range benchmark designed by Yan et al. [13] using the DeepMind Lab (DMLab) [99] simulator... We use the Minecraft [100] long-range benchmark... We use the Habitat long-range benchmark designed by Yan et al. [13] using the Habitat simulator [101]. |
| Dataset Splits | No | The paper describes training procedures and evaluation settings, and it states that models were trained with several learning rates and the best run chosen (implying some validation process), but it does not explicitly detail training/validation/test splits with specific percentages, counts, or a citation to a standard split. |
| Hardware Specification | Yes | All models were trained with 32GB NVIDIA V100 GPUs. For Moving-MNIST, models were trained with 8 V100s. For all other experiments, models were trained with 16 V100s. |
| Software Dependencies | No | The paper mentions the use of JAX in its implementation (Listing 1), but it does not specify version numbers for JAX or for other key software components such as Python or CUDA. |
| Experiment Setup | Yes | For ConvS5 and ConvLSTM, we fixed the hidden dimensions (layer input/output features) and state sizes to be 256, and we swept over the learning rates [1×10⁻⁴, 5×10⁻⁴, 1×10⁻³] and chose the best model. ... See Tables 12-14 for detailed experiment configurations. (A sketch of this sweep follows the layer sketch below.) |
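The paper's Listing 1 provides a JAX implementation of a single ConvS5 layer; the authors' actual code is in the linked repository. As a rough illustration of the idea only, here is a minimal, hypothetical sketch (not the authors' Listing 1): a diagonal state transition `A` applied pointwise, convolutional input/output maps `B` and `C`, unrolled over time with `jax.lax.scan`. The real ConvS5 layer uses a complex diagonal parameterization and a parallel associative scan for efficiency; all names and shapes below are assumptions.

```python
# Hypothetical ConvS5-style layer sketch in JAX (not the paper's Listing 1).
import jax
import jax.numpy as jnp

def conv_s5_layer(params, u_seq):
    """Apply one ConvS5-style layer to a single input sequence.

    params: dict with
      'A': (P,) real diagonal state transition (assumed already discretized),
      'B': (P, U, k, k) input convolution kernel,
      'C': (H, P, k, k) output convolution kernel.
    u_seq: (T, U, height, width) input sequence.
    Returns: (T, H, height, width) output sequence.
    """
    A, B, C = params['A'], params['B'], params['C']

    def conv(x, kernel):
        # 'SAME'-padded 2D convolution on a single frame (batch size 1).
        return jax.lax.conv_general_dilated(
            x[None], kernel, window_strides=(1, 1), padding='SAME')[0]

    def step(x_prev, u_t):
        # Diagonal (pointwise) state transition plus convolutional input map.
        x_t = A[:, None, None] * x_prev + conv(u_t, B)
        y_t = conv(x_t, C)
        return x_t, y_t

    x0 = jnp.zeros((B.shape[0],) + u_seq.shape[2:])
    _, y_seq = jax.lax.scan(step, x0, u_seq)
    return y_seq

# Hypothetical usage: T=10 frames, 3 input channels, 64x64 resolution,
# P=16 state channels, H=8 output channels.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
params = {
    'A': jax.random.uniform(k1, (16,), minval=0.9, maxval=0.999),
    'B': 0.1 * jax.random.normal(k2, (16, 3, 3, 3)),
    'C': 0.1 * jax.random.normal(k3, (8, 16, 3, 3)),
}
u = jax.random.normal(k4, (10, 3, 64, 64))
y = conv_s5_layer(params, u)  # shape (10, 8, 64, 64)
```

A sequential scan is used here purely for readability; the batching over sequences mentioned in the listing's caption could be recovered with `jax.vmap(conv_s5_layer, in_axes=(None, 0))`.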
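To make the reported sweep concrete, a hypothetical sketch of the Experiment Setup row: fixed 256-dimensional hidden and state sizes, three learning rates, and the best run kept. `train_and_evaluate` and `val_loss` are placeholders, not names from the paper or its repository.

```python
# Hypothetical learning-rate sweep mirroring the reported setup.
hidden_dim = state_dim = 256
learning_rates = [1e-4, 5e-4, 1e-3]

# train_and_evaluate is a placeholder returning a run with a val_loss field.
runs = [train_and_evaluate(hidden_dim, state_dim, lr) for lr in learning_rates]
best_run = min(runs, key=lambda r: r.val_loss)  # keep the best model
```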