Convolutional State Space Models for Long-Range Spatiotemporal Modeling

Authors: Jimmy Smith, Shalini De Mello, Jan Kautz, Scott Linderman, Wonmin Byeon

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5.1, we present a long-horizon Moving-MNIST experiment to compare ConvRNNs, Transformers, and ConvS5 directly. In Section 5.2, we evaluate ConvS5 on the challenging 3D environment benchmarks proposed in Yan et al. [13]. Finally, in Section 5.3, we discuss ablations that highlight the importance of ConvS5's parameterization."
Researcher Affiliation | Collaboration | Jimmy T.H. Smith (2,4; work performed during an internship at NVIDIA), Shalini De Mello (1), Jan Kautz (1), Scott W. Linderman (3,4), Wonmin Byeon (1). Affiliations: 1: NVIDIA; 2: Institute for Computational and Mathematical Engineering, Stanford University; 3: Department of Statistics, Stanford University; 4: Wu Tsai Neurosciences Institute, Stanford University. Contact: {jsmith14,scott.linderman}@stanford.edu, {shalinig,jkautz,wbyeon}@nvidia.com
Pseudocode | Yes | "Listing 1: JAX implementation of core code to apply a single ConvS5 layer to a batch of spatiotemporal input sequences." (A hedged sketch of such a layer appears after this table.)
Open Source Code | Yes | "Implementation available at: https://github.com/NVlabs/ConvSSM"
Open Datasets | Yes | "We develop a long-horizon Moving-MNIST [54] prediction task... The DMLab long-range benchmark designed by Yan et al. [13] using the DeepMind Lab (DMLab) [99] simulator... We use the Minecraft [100] long-range benchmark... We use the Habitat long-range benchmark designed by Yan et al. [13] using the Habitat simulator [101]."
Dataset Splits | No | The paper describes training procedures and evaluation settings, and it states that models were trained with several learning rates with the best run selected (implying a validation process), but it does not explicitly specify train/validation/test splits with percentages, counts, or a citation to a standard split.
Hardware Specification | Yes | "All models were trained with 32GB NVIDIA V100 GPUs. For Moving-MNIST, models were trained with 8 V100s. For all other experiments, models were trained with 16 V100s."
Software Dependencies | No | The paper mentions the use of JAX in its implementation (Listing 1), but it does not specify version numbers for JAX or other key software components such as Python or CUDA.
Experiment Setup | Yes | "For ConvS5 and ConvLSTM, we fixed the hidden dimensions (layer input/output features) and state sizes to be 256, and we swept over the following learning rates [1e-4, 5e-4, 1e-3] and chose the best model. ... See Tables 12-14 for detailed experiment configurations." (A hedged sketch of such a sweep follows the layer example below.)
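
For orientation, below is a minimal sketch of the kind of layer Listing 1 implements: a convolutional state space model whose state transition is diagonal (pointwise), so the recurrence over time can be evaluated with jax.lax.associative_scan. All names (conv_s5_layer, A, B, C), shapes, kernel sizes, and initialization values here are illustrative assumptions, not the authors' implementation; see the linked repository for the real code.

```python
import jax
import jax.numpy as jnp

# Assumed shapes (illustrative, not from the paper):
#   u_seq: (T, H, W, U)  -- time, height, width, input channels
#   A:     (P,)          -- diagonal (pointwise) state transition
#   B:     (kB, kB, U, P) input convolution kernel
#   C:     (kC, kC, P, Y) output convolution kernel

def conv(frame, kernel):
    # 'SAME'-padded 2D convolution over the spatial dims of one frame.
    return jax.lax.conv_general_dilated(
        frame[None], kernel, window_strides=(1, 1), padding="SAME",
        dimension_numbers=("NHWC", "HWIO", "NHWC"))[0]

def conv_s5_layer(params, u_seq):
    A, B, C = params["A"], params["B"], params["C"]
    Bu = jax.vmap(lambda u: conv(u, B))(u_seq)        # (T, H, W, P)

    # Binary associative operator: element (a, b) represents the affine
    # map x -> a * x + b, with pointwise (diagonal) a.
    def binop(e_i, e_j):
        a_i, b_i = e_i
        a_j, b_j = e_j
        return a_j * a_i, a_j * b_i + b_j

    T = u_seq.shape[0]
    A_elems = jnp.broadcast_to(A, (T,) + Bu.shape[1:])
    _, xs = jax.lax.associative_scan(binop, (A_elems, Bu))  # states over time
    return jax.vmap(lambda x: conv(x, C))(xs)               # (T, H, W, Y)

# Usage on random data (all sizes are arbitrary choices for the sketch):
key = jax.random.PRNGKey(0)
u_seq = jax.random.normal(key, (10, 16, 16, 3))      # T=10 frames, 16x16, 3 ch
params = {
    "A": jnp.full((8,), 0.9),                        # stable diagonal dynamics
    "B": jax.random.normal(key, (3, 3, 3, 8)) * 0.1, # 3x3 conv, U=3 -> P=8
    "C": jax.random.normal(key, (3, 3, 8, 4)) * 0.1, # 3x3 conv, P=8 -> Y=4
}
y_seq = conv_s5_layer(params, u_seq)                 # (10, 16, 16, 4)
```

The diagonal state transition is what makes the binary operator associative, so the over-time recurrence can be computed as a parallel scan rather than a sequential loop.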
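
Similarly, a hedged sketch of the reported hyperparameter protocol: hidden dimension and state size fixed at 256, a sweep over three learning rates, and the best run kept. Here train_and_eval is a hypothetical stand-in for the actual training routine, and the AdamW optimizer is an assumption not stated in this excerpt.

```python
import optax  # gradient-processing library commonly paired with JAX

# Grid and fixed sizes as reported in the paper's experiment setup.
LEARNING_RATES = [1e-4, 5e-4, 1e-3]
HIDDEN_DIM = STATE_SIZE = 256

def train_and_eval(optimizer, hidden_dim, state_size):
    # Hypothetical stand-in: the real routine would train a model and
    # return validation metrics; this placeholder only keeps the sketch
    # self-contained and runnable.
    return {"val_loss": 0.0}

best = None
for lr in LEARNING_RATES:
    opt = optax.adamw(learning_rate=lr)  # optimizer choice is an assumption
    metrics = train_and_eval(opt, HIDDEN_DIM, STATE_SIZE)
    if best is None or metrics["val_loss"] < best["val_loss"]:
        best = {"lr": lr, **metrics}
```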