Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers

Authors: Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

NeurIPS 2021

Reproducibility assessment: each variable below lists the result and the supporting LLM response.

Research Type: Experimental
LLM Response: "Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use handcrafted features on 100x shorter sequences." (A minimal sketch of the recurrence a single LSSL layer computes appears after this table.)

Researcher Affiliation: Academia
LLM Response: "Department of Computer Science, Stanford University; Department of Electrical Engineering, Stanford University; Department of Computer Science and Engineering, University at Buffalo, SUNY. {albertgu,knrg,ksaab,trid}@stanford.edu, chrismre@cs.stanford.edu, {isysjohn,atri}@buffalo.edu"

Pseudocode: No
LLM Response: The paper does not contain any clearly labeled pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper does not include an explicit statement about releasing source code or provide a link to a code repository for the described methodology.

Open Datasets: Yes
LLM Response: "We test on the sequential MNIST, permuted MNIST, and sequential CIFAR tasks (Table 1), popular benchmarks which were originally designed to test the ability of recurrent models to capture long-term dependencies of length up to 1k [2]. We additionally use the BIDMC healthcare datasets (Table 2), a suite of widely studied time series regression problems of length 4000 on estimating vital signs. Table 4 reports results for the Speech Commands (SC) dataset [31] for classification of 1-second audio clips. We create a challenging new sequential CelebA task, where we classify 178 x 218 images as 38,000-length sequences for 4 facial attributes: Attractive (Att.), Mouth Slightly Open (MSO), Smiling (Smil.), Wearing Lipstick (WL) [36]." (A sketch of how such pixel-sequence benchmarks are typically constructed appears after this table.)

Dataset Splits: No
LLM Response: While the paper names the benchmark datasets it uses, it does not explicitly provide the train/validation/test splits (e.g., percentages or sample counts) needed for reproducibility.

Hardware Specification: No
LLM Response: The paper mentions 'multi-GPU training' and 'Google Cloud credits' but does not specify any particular GPU models, CPU types, or other detailed hardware specifications used for the experiments.

Software Dependencies: No
LLM Response: The paper does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9) needed for reproducibility.

Experiment Setup: No
LLM Response: The paper states that 'Full architecture details are described in Appendix B, including the initialization of A and Δt, computational details, and other architectural details' and that 'we did light tuning primarily on learning rate and dropout', but it does not provide specific hyperparameter values or detailed training configurations in the main text.
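
Since the paper contains no pseudocode, the following is a minimal illustrative sketch of what a single Linear State Space Layer computes: the continuous state space model x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t), discretized with the bilinear method and unrolled as a linear recurrence. Everything here is an assumption for illustration, not the authors' code; the actual LSSL uses a HiPPO-based initialization of A, learned timescales Δt, and an equivalent convolutional view for efficiency, none of which is reproduced below.

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) discretization of the continuous pair (A, B)."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    A_bar = inv @ (I + (dt / 2.0) * A)   # discrete state matrix
    B_bar = inv @ (dt * B)               # discrete input matrix
    return A_bar, B_bar

def lssl_forward(u, A, B, C, D, dt):
    """Unroll x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k + D u_k
    over a 1-D input sequence u (the recurrent view of the layer)."""
    A_bar, B_bar = discretize(A, B, dt)
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar.ravel() * u_k
        ys.append(C @ x + D * u_k)
    return np.array(ys)

# Toy usage with random (not HiPPO-initialized) parameters.
rng = np.random.default_rng(0)
N = 4                                               # state size
A = 0.1 * rng.standard_normal((N, N)) - np.eye(N)   # roughly stable A
B = rng.standard_normal((N, 1))
C = rng.standard_normal(N)
y = lssl_forward(rng.standard_normal(16), A, B, C, D=0.0, dt=0.1)
print(y.shape)  # (16,)
```

A deep model then stacks such layers (one SSM per channel) with nonlinearities and normalization in between, matching the quoted "stacking LSSL layers into a simple deep neural network".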
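
For the quoted pixel-level benchmarks, here is a hedged sketch of how sequential MNIST is commonly constructed: each 28 x 28 image is flattened into a length-784 pixel sequence, and permuted MNIST applies one fixed random permutation to those positions. The torchvision calls are standard, but the exact transform pipeline is our assumption rather than the paper's released preprocessing.

```python
import torch
from torchvision import datasets, transforms

# Flatten each 28x28 grayscale image into a length-784 pixel sequence.
to_sequence = transforms.Compose([
    transforms.ToTensor(),                       # (1, 28, 28), values in [0, 1]
    transforms.Lambda(lambda x: x.view(-1, 1)),  # (784, 1) sequence of pixels
])

train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=to_sequence)
seq, label = train_set[0]
print(seq.shape)  # torch.Size([784, 1])

# Permuted MNIST: apply one fixed random permutation to the 784 steps.
perm = torch.randperm(784)
permuted_seq = seq[perm]
```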