Efficiently Modeling Long Sequences with Structured State Spaces

Authors: Albert Gu, Karan Goel, Christopher Ré

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet; (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation 60× faster; (iii) SoTA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
Researcher Affiliation | Academia | Albert Gu, Karan Goel & Christopher Ré, Department of Computer Science, Stanford University. {albertgu,krng}@stanford.edu, chrismre@cs.stanford.edu
Pseudocode | Yes | Algorithm 1 S4 CONVOLUTION KERNEL (SKETCH)
Open Source Code | Yes | Code is publicly available at https://github.com/HazyResearch/state-spaces.
Open Datasets | Yes | S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10... (iii) SoTA on every task from the Long Range Arena benchmark... On CIFAR-10 density estimation... On WikiText-103 language modeling... Speech Commands dataset (Warden, 2018).
Dataset Splits | Yes | LRA (Tay et al., 2021) contains 6 tasks with lengths 1K-16K steps... Table 9: The values of the best hyperparameters found for classification datasets; LRA (Top) and images/speech (Bottom)... Evaluation was performed similarly to the basic setting in Baevski & Auli (2018), Table 5, which involves sliding non-overlapping windows of width 1024 tokens.
Hardware Specification | Yes | Benchmarking results from Table 1 and Table 2 were tested on a single A100 GPU. Some tasks used an A100 GPU (notably, the Path-X experiments), which has a larger max memory of 40GB.
Software Dependencies | No | Our current implementation of S4 actually uses the naive O(NL) algorithm... we leverage the pykeops library for memory-efficient kernel operations. This code was only available in TensorFlow... which were implemented in PyTorch.
Experiment Setup | Yes | Table 9: The values of the best hyperparameters found for classification datasets; LRA (Top) and images/speech (Bottom). LR is learning rate and WD is weight decay. BN and LN refer to Batch Normalization and Layer Normalization.
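
The sketches below illustrate several entries from the table. For the Pseudocode entry: the paper's Algorithm 1 computes the S4 kernel via a Cauchy-kernel reduction; as a point of reference, here is the naive SSM kernel computation that Algorithm 1 is designed to avoid, in plain PyTorch (function names are ours; A, B, C are assumed already discretized):

```python
import torch

def naive_ssm_kernel(A, B, C, L):
    # Materialize K = (CB, CAB, CA^2B, ..., CA^{L-1}B) by stepping the
    # state map forward one index at a time: O(N^2 L) time, reference only.
    # Shapes assumed: A (N, N), B (N, 1), C (1, N), all already discretized.
    K, x = [], B
    for _ in range(L):
        K.append((C @ x).item())  # scalar kernel tap C A^l B
        x = A @ x                 # advance the implicit state
    return torch.tensor(K)        # (L,)

def causal_conv(u, K):
    # y = K * u as a causal convolution in O(L log L) via FFT,
    # zero-padded to length 2L to avoid circular wrap-around.
    L = u.shape[-1]
    prod = torch.fft.rfft(K, n=2 * L) * torch.fft.rfft(u, n=2 * L)
    return torch.fft.irfft(prod, n=2 * L)[..., :L]
```

In S4 itself, A carries the HiPPO (NPLR) structure and the kernel is evaluated in frequency space, so this O(N²L) loop never runs at full scale.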
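For the Open Datasets entry: of the benchmarks listed, sequential CIFAR-10 is the only one with a nonstandard presentation, since each image is consumed as a pixel sequence. A minimal torchvision sketch of that preprocessing; the normalization constants are the usual CIFAR-10 statistics, an assumption on our part, as the excerpt does not specify them:

```python
from torchvision import datasets, transforms

# Flatten each 32x32 RGB image into a length-1024 sequence of
# 3-dimensional "pixels"; no augmentation, matching the quoted
# "no data augmentation or auxiliary losses" setting.
to_sequence = transforms.Compose([
    transforms.ToTensor(),                          # (3, 32, 32)
    transforms.Normalize((0.4914, 0.4822, 0.4465),  # assumed standard
                         (0.2470, 0.2435, 0.2616)), # CIFAR-10 stats
    transforms.Lambda(lambda x: x.reshape(3, -1).T) # (1024, 3)
])

train_set = datasets.CIFAR10("data/", train=True, download=True,
                             transform=to_sequence)
```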
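For the Dataset Splits entry: the WikiText-103 evaluation protocol quoted above (sliding non-overlapping windows of width 1024 tokens) is simple to restate in code. `score_window` below is a hypothetical stand-in for whatever negative log-likelihood the evaluation harness computes:

```python
import torch

def nonoverlapping_windows(tokens, width=1024):
    # Split a 1-D token stream into consecutive, non-overlapping windows,
    # as in the Baevski & Auli (2018) setting quoted above. The trailing
    # remainder is dropped here; the excerpt does not say how it is handled.
    n = tokens.numel() // width
    return tokens[: n * width].view(n, width)

# usage (score_window is hypothetical):
# nll = torch.stack([score_window(w) for w in nonoverlapping_windows(test_tokens)])
# perplexity = nll.mean().exp()
```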
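For the Hardware Specification entry: since the Path-X runs specifically needed the 40GB A100, a one-line sanity check with the standard PyTorch API before attempting to reproduce them:

```python
import torch

# Path-X reportedly needed the 40GB A100; check what this machine offers.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.0f} GiB")
```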
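For the Software Dependencies entry: pykeops is used to evaluate a Cauchy-style reduction without materializing intermediates. A sketch of the naive O(NL)-memory computation that the fused pykeops path replaces (names and shapes are our assumptions):

```python
import torch

def cauchy_naive(w, lam, z):
    # sum_n w_n / (z_j - lam_n) for every node z_j, materializing the
    # full (L, N) matrix of pairwise differences -- the memory cost the
    # repository's pykeops LazyTensor path avoids by fusing the reduction.
    # Shapes assumed: w, lam (N,) complex; z (L,) complex.
    return (w[None, :] / (z[:, None] - lam[None, :])).sum(dim=-1)  # (L,)
```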
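For the Experiment Setup entry: the Table 9 columns (LR, WD, BN/LN) map onto standard PyTorch components. The values and helper names below are placeholders of ours, not the paper's tuned settings:

```python
import torch
import torch.nn as nn

def make_optimizer(model, lr=1e-2, wd=0.01):
    # LR / WD taken from a Table 9 row; the defaults here are placeholders.
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)

def make_norm(kind, d_model):
    # Table 9's BN / LN choice; BatchNorm1d assumes (batch, d_model, length)
    # layout while LayerNorm assumes features-last -- an assumed convention.
    return nn.BatchNorm1d(d_model) if kind == "BN" else nn.LayerNorm(d_model)
```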