Simplified State Space Layers for Sequence Modeling

Authors: Jimmy T.H. Smith, Andrew Warrington, Scott Linderman

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We now compare empirically the performance of the S5 layer to the S4 layer and other baseline methods. |
| Researcher Affiliation | Academia | (1) Institute for Computational and Mathematical Engineering, Stanford University; (2) Wu Tsai Neurosciences Institute, Stanford University; (3) Department of Statistics, Stanford University. |
| Pseudocode | Yes | Listing 1: JAX implementation to apply a single S5 layer to a batch of input sequences. (A hedged sketch of this mechanism follows the table.) |
| Open Source Code | Yes | The full S5 implementation is available at: https://github.com/lindermanlab/S5. |
| Open Datasets | Yes | The long range arena (LRA) benchmark (Tay et al., 2021) is a suite of six sequence modeling tasks... |
| Dataset Splits | Yes | There are 96,000 training sequences, 2,000 validation sequences, and 2,000 test sequences. |
| Hardware Specification | Yes | All comparisons were made using a 16GB NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using JAX but does not provide specific version numbers for JAX or other software libraries. |
| Experiment Setup | Yes | Table 11 presents the main hyperparameters used for each experiment. Depth: number of layers. H: number of input/output features. P: Latent size. J: number of blocks used for the initialization of A (see Section B.1.1). Dropout: dropout rate. LR: global learning rate. SSM LR: the SSM learning rate. B: batch size. Epochs: max epochs set for the run. WD: weight decay. (An illustrative configuration record follows the table.) |
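The Pseudocode row points to Listing 1 of the paper, a JAX implementation that applies a single S5 layer to a batch of input sequences. As a rough, simplified sketch of the underlying mechanism (not a reproduction of Listing 1), the snippet below runs a discretized diagonal SSM recurrence with jax.lax.associative_scan and batches it with jax.vmap. The names Lambda_bar, B_bar, C, D, apply_ssm, and binary_operator are illustrative assumptions, and details such as the paper's conjugate-symmetric parameterization and the discretization step are omitted.

```python
# Minimal sketch of a diagonal SSM scan in JAX; assumes Lambda_bar (P,),
# B_bar (P, H), C (H, P), D (H,) are already discretized/initialized.
import jax
import jax.numpy as jnp

def binary_operator(elem_i, elem_j):
    # Associative operator for the linear recurrence x_k = a_k * x_{k-1} + b_k.
    a_i, b_i = elem_i
    a_j, b_j = elem_j
    return a_j * a_i, a_j * b_i + b_j

def apply_ssm(Lambda_bar, B_bar, C, D, u):
    # Apply the scan to a single sequence u of shape (L, H).
    Bu = jax.vmap(lambda u_k: B_bar @ u_k)(u).astype(Lambda_bar.dtype)  # (L, P)
    Lambda_elems = jnp.broadcast_to(Lambda_bar, Bu.shape)               # (L, P)
    # Parallel prefix scan over the sequence dimension.
    _, xs = jax.lax.associative_scan(binary_operator, (Lambda_elems, Bu))
    # Project states to outputs; take the real part since Lambda_bar may be complex.
    ys = jax.vmap(lambda x_k, u_k: (C @ x_k).real + D * u_k)(xs, u)
    return ys

# Batched over the leading axis of u, as in "a batch of input sequences".
batched_ssm = jax.vmap(apply_ssm, in_axes=(None, None, None, None, 0))
```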
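The Experiment Setup row lists the hyperparameter fields reported in Table 11. Purely as an illustration, one way to group those fields in code is a small configuration record; the class name S5Config and every value below are placeholders, not settings taken from Table 11.

```python
from dataclasses import dataclass

@dataclass
class S5Config:
    depth: int           # number of layers
    H: int               # number of input/output features
    P: int               # latent (state) size
    J: int               # number of blocks used for the initialization of A
    dropout: float       # dropout rate
    lr: float            # global learning rate
    ssm_lr: float        # separate learning rate for the SSM parameters
    batch_size: int      # B
    epochs: int          # max epochs set for the run
    weight_decay: float  # WD

# Placeholder values for illustration only; see Table 11 for the actual settings.
config = S5Config(depth=6, H=128, P=64, J=4, dropout=0.1,
                  lr=1e-3, ssm_lr=1e-3, batch_size=64, epochs=100,
                  weight_decay=0.05)
```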