Simplified State Space Layers for Sequence Modeling
Authors: Jimmy T.H. Smith, Andrew Warrington, Scott Linderman
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now compare empirically the performance of the S5 layer to the S4 layer and other baseline methods. |
| Researcher Affiliation | Academia | 1Institute for Computational and Mathematical Engineering, Stanford University. 2Wu Tsai Neurosciences Institute, Stanford University. 3Department of Statistics, Stanford University. |
| Pseudocode | Yes | Listing 1: JAX implementation to apply a single S5 layer to a batch of input sequences. |
| Open Source Code | Yes | The full S5 implementation is available at: https://github.com/lindermanlab/S5. |
| Open Datasets | Yes | The long range arena (LRA) benchmark (Tay et al., 2021) is a suite of six sequence modeling tasks... |
| Dataset Splits | Yes | There are 96,000 training sequences, 2,000 validation sequences, and 2,000 test sequences. |
| Hardware Specification | Yes | All comparisons were made using a 16GB NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using JAX but does not provide specific version numbers for JAX or other software libraries. |
| Experiment Setup | Yes | Table 11 presents the main hyperparameters used for each experiment. Depth: number of layers. H: number of input/output features. P: latent size. J: number of blocks used for the initialization of A (see Section B.1.1). Dropout: dropout rate. LR: global learning rate. SSM LR: SSM learning rate. B: batch size. Epochs: max epochs set for the run. WD: weight decay. |
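
The pseudocode row refers to the paper's Listing 1, a JAX routine that applies a single S5 layer to a batch of input sequences. The sketch below is in that spirit rather than a copy of the listing: it assumes an already-discretized diagonal state matrix `Lambda_bar`, input matrix `B_bar`, and output matrix `C_tilde`, and the function names and shapes are illustrative.

```python
import jax
import jax.numpy as jnp

def binary_operator(elem_i, elem_j):
    # Associative operator for the linear recurrence x_k = A_k * x_{k-1} + b_k,
    # where A_k is diagonal and stored as a vector.
    A_i, b_i = elem_i
    A_j, b_j = elem_j
    return A_j * A_i, A_j * b_i + b_j

def apply_s5_ssm(Lambda_bar, B_bar, C_tilde, u):
    """Run one (already discretized) diagonal SSM over a single sequence u of shape (L, H)."""
    L = u.shape[0]
    # Broadcast the diagonal state matrix across all timesteps.
    Lambda_elements = jnp.tile(Lambda_bar[None, :], (L, 1))
    # Project each input u_k into the latent space: b_k = B_bar @ u_k.
    Bu_elements = jax.vmap(lambda u_k: B_bar @ u_k)(u)
    # A parallel (associative) scan computes all latent states x_1..x_L in O(log L) depth.
    _, xs = jax.lax.associative_scan(binary_operator, (Lambda_elements, Bu_elements))
    # Project the latent states back to the output feature dimension.
    return jax.vmap(lambda x_k: (C_tilde @ x_k).real)(xs)

# Hypothetical usage: batch of 4 sequences, length 1024, H = 8 features, latent size P = 16.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
Lambda_bar = jnp.exp(-0.1 + 1j * jax.random.normal(k1, (16,)))   # stable diagonal dynamics
B_bar = jax.random.normal(k2, (16, 8)) + 0j
C_tilde = jax.random.normal(k3, (8, 16)) + 0j
u_batch = jax.random.normal(k4, (4, 1024, 8))
y_batch = jax.vmap(apply_s5_ssm, in_axes=(None, None, None, 0))(Lambda_bar, B_bar, C_tilde, u_batch)
```

Because the recurrence is expressed through `jax.lax.associative_scan`, the layer is evaluated in parallel over the sequence length rather than step by step, which is the computational point of the single-SSM S5 formulation.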
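For the experiment-setup row, the hyperparameter names described for Table 11 can be gathered into a small configuration object. The sketch below only mirrors those field names; the values are placeholders for illustration, not the tuned settings reported in the paper.

```python
from dataclasses import dataclass

@dataclass
class S5RunConfig:
    # Fields mirror the Table 11 columns; values are placeholders, not the paper's settings.
    depth: int = 6              # Depth: number of layers
    H: int = 128                # H: number of input/output features
    P: int = 64                 # P: latent size
    J: int = 8                  # J: number of blocks used to initialize A
    dropout: float = 0.1        # dropout rate
    lr: float = 1e-3            # LR: global learning rate
    ssm_lr: float = 1e-3        # SSM LR: learning rate for SSM parameters
    batch_size: int = 32        # B: batch size
    epochs: int = 100           # max epochs set for the run
    weight_decay: float = 0.05  # WD: weight decay
```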