Robustifying State-space Models for Long Sequences via Approximate Diagonalization
Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present empirical evaluations of our proposed S4-PTD and S5-PTD models. In Section 5.1, we compare the performance of our full model with existing models on the Long Range Arena (LRA). In Section 5.2, we perform a sensitivity analysis on the CIFAR-10 dataset to provide real-world evidence that our perturbed initialization scheme is more robust than that of the S4D/S5 models. Finally, in Section 5.3, we study the relationship between the size of the perturbation matrix E and the performance of our models. |
| Researcher Affiliation | Academia | Annan Yu (1), Arnur Nigmetov (2), Dmitriy Morozov (2), Michael W. Mahoney (2,3,4), N. Benjamin Erichson (2,3). (1) Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA; (2) Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; (3) International Computer Science Institute, Berkeley, CA 94704, USA; (4) Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA |
| Pseudocode | No | The paper describes methods and procedures but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Over the past few years, the new class of state-space models (SSMs) has gained vast popularity for sequential modeling due to their outstanding performance on the Long-Range Arena (LRA) dataset (Tay et al., 2021). |
| Dataset Splits | No | The paper mentions using standard datasets like LRA and CIFAR-10, which have predefined splits, but it does not explicitly state the train/validation/test split percentages or sample counts used in their specific experiments. |
| Hardware Specification | No | The paper mentions using the 'Lawrencium computational cluster' and 'National Energy Research Scientific Computing Center (NERSC)' but does not provide specific hardware details such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We provide the detailed configuration of our S4-PTD model in Table 4 and that of our S5-PTD model in Table 5. In particular, we note that the first two columns of Table 4 are almost the same as those in Gu et al. (2022a) and the first four columns of Table 5 match those in Smith et al. (2023); these are model parameters. The only remaining non-trivial detail is that in the Path-X task, we start with a batch size of 32. We halve the batch size after epoch 30 and again after epoch 60. By making the batch size smaller, we improve the generalization power of our model. |
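The Path-X batch-size schedule quoted in the last row (start at 32, halve after epochs 30 and 60) can be sketched as below. This is a minimal illustrative helper, not the authors' code; the function name and the epoch-indexing convention (treating "after epoch 30" as epoch 31 onward) are assumptions.

```python
def batch_size_for_epoch(epoch: int, initial: int = 32) -> int:
    """Return the batch size for a given 1-indexed training epoch.

    Follows the schedule described in the paper's experiment setup:
    the batch size starts at `initial` and is halved after epoch 30
    and again after epoch 60. Indexing convention is an assumption.
    """
    size = initial
    if epoch > 30:  # halved after epoch 30
        size //= 2
    if epoch > 60:  # halved again after epoch 60
        size //= 2
    return size
```

In a training loop, the data loader would be rebuilt (or re-batched) whenever `batch_size_for_epoch(epoch)` differs from the previous epoch's value.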