Robustifying State-space Models for Long Sequences via Approximate Diagonalization

Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present empirical evaluations of our proposed S4-PTD and S5-PTD models. In Section 5.1, we compare the performance of our full model with existing models on the Long Range Arena (LRA). In Section 5.2, we perform a sensitivity analysis using the CIFAR-10 dataset to provide real-world evidence that our perturbed initialization scheme is more robust than the one in the S4D/S5 model. Finally, in Section 5.3, we study the relationship between the size of the perturbation matrix E and the performance of our models. (A sketch of this perturb-then-diagonalize initialization follows the table.)
Researcher Affiliation | Academia | Annan Yu (1), Arnur Nigmetov (2), Dmitriy Morozov (2), Michael W. Mahoney (2, 3, 4), N. Benjamin Erichson (2, 3). Affiliations: (1) Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA; (2) Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; (3) International Computer Science Institute, Berkeley, CA 94704, USA; (4) Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA
Pseudocode | No | The paper describes methods and procedures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | Over the past few years, the new class of state-space models (SSMs) gained vast popularity for sequential modeling due to their outstanding performance on the Long-Range Arena (LRA) dataset (Tay et al., 2021).
Dataset Splits | No | The paper mentions using standard datasets like LRA and CIFAR-10, which have predefined splits, but it does not explicitly state the train/validation/test split percentages or sample counts used in their specific experiments.
Hardware Specification | No | The paper mentions using the 'Lawrencium computational cluster' and 'National Energy Research Scientific Computing Center (NERSC)' but does not provide specific hardware details such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We provide the detailed configuration of our S4-PTD model in Table 4 and that of our S5-PTD model in Table 5. In particular, we note that the first two columns of Table 4 are almost the same as those in Gu et al. (2022a), and the first four columns of Table 5 match those in Smith et al. (2023); these are model parameters. The only remaining non-trivial detail is that in the Path-X task, we start with a batch size of 32. We halve the batch size after epoch 30 and after epoch 60. By making the batch size smaller, we improve the generalization power of our model.
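
The batch-size schedule quoted in the Experiment Setup row (Path-X starts at 32, halved after epoch 30 and again after epoch 60) is easy to make concrete. The following is a minimal sketch; the function names, the 0-indexed epoch counter, and the choice to rebuild a PyTorch DataLoader are our assumptions, not details given in the paper.

```python
from torch.utils.data import DataLoader

def path_x_batch_size(epoch, initial=32):
    # Schedule described in the paper: start at 32, halve after
    # epoch 30 and again after epoch 60 (0-based indexing assumed).
    if epoch > 60:
        return initial // 4   # 8
    if epoch > 30:
        return initial // 2   # 16
    return initial            # 32

def train(model, dataset, num_epochs=100):
    # Hypothetical loop: rebuild the loader only when the schedule changes.
    current_bs, loader = None, None
    for epoch in range(num_epochs):
        bs = path_x_batch_size(epoch)
        if bs != current_bs:
            loader = DataLoader(dataset, batch_size=bs, shuffle=True)
            current_bs = bs
        for batch in loader:
            pass  # forward/backward/optimizer step as usual
```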
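
The perturbation matrix E referenced in the Research Type row is the heart of the paper's perturb-then-diagonalize (PTD) initialization: the HiPPO-LegS matrix used to initialize S4D/S5 is highly non-normal, so its eigenvector matrix is extremely ill-conditioned, and adding a small perturbation before diagonalizing trades a bounded initialization error for much better conditioning. The sketch below uses a random Gaussian E scaled to a target spectral norm purely for illustration; the paper constructs E more carefully, so this shows only the shape of the idea.

```python
import numpy as np

def hippo_legs(n):
    # HiPPO-LegS state matrix: A[i, j] = -sqrt((2i+1)(2j+1)) for i > j,
    # -(i+1) on the diagonal, and 0 above the diagonal.
    i = np.arange(n)
    A = -np.sqrt(np.outer(2 * i + 1, 2 * i + 1))
    return np.tril(A, k=-1) - np.diag(i + 1.0)

def perturb_then_diagonalize(A, eps, seed=0):
    # Add a perturbation E with spectral norm eps, then eigendecompose A + E.
    # A random Gaussian E is our assumption; the paper chooses E differently.
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)
    E *= eps / np.linalg.norm(E, 2)
    eigvals, V = np.linalg.eig(A + E)
    return eigvals, V

A = hippo_legs(64)
_, V_exact = np.linalg.eig(A)                  # eigenvectors of the raw matrix
_, V_ptd = perturb_then_diagonalize(A, 1e-1)   # eigenvectors after perturbation
print(np.linalg.cond(V_exact), np.linalg.cond(V_ptd))  # conditioning improves
```

The trade-off is governed by eps, which is what the sensitivity analysis described in Section 5.3 probes when it relates the size of E to model performance.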