Provable Benefits of Complex Parameterizations for Structured State Space Models

Authors: Yuval Ran-Milo, Eden Lumbroso, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory is corroborated by controlled experiments, demonstrating that complex parameterizations for SSMs significantly improve performance. We also evaluate SSMs with selectivity, a new architectural feature yielding state-of-the-art performance [20, 31, 4, 57]. Our experiments with selectivity portray a more nuanced picture: complex parameterizations are beneficial for some tasks, whereas for others, selectivity allows real parameterizations to achieve comparable (and in some cases better) performance. These findings align with the mixed evidence reported in the literature.
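To make the contrast under study concrete, the sketch below shows a diagonal linear SSM whose diagonal transition matrix can hold either real or complex entries. This is a minimal illustration, not the authors' implementation; all names and dimensions are illustrative.

```python
import torch

def diagonal_ssm(u, a_diag, b, c):
    """Minimal diagonal SSM: x_t = A x_{t-1} + B u_t, y_t = Re(C x_t).

    u: (seq_len,) real input; a_diag, b, c: (state_dim,) parameters that may be
    real- or complex-valued, depending on the parameterization being compared.
    """
    x = torch.zeros_like(a_diag)
    ys = []
    for u_t in u:
        x = a_diag * x + b * u_t          # elementwise update since A is diagonal
        ys.append((c * x).sum().real)     # read out and keep the real part
    return torch.stack(ys)

u = torch.randn(100)
# Real parameterization: diagonal entries are real scalars in [-1, 1].
y_real = diagonal_ssm(u, torch.rand(16) * 2 - 1, torch.randn(16), torch.randn(16))
# Complex parameterization: diagonal entries near the unit circle.
a_cplx = 0.99 * torch.exp(1j * torch.rand(16) * 2 * torch.pi)
y_cplx = diagonal_ssm(u, a_cplx, torch.randn(16), torch.randn(16))
```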
Researcher Affiliation | Collaboration | Yuval Ran-Milo (Tel Aviv University), Eden Lumbroso (Tel Aviv University), Edo Cohen-Karlik (Tel Aviv University), Raja Giryes (Tel Aviv University), Amir Globerson (Tel Aviv University; Google), Nadav Cohen (Tel Aviv University).
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It describes mathematical derivations and experimental procedures in text.
Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/edenlum/SSMComplexParamBenefits.
Open Datasets | Yes | To empirically demonstrate the benefits of complex parameterizations for SSMs in settings beyond our theory, we evaluated the prominent S4 neural network architecture [21] on the real-world sequential CIFAR-10 dataset from the widely recognized Long Range Arena benchmark [52].
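For illustration only, the snippet below sketches one common way to turn CIFAR-10 images into pixel sequences for this kind of evaluation, using standard torchvision utilities. Whether RGB channels are kept per step or collapsed to grayscale differs between sequential-CIFAR variants; grayscale is assumed here, and this is not the authors' data pipeline.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Flatten each 32x32 CIFAR-10 image into a length-1024 pixel sequence,
# in the spirit of sequential CIFAR-10 / the LRA "Image" task.
to_sequence = transforms.Compose([
    transforms.Grayscale(),                       # assumption: single channel
    transforms.ToTensor(),                        # (1, 32, 32) in [0, 1]
    transforms.Lambda(lambda x: x.view(-1, 1)),   # (1024, 1) sequence
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=to_sequence)
loader = DataLoader(train_set, batch_size=64, shuffle=True)
```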
Dataset Splits | No | The paper discusses training and evaluation but does not specify explicit splits (e.g., percentages or counts for training, validation, and test sets). It describes how data for the synthetic tasks is generated, and for CIFAR-10 it refers to the standard dataset without explicit split details.
Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A6000 GPU.
Software Dependencies | No | The paper mentions basing implementations on the official S4 and Mamba repositories and references PyTorch documentation (2023) in the bibliography, but it does not specify concrete version numbers for software dependencies such as PyTorch, CUDA, or other libraries used in the experiments.
Experiment Setup | Yes | For real SSMs, we performed a grid search for each optimizer, varying learning rates and initialization schemes. Namely, we evaluated learning rates of 1×10⁻⁴, 1×10⁻⁵, and 1×10⁻⁶, and randomly initialized the diagonal elements of A_R uniformly in [-1, 1] or in [-1, -0.99] ∪ [0.99, 1]. For complex SSMs, we used a learning rate of 1×10⁻⁵ and initialized the diagonal elements of A_C similarly to [41], by sampling uniformly from the complex ring with radii 0.99 to 1. For all SSMs, we employed a cosine learning rate scheduler [35] and trained for half a million steps.
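The sketch below illustrates the initialization schemes and scheduler described in this setup, assuming a PyTorch-based implementation. The function names and state dimension are illustrative; this is not the authors' training code.

```python
import torch

def init_real_diagonal(state_dim, scheme="uniform"):
    """Initialize diag(A_R) under one of the two schemes from the grid search."""
    if scheme == "uniform":                       # diag(A_R) ~ U([-1, 1])
        return torch.empty(state_dim).uniform_(-1.0, 1.0)
    # diag(A_R) ~ U([-1, -0.99] ∪ [0.99, 1]): sample a magnitude near 1, then a sign.
    mag = torch.empty(state_dim).uniform_(0.99, 1.0)
    sign = torch.randint(0, 2, (state_dim,)) * 2 - 1
    return mag * sign

def init_complex_diagonal(state_dim):
    """Initialize diag(A_C) uniformly on the complex ring with radii in [0.99, 1]."""
    radius = torch.empty(state_dim).uniform_(0.99, 1.0)
    angle = torch.empty(state_dim).uniform_(0.0, 2.0 * torch.pi)
    return radius * torch.exp(1j * angle)

# Cosine learning-rate schedule over half a million training steps
# (shown here with a dummy parameter in place of a full SSM model).
params = [torch.nn.Parameter(init_real_diagonal(64))]
optimizer = torch.optim.Adam(params, lr=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500_000)
```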