Provable Benefits of Complex Parameterizations for Structured State Space Models
Authors: Yuval Ran-Milo, Eden Lumbroso, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is corroborated by controlled experiments, demonstrating that complex parameterizations for SSMs significantly improve performance. We also evaluate SSMs with selectivity, a new architectural feature yielding state-of-the-art performance [20, 31, 4, 57]. Our experiments with selectivity portray a more nuanced picture: complex parameterizations are beneficial for some tasks, whereas for others, selectivity allows real parameterizations to achieve comparable (and in some cases better) performance. These findings align with the mixed evidence reported in the literature. |
| Researcher Affiliation | Collaboration | Yuval Ran-Milo (Tel Aviv University), Eden Lumbroso (Tel Aviv University), Edo Cohen-Karlik (Tel Aviv University), Raja Giryes (Tel Aviv University), Amir Globerson (Tel Aviv University, Google), Nadav Cohen (Tel Aviv University). |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It describes mathematical derivations and experimental procedures in text. |
| Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/edenlum/SSMComplexParamBenefits. |
| Open Datasets | Yes | To empirically demonstrate the benefits of complex parameterizations for SSMs in settings beyond our theory, we evaluated the prominent S4 neural network architecture [21] on the real-world sequential CIFAR-10 dataset from the widely recognized Long Range Arena benchmark [52]. |
| Dataset Splits | No | The paper discusses training and evaluation but does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets). It describes how data for the synthetic tasks is generated, and for CIFAR-10 it refers to the standard dataset without explicit split details. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions basing implementations on official S4 and Mamba repositories and references PyTorch documentation (2023) in the bibliography, but it does not specify concrete version numbers for software dependencies such as PyTorch, CUDA, or other libraries used for the experiments. |
| Experiment Setup | Yes | For real SSMs, we performed a grid search for each optimizer, varying learning rates and initialization schemes. Namely, we evaluated learning rates of 1·10⁻⁴, 1·10⁻⁵ and 1·10⁻⁶, and randomly initialized the diagonal elements of A_R uniformly in [-1, 1] or in [-1, -0.99] ∪ [0.99, 1]. For complex SSMs, we used a learning rate of 1·10⁻⁵ and initialized the diagonal elements of A_C similarly to [41], by sampling uniformly from the complex ring with radii 0.99 to 1. For all SSMs, we employed a cosine learning rate scheduler [35] and trained for half a million steps. |
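
The Experiment Setup row describes two initialization schemes for the diagonal transition matrices (real A_R and complex A_C) together with a cosine learning-rate schedule. Below is a minimal PyTorch sketch of how such schemes could look; it is not taken from the authors' repository, and the function names, the choice of Adam as optimizer, and the uniform-over-area reading of "sampling uniformly from the complex ring" are assumptions.

```python
# Minimal sketch (not the authors' code) of the initialization schemes quoted above.
# Names such as state_dim, init_real_diagonal, and init_complex_diagonal are illustrative.
import math
import torch

def init_real_diagonal(state_dim: int, near_unit: bool = False) -> torch.Tensor:
    """Diagonal of a real transition matrix A_R.

    near_unit=False: uniform in [-1, 1].
    near_unit=True : uniform in [-1, -0.99] U [0.99, 1].
    """
    if not near_unit:
        return torch.empty(state_dim).uniform_(-1.0, 1.0)
    magnitude = torch.empty(state_dim).uniform_(0.99, 1.0)
    sign = torch.where(torch.rand(state_dim) < 0.5, torch.tensor(-1.0), torch.tensor(1.0))
    return sign * magnitude

def init_complex_diagonal(state_dim: int, r_min: float = 0.99, r_max: float = 1.0) -> torch.Tensor:
    """Diagonal of a complex transition matrix A_C, sampled from the ring
    {z : r_min <= |z| <= r_max} (one common reading of the scheme in [41])."""
    # Sampling |z|^2 uniformly yields a uniform distribution over the ring's area.
    u = torch.rand(state_dim)
    radius = torch.sqrt(u * (r_max ** 2 - r_min ** 2) + r_min ** 2)
    phase = 2 * math.pi * torch.rand(state_dim)
    return radius * torch.exp(1j * phase)

# Example training-side configuration: learning rate 1e-5 with a cosine schedule
# over half a million steps (the optimizer choice here is an assumption).
a_diag = torch.nn.Parameter(init_complex_diagonal(64))
optimizer = torch.optim.Adam([a_diag], lr=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500_000)
```

Sampling the squared modulus uniformly (rather than the radius itself) is one convention for "uniform on the ring"; the authors' repository should be consulted for the exact scheme used in the experiments.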