Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Uncovering the Spectral Bias in Diagonal State Space Models

Authors: Ruben Solozabal, Velibor Bojkovic, Hilal AlQuabeh, Kentaro Inui, Martin Takac

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental evaluation proceeds as follows. First, we introduce a motivating example in the Continuous Copying task. Then, we utilize sCIFAR to probe the inductive biases that SSMs exhibit when learning on serialized image data. Finally, we demonstrate the benefits of our S4D-DFouT initialization across the Long Range Arena benchmark [17], and further ablation datasets as the Speech Commands dataset [29]. Details of the experimental settings are provided in the Appendix C.
Researcher Affiliation	Academia	Ruben Solozabal (MBZUAI); Velibor Bojkovic (MBZUAI); Hilal AlQuabeh (MBZUAI, RIKEN AIP); Kentaro Inui (MBZUAI, RIKEN AIP); Martin Takáč (MBZUAI)
Pseudocode	No	The paper includes mathematical formulations and proofs (e.g., Proposition 1 and its proof) but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The codebase upon which the experimental part was built is also publicly available, while the custom part of our code is added in the supplementary material.
Open Datasets	Yes	Our experimental evaluation proceeds as follows... Long Range Arena benchmark [17], and further ablation datasets as the Speech Commands dataset [29]. ... The serialized CIFAR-10 (sCIFAR) dataset... The BIDMC dataset [32] consists of continuous physiological signals...
Dataset Splits	No	The paper mentions using well-known benchmarks and datasets like LRA, sCIFAR, Speech Commands, and BIDMC. It describes data preprocessing steps such as padding sequences to maximum lengths (e.g., ListOps to 2048, Text to 4096) and standardization, but does not explicitly state the specific training, validation, and test splits (e.g., percentages or sample counts) used for these datasets within the paper.
Hardware Specification	No	We provide general framework (GPUs used in the experiments) in Appendix C, but we do not report the running time and memory consumption during training.
Software Dependencies	No	The codebase upon which the experimental part was built is also publicly available, while the custom part of our code is added in the supplementary material. Original S4 sourcecode from https://github.com/state-spaces/s4 is under Apache-2.0 license.
Experiment Setup	Yes	The S4D-DFouT hyperparameter configuration we adopt in the experimentation is provided in Table 5. Table 5: Hyperparameters used for the S4D-DFouT reported results. L denotes the number of layers; H, the embedding size; N, the hidden dimension; Dropout, the dropout rate; Lr, the global learning rate; Bs, the batch size; Epochs, the maximum number of training epochs; WD, weight decay; and (ξmin, ξmax), the range of decay rate values.