Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Block-Biased Mamba for Long-Range Sequence Processing

Authors: Annan Yu, N. Benjamin Erichson

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose B2S6... Empirically, B2S6 outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks. 7 Experiments
Researcher Affiliation	Academia	Annan Yu, Center for Applied Mathematics, Cornell University, Ithaca, NY 14853; N. Benjamin Erichson, Lawrence Berkeley National Laboratory, International Computer Science Institute, Berkeley, CA 94720
Pseudocode	Yes	The pseudocode for B2S6 is given in Algorithm 2, which compares to Algorithm 1 for S6 found in the Mamba paper [25], where $s_\Delta(\cdot) = \mathrm{Broadcast}_d(\mathrm{Linear}_1(\cdot))$ and $s_\Delta^j(\cdot) = \mathrm{Broadcast}_p(\mathrm{Linear}_1(\cdot))$ for every $1 \le j \le h$. Algorithm 1: S6 Forward Pass. Algorithm 2: B2S6 Forward Pass.
Open Source Code	Yes	We provide anonymous code with our submission. They come along with clear instructions, allowing for faithful reproducibility.
Open Datasets	Yes	Empirically, B2S6 outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks. Furthermore, we show that B2S6 achieves comparable perplexity to Mamba when trained on language tasks, demonstrating its versatility. A preliminary examination on the Slim Pajama dataset [78] shows that B2S6 matches Mamba's performance on language modeling tasks, demonstrating its versatility. Table 2: Ablation study of our B2S6 model. We train a model to learn the sCIFAR-10 task
Dataset Splits	Yes	On the LRA benchmark, B2S6 resurrects Mamba from failure and even outperforms S4 and S4D. Table 5: Configurations of our B2S6 model on the LRA benchmark, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively.
Hardware Specification	Yes	In this paper, all models are trained with one or more NVIDIA L40 GPUs with 48GB of memory.
Software Dependencies	No	Our implementation of B2S6 is based entirely on PyTorch.
Experiment Setup	Yes	We provide the details of the model and training hyperparameters used for training each LRA task in Table 5. Table 5: Configurations of our B2S6 model on the LRA benchmark, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively. Table 5 columns: Task, Depth, #Features, Norm, Prenorm, LR, BS, Epochs, WD.