Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Block-Biased Mamba for Long-Range Sequence Processing
Authors: Annan Yu, N. Benjamin Erichson
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose B2S6... Empirically, B2S6 outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks. 7 Experiments |
| Researcher Affiliation | Academia | Annan Yu Center for Applied Mathematics Cornell University Ithaca, NY 14853 EMAIL N. Benjamin Erichson Lawrence Berkeley National Laboratory International Computer Science Institute Berkeley, CA 94720 EMAIL |
| Pseudocode | Yes | The pseudocode for B2S6 is given in Algorithm 2, which compares to Algorithm 1 for S6 found in the Mamba paper [25], where s_Δ(·) = Broadcast_d(Linear_1(·)) and s_Δ^j(·) = Broadcast_p(Linear_1(·)) for every 1 ≤ j ≤ h. Algorithm 1: S6 Forward Pass. Algorithm 2: B2S6 Forward Pass. |
| Open Source Code | Yes | We provide anonymous code with our submission. They come along with clear instructions, allowing for faithful reproducibility. |
| Open Datasets | Yes | Empirically, B2S6 outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks. Furthermore, we show that B2S6 achieves comparable perplexity to Mamba when trained on language tasks, demonstrating its versatility. A preliminary examination on the SlimPajama dataset [78] shows that B2S6 matches Mamba's performance on language modeling tasks, demonstrating its versatility. Table 2: Ablation study of our B2S6 model. We train a model to learn the sCIFAR-10 task |
| Dataset Splits | Yes | On the LRA benchmark, B2S6 resurrects Mamba from failure and even outperforms S4 and S4D. Table 5: Configurations of our B2S6 model on the LRA benchmark, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively. |
| Hardware Specification | Yes | In this paper, all models are trained with one or more NVIDIA L40 GPUs with 48GB of memory. |
| Software Dependencies | No | Our implementation of B2S6 is based entirely on PyTorch. |
| Experiment Setup | Yes | We provide the details of the model and training hyperparameters used for training each LRA task in Table 5. Table 5: Configurations of our B2S6 model on the LRA benchmark, where LR, BS, and WD stand for learning rate, batch size, and weight decay, respectively. Task Depth #Features Norm Prenorm LR BS Epochs WD |