Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Authors: Sukjun Hwang, Aakash Sunil Lahoti, Ratish Puduppully, Tri Dao, Albert Gu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive experimental results that substantiate our claims. Our systematic ablation studies control architectural variables to highlight the impact of matrix parameterization. These careful experiments confirm that Sequence Alignment, a property we newly identified in certain matrix mixers, significantly enhances downstream performance. |
| Researcher Affiliation | Collaboration | (1) Machine Learning Department, Carnegie Mellon University; (2) IT University of Copenhagen; (3) Department of Computer Science, Princeton University; (4) Cartesia AI |
| Pseudocode | Yes | Figure 5: Pseudo code for Hydra. B, L, H, P denote batch size, sequence length, number of heads, and head dimension respectively. The suffixes _f and _b denote forward and backward. |
| Open Source Code | Yes | We publicly release source code at https://github.com/goombalab/hydra. |
| Open Datasets | Yes | We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43]. |
| Dataset Splits | Yes | We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43]. |
| Hardware Specification | No | This research was made possible by the generous support of computational resources provided by Cartesia AI. |
| Software Dependencies | No | BERT trained with the latest Hugging Face recipe [46] |
| Experiment Setup | Yes | The specific hyperparameters for reproducing the results in Table 4 are reported in Table 6, and the settings used for obtaining the results of Hydra in Table 5 are listed in Table 9. |
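
The Pseudocode row mentions forward and backward passes marked by the suffixes _f and _b. The following is a minimal sketch of that naming convention only, not a reproduction of the paper's Figure 5. It assumes a toy exponential-decay recurrence as a stand-in for the actual sequence mixer and operates on a single channel (one length-L list of floats) rather than a full (B, L, H, P) tensor. All function names, the decay parameters, the one-step shift, and the diagonal weight `d` are hypothetical choices made for illustration.

```python
def causal_scan(x, decay):
    """Toy causal recurrence: y[t] = decay * y[t-1] + x[t].

    Hypothetical stand-in for a real sequence-mixing kernel.
    """
    out, state = [], 0.0
    for value in x:
        state = decay * state + value
        out.append(state)
    return out


def shift_right(y):
    """Shift one step to the right, zero-padding the first position."""
    return [0.0] + y[:-1] if y else []


def bidirectional_mix(x, decay_f=0.9, decay_b=0.8, d=1.0):
    """Combine a forward (_f) and a backward (_b) pass over one sequence.

    The shift makes each pass cover strictly earlier (or strictly later)
    positions, so the current position is counted once, through d * x[t].
    This combination rule is an assumption of the sketch.
    """
    y_f = shift_right(causal_scan(x, decay_f))              # past context
    y_b = shift_right(causal_scan(x[::-1], decay_b))[::-1]  # future context
    return [f + b + d * v for f, b, v in zip(y_f, y_b, x)]


# An impulse at the first position reaches later positions via the forward pass.
print(bidirectional_mix([1.0, 0.0, 0.0], decay_f=0.5, decay_b=0.5))  # [1.0, 1.0, 0.5]
# An impulse at the last position reaches earlier positions via the backward pass.
print(bidirectional_mix([0.0, 0.0, 1.0], decay_f=0.5, decay_b=0.5))  # [0.5, 1.0, 1.0]
```

In a (B, L, H, P) layout, the same mixing would be applied along the L axis independently for each batch, head, and channel index. The released repository listed in the Open Source Code row is the place to look for the authors' actual implementation.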