reproducibilityindex.ai

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Authors: Sukjun Hwang, Aakash Sunil Lahoti, Ratish Puduppully, Tri Dao, Albert Gu

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide extensive experimental results that substantiate our claims. Our systematic ablation studies control architectural variables to highlight the impact of matrix parameterization. These careful experiments confirm that Sequence Alignment, a property we newly identified in certain matrix mixers, significantly enhances downstream performance.
Researcher Affiliation	Collaboration	(1) Machine Learning Department, Carnegie Mellon University; (2) IT University of Copenhagen; (3) Department of Computer Science, Princeton University; (4) Cartesia AI. Contact: {sukjunh,alahoti}@cs.cmu.edu, rapu@itu.dk, tri@tridao.me, agu@cs.cmu.edu
Pseudocode	Yes	Figure 5: Pseudocode for Hydra. B, L, H, P denote batch size, sequence length, number of heads, and head dimension, respectively. The suffixes _f and _b denote forward and backward.
Open Source Code	Yes	We publicly release source code at https://github.com/goombalab/hydra.
Open Datasets	Yes	We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43].
Dataset Splits	Yes	We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43].
Hardware Specification	No	This research was made possible by the generous support of computational resources provided by Cartesia AI.
Software Dependencies	No	BERT trained with the latest Hugging Face recipe [46]
Experiment Setup	Yes	The specific hyperparameters for reproducing the results in Table 4 are reported in Table 6, and the settings used for obtaining the results of Hydra in Table 5 are listed in Table 9.
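
The Pseudocode row above mentions forward (_f) and backward (_b) passes. As a rough illustration only, the sketch below shows one common way to build a bidirectional mixer from a causal one: run the causal mixer on the sequence and on its reversal, shift each result by one step, flip the backward result back, and add a diagonal term. This is not the authors' code. The scalar-decay scan is a stand-in for the actual SSM kernel, it handles a single channel rather than a (B, L, H, P) tensor, and all function and parameter names (causal_scan, shift_right, bidirectional_mix, a_f, a_b, d) are hypothetical.

```python
def causal_scan(x, a):
    """Stand-in causal mixer (assumption): y[t] = sum over s <= t of a**(t-s) * x[s]."""
    y, state = [], 0.0
    for value in x:
        state = a * state + value
        y.append(state)
    return y


def shift_right(y):
    """Shift one step along the sequence, padding the front with zero."""
    return [0.0] + y[:-1] if y else []


def bidirectional_mix(x, a_f, a_b, d):
    """Combine a forward pass, a backward pass, and a diagonal term.

    a_f and a_b are the decay rates of the forward and backward scans;
    d scales the per-position (diagonal) contribution.
    """
    # Forward branch: position t sees only strictly earlier inputs.
    y_f = shift_right(causal_scan(x, a_f))
    # Backward branch: reverse, scan, shift, then reverse back,
    # so position t sees only strictly later inputs.
    y_b = shift_right(causal_scan(x[::-1], a_b))[::-1]
    # Diagonal term: each position's own input.
    return [f + b + d * v for f, b, v in zip(y_f, y_b, x)]
```

Because of the one-step shift, the forward and backward branches never touch the current position, so the diagonal term is the only path from an input to the output at the same index. For the exact formulation, refer to Figure 5 of the paper and the released repository.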