Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Authors: Sukjun Hwang, Aakash Sunil Lahoti, Ratish Puduppully, Tri Dao, Albert Gu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive experimental results that substantiate our claims. Our systematic ablation studies control architectural variables to highlight the impact of matrix parameterization. These careful experiments confirm that Sequence Alignment, a property we newly identified in certain matrix mixers, significantly enhances downstream performance. |
| Researcher Affiliation | Collaboration | (1) Machine Learning Department, Carnegie Mellon University; (2) IT University of Copenhagen; (3) Department of Computer Science, Princeton University; (4) Cartesia AI. {sukjunh,alahoti}@cs.cmu.edu, rapu@itu.dk, tri@tridao.me, agu@cs.cmu.edu |
| Pseudocode | Yes | Figure 5: Pseudo code for Hydra. B, L, H, P denote batch size, sequence length, number of heads, and head dimension respectively. The suffixes _f and _b denote forward and backward. |
| Open Source Code | Yes | We publicly release source code at https://github.com/goombalab/hydra. |
| Open Datasets | Yes | We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43]. |
| Dataset Splits | Yes | We pretrain our models on the masked language modeling objective using the Colossal Cleaned Common Crawl (C4) corpus [36], then finetune and evaluate them on the GLUE benchmark [43]. |
| Hardware Specification | No | This research was made possible by the generous support of computational resources provided by Cartesia AI. |
| Software Dependencies | No | BERT trained with the latest Hugging Face recipe [46] |
| Experiment Setup | Yes | The specific hyperparameters for reproducing the results in Table 4 are reported in Table 6, and the settings used for obtaining the results of Hydra in Table 5 are listed in Table 9. |
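
The Pseudocode row refers to forward (`_f`) and backward (`_b`) passes. As a rough illustration of that idea only, the sketch below combines a causal scan over the sequence with the same scan over the reversed sequence, plus a per-position diagonal term. It is not the paper's Figure 5: the scalar recurrence, the one-step shift, the shared parameters `a`, `b`, `c`, `d`, and all function names are assumptions made for this toy example, and the batch, head, and head-dimension axes (B, H, P) are omitted.

```python
def causal_scan(x, a, b, c):
    """Toy scalar recurrence (assumed): h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys


def shift_right(y):
    """Shift by one step so position t only sees inputs strictly before t."""
    return [0.0] + y[:-1]


def bidirectional_mix(x, a, b, c, d):
    """Forward pass + flipped backward pass + diagonal term (hypothetical helper)."""
    # _f: causal scan over the sequence, shifted to exclude the current token.
    y_f = shift_right(causal_scan(x, a, b, c))
    # _b: same scan over the reversed sequence, shifted, then flipped back.
    y_b = shift_right(causal_scan(x[::-1], a, b, c))[::-1]
    # The diagonal term d*x_t supplies each position's own contribution.
    return [f + bk + d * xt for f, bk, xt in zip(y_f, y_b, x)]


if __name__ == "__main__":
    # An impulse at the first position decays forward through the sequence.
    print(bidirectional_mix([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0, d=2.0))
```

Because both directions share parameters in this sketch, an impulse at the last position produces the mirror image of the output for an impulse at the first position.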