Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. |
| Researcher Affiliation | Collaboration | Tsinghua University; Microsoft Research |
| Pseudocode | Yes | Appendix C: Pseudo Code of Gated Retention |
| Open Source Code | No | Code will be released in camera-ready version. |
| Open Datasets | No | The curated training corpus is similar to [39]. |
| Dataset Splits | Yes | Results: Figure 3 reports the validation loss with various parameter counts. |
| Hardware Specification | Yes | The experiments are conducted with H100-80GB GPU cards. |
| Software Dependencies | No | The paper states 'We implement a Triton [36] kernel for gated retention.', but does not provide version numbers for Triton or other software dependencies. |
| Experiment Setup | Yes | Detailed hyperparameters are described in Appendix D. |