Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Transformer Quality in Linear Time
Authors: Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc Le
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate the efficacy of FLASH over a variety of tasks (masked and autoregressive language modeling), datasets (C4, Wiki-40B, PG19) and model scales (110M to 500M). |
| Researcher Affiliation | Collaboration | ¹Cornell University, ²Google Research, Brain Team. |
| Pseudocode | Yes | Figure 2: (c) Pseudocode for Gated Attention Unit. Code 1: Pseudocode for mixed chunk attention. |
| Open Source Code | No | The paper provides pseudocode but does not explicitly state that the source code for the methodology is openly available or provide a link to a repository. |
| Open Datasets | Yes | We pretrain and evaluate all models on the C4 dataset (Raffel et al., 2020). ... For auto-regressive language modeling, we focus on the Wiki-40B (Guo et al., 2020) and PG-19 (Rae et al., 2019) datasets... |
| Dataset Splits | No | The paper refers to 'validation-set results' in figure captions and discusses model training, but it does not explicitly provide the specific percentages or counts for training, validation, and test splits needed for reproduction. |
| Hardware Specification | Yes | Figure 1: TPU-v4 training speedup of FLASH... The training speed of each model (i.e., training latency per step) is measured with 64 TPU-v4 cores... using a single Nvidia Tesla V100 GPU |
| Software Dependencies | No | The paper mentions the 'TensorFlow Profiler' and uses TensorFlow in its pseudocode, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Appendix B.1. Hyperparameters. ...Table 6: Hyperparameters for MLM pretraining on C4. ...Table 7: Hyperparameters for LM pretraining on Wiki-40B and PG-19. ... These tables specify details such as 'Tokens per batch', 'Batch size', 'Number of steps', 'Warmup steps', 'Peak learning rate', 'Optimizer', 'Weight decay', 'Dropout', and 'Chunk size'. |
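The Pseudocode row cites the paper's Gated Attention Unit (GAU), while the Open Source Code row records that no code release is stated. To illustrate what that pseudocode covers, here is a minimal NumPy sketch of a single bidirectional GAU layer. It is not the authors' code. The parameter names (`w_uv`, `gamma`, `beta`, `w_o`), the toy sizes, the parameter-free layer norm, the squared-ReLU scoring scaled by 1/n (our reading of the paper), and the omission of the relative position bias and causal masking are all assumptions made for brevity.

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-6):
    # Parameter-free layer norm over the feature axis (simplifying assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gated_attn_unit(x, params):
    """Sketch of one GAU layer on x of shape (batch, n, d); returns the same shape."""
    n = x.shape[1]
    e = params["w_o"].shape[0]              # expanded width of u and v
    shortcut, x = x, layer_norm(x)
    # A single projection yields the gate u, the value v and a shared low-dim base z.
    uv = silu(x @ params["w_uv"])           # (batch, n, 2e + s)
    u, v, z = np.split(uv, [e, 2 * e], axis=-1)
    # Cheap per-dimension scale and offset turn the shared base into q and k.
    q = z * params["gamma"][0] + params["beta"][0]
    k = z * params["gamma"][1] + params["beta"][1]
    # Squared-ReLU attention scaled by 1/n; no position bias, no causal mask (assumed).
    qk = np.einsum("bns,bms->bnm", q, k)
    a = np.maximum(qk / n, 0.0) ** 2
    # Gate the attended values elementwise with u, project back to d, add residual.
    o = u * np.einsum("bnm,bme->bne", a, v)
    return o @ params["w_o"] + shortcut

def init_params(d=64, e=128, s=32, seed=0):
    # Hypothetical initialization with small random weights for illustration only.
    rng = np.random.default_rng(seed)
    return {
        "w_uv": rng.normal(0.0, 0.02, size=(d, 2 * e + s)),
        "gamma": rng.normal(0.0, 0.02, size=(2, s)),
        "beta": np.zeros((2, s)),
        "w_o": rng.normal(0.0, 0.02, size=(e, d)),
    }

params = init_params(d=64, e=128, s=32)
x = np.random.default_rng(1).normal(size=(2, 16, 64))
y = gated_attn_unit(x, params)
print(y.shape)  # (2, 16, 64)
```

The sketch keeps the expansion ratio e = 2d and a small shared width s, mirroring the general shape of the GAU design; reproducing the paper's numbers would still require the hyperparameters from its Tables 6 and 7 and the omitted components.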