Transformer Quality in Linear Time

Authors: Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc Le

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to demonstrate the efficacy of FLASH over a variety of tasks (masked and autoregressive language modeling), datasets (C4, Wiki-40B, PG19) and model scales (110M to 500M).
Researcher Affiliation | Collaboration | 1 Cornell University; 2 Google Research, Brain Team.
Pseudocode | Yes | Figure 2(c): Pseudocode for Gated Attention Unit. Code 1: Pseudocode for mixed chunk attention. (An illustrative sketch of a gated-attention-style unit follows this table.)
Open Source Code | No | The paper provides pseudocode but does not explicitly state that the source code for the methodology is openly available, nor does it provide a link to a repository.
Open Datasets | Yes | We pretrain and evaluate all models on the C4 dataset (Raffel et al., 2020). ... For auto-regressive language modeling, we focus on the Wiki-40B (Guo et al., 2020) and PG-19 (Rae et al., 2019) datasets...
Dataset Splits | No | The paper refers to 'validation-set results' in figure captions and discusses model training, but it does not explicitly provide the percentages or counts of the training, validation, and test splits needed for reproduction.
Hardware Specification | Yes | Figure 1: TPU-v4 training speedup of FLASH... The training speed of each model (i.e., training latency per step) is measured with 64 TPU-v4 cores... using a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions the TensorFlow Profiler and includes TensorFlow in its pseudocode, but it does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Appendix B.1, Hyperparameters: Table 6 gives hyperparameters for MLM pretraining on C4, and Table 7 gives hyperparameters for LM pretraining on Wiki-40B and PG-19. These tables specify details such as 'Tokens per batch', 'Batch size', 'Number of steps', 'Warmup steps', 'Peak learning rate', 'Optimizer', 'Weight decay', 'Dropout', and 'Chunk size'. (A hedged configuration skeleton follows this table.)
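
The paper's Figure 2(c) pseudocode is not reproduced on this page, so the snippet below is only a minimal, hedged sketch of a gated-attention-style unit, written to illustrate the kind of computation such pseudocode covers. The function name `gated_attention_unit`, the projection sizes, and the activation choices are assumptions made here, not details taken from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gated_attention_unit(x, Wu, Wv, Wz, Wo, gq, bq, gk, bk):
    """Illustrative gated-attention-style unit (a sketch, not the paper's code).

    x:       (n, d) token representations for one sequence
    Wu, Wv:  (d, e) projections for the gating and value branches
    Wz:      (d, s) projection to a small shared representation
    Wo:      (e, d) output projection
    gq, bq, gk, bk: (s,) per-dimension scales/offsets turning z into q and k
    """
    n = x.shape[0]
    u = relu(x @ Wu)              # gating branch
    v = relu(x @ Wv)              # value branch
    z = relu(x @ Wz)              # shared low-dimensional features
    q = z * gq + bq               # cheap per-dimension transforms of z
    k = z * gk + bk
    a = relu(q @ k.T / n) ** 2    # squared-ReLU attention weights
    return (u * (a @ v)) @ Wo     # gate the attended values, project back

# Tiny smoke test with random weights (shapes only; values are arbitrary).
rng = np.random.default_rng(0)
n, d, e, s = 8, 16, 32, 4
out = gated_attention_unit(
    rng.normal(size=(n, d)),
    rng.normal(size=(d, e)), rng.normal(size=(d, e)),
    rng.normal(size=(d, s)), rng.normal(size=(e, d)),
    rng.normal(size=(s,)), rng.normal(size=(s,)),
    rng.normal(size=(s,)), rng.normal(size=(s,)),
)
assert out.shape == (n, d)
```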
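
Likewise, since the hyperparameter values themselves live in the paper's Tables 6 and 7 and are not repeated on this page, the skeleton below only lists the fields a reproduction configuration would need to capture. The field names mirror the list above; every value is left as a placeholder to be filled in from the paper's appendix.

```python
# Skeleton of a pretraining configuration; values must be copied from
# Tables 6 and 7 of the paper (none are filled in here).
mlm_c4_config = {
    "tokens_per_batch": None,    # from Table 6
    "batch_size": None,
    "number_of_steps": None,
    "warmup_steps": None,
    "peak_learning_rate": None,
    "optimizer": None,
    "weight_decay": None,
    "dropout": None,
    "chunk_size": None,          # chunk size used by mixed chunk attention
}
```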