The Efficiency Misnomer

Authors: Mostafa Dehghani, Yi Tay, Anurag Arnab, Lucas Beyer, Ashish Vaswani

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents experiments in which comparing model efficiency strongly depends on the choice of cost indicator, e.g., in scenarios with parameter sharing, sparsity, or parallelizable operations in the model. |
| Researcher Affiliation | Industry | Google Research; {dehghani, aarnab, lbeyer, avaswani, yitay}@google.com |
| Pseudocode | No | The paper describes its methods in narrative text and does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using existing libraries such as Scenic and timm for experiments but does not provide a link to, or an explicit statement about releasing, the source code for its own methodology. |
| Open Datasets | Yes | Figure 5: Accuracy and value of different cost indicators for different models on the ImageNet dataset. Figure 2: The learning progress of a ResNet-101x3 on JFT-300M with short and long schedules, obtained from (Kolesnikov et al., 2020). |
| Dataset Splits | No | The paper discusses experiments on datasets such as JFT-300M and ImageNet and mentions keeping hyperparameters fixed as in the referenced papers, but does not explicitly state specific training/validation/test dataset splits. |
| Hardware Specification | Yes | Experiments and the computation of cost metrics were done with Mesh TensorFlow (Shazeer et al., 2018), using 64 TPU-V3 (Figure 1 caption); throughput is measured on a V100 GPU (Figure 5 caption). |
| Software Dependencies | No | The paper mentions software frameworks such as Mesh TensorFlow, Scenic, timm, JAX, PyTorch, and TensorFlow, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | Note that when changing depth or width of the model (see Table 2 in Appendix A for the exact configurations) all other hyper-parameters are kept fixed based on the default values given by the referenced papers. |
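For readers who want to reproduce this kind of comparison, the sketch below shows how two of the cost indicators discussed in the paper, trainable-parameter count and real-world throughput, might be measured. It is a minimal PyTorch example, not the authors' code (the paper reports using Mesh TensorFlow, Scenic, and timm); the helper names, batch size, image size, and the torchvision ResNet-50 stand-in are illustrative assumptions.

```python
import time

import torch
import torchvision

# Minimal sketch of two cost indicators discussed in the paper:
# trainable-parameter count and wall-clock throughput. All settings
# here (batch size, image size, the ResNet-50 stand-in) are
# illustrative assumptions, not the authors' setup.

def parameter_count(model: torch.nn.Module) -> int:
    # Raw trainable-parameter count. As the paper notes, this indicator
    # can mislead under parameter sharing or sparsity.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def throughput(model: torch.nn.Module, batch_size: int = 32,
               image_size: int = 224, steps: int = 20) -> float:
    # Images processed per second on whatever accelerator is available;
    # the paper reports this indicator measured on a V100 GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(3):  # warm-up so one-time setup costs are excluded
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return batch_size * steps / (time.perf_counter() - start)

model = torchvision.models.resnet50()
print(f"parameters: {parameter_count(model):,}")
print(f"throughput: {throughput(model):.1f} images/sec")
```

As the paper's parameter-sharing and sparsity examples illustrate, these two indicators can rank the same pair of models in opposite orders, which is why reporting a single cost indicator can be misleading.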