The Efficiency Misnomer
Authors: Mostafa Dehghani, Yi Tay, Anurag Arnab, Lucas Beyer, Ashish Vaswani
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments where comparing model efficiency strongly depends on the choice of cost indicator, like scenarios where there is parameter sharing, sparsity, or parallelizable operations in the model. |
| Researcher Affiliation | Industry | Google Research {dehghani, aarnab, lbeyer, avaswani, yitay}@google.com |
| Pseudocode | No | The paper describes its methods in narrative text and does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using existing libraries like Scenic and timm for experiments but does not provide a link or explicit statement about releasing the source code for its own methodology. |
| Open Datasets | Yes | Figure 5: Accuracy and value of different cost indicators for different models on ImageNet dataset. Figure 2: The learning progress of a ResNet-101×3 on JFT-300M with short and long schedules, obtained from (Kolesnikov et al., 2020). |
| Dataset Splits | No | The paper discusses experiments on datasets like JFT-300M and ImageNet and mentions keeping hyperparameters fixed from referenced papers, but does not explicitly state specific training/validation/test dataset splits. |
| Hardware Specification | Yes | Experiments and the computation of cost metrics were done with Mesh Tensorflow (Shazeer et al., 2018), using 64 TPU-V3. (Figure 1 caption) / throughput is measured on a V100 GPU (Figure 5 caption). |
| Software Dependencies | No | The paper mentions software frameworks like Mesh Tensorflow, Scenic, timm, JAX, PyTorch, and TensorFlow, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Note that when changing depth or width of the model (see Table 2 in Appendix A for the exact configurations) all other hyper-parameters are kept fixed based on the default values given by the referenced papers. |
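
The Research Type row notes that efficiency comparisons depend on the chosen cost indicator, for example when a model shares parameters. The following is a minimal sketch of that effect, not code from the paper: it assumes a hypothetical stack of square dense layers, counts one multiply-add as 2 FLOPs, and uses the illustrative name `dense_stack_costs`.

```python
# Hypothetical toy model: a stack of `depth` dense layers, each width x width.
# All names and sizes are illustrative assumptions, not taken from the paper.

def dense_stack_costs(width: int, depth: int, share_weights: bool) -> dict:
    """Return parameter count and per-example forward FLOPs for the stack."""
    params_per_layer = width * width + width       # weight matrix plus bias
    flops_per_layer = 2 * width * width            # one multiply-add counted as 2 FLOPs
    unique_layers = 1 if share_weights else depth  # sharing stores a single layer's weights
    return {
        "params": unique_layers * params_per_layer,
        "flops": depth * flops_per_layer,          # every layer still runs in the forward pass
    }

unshared = dense_stack_costs(width=512, depth=12, share_weights=False)
shared = dense_stack_costs(width=512, depth=12, share_weights=True)
print("unshared:", unshared)
print("shared:  ", shared)
```

In this toy setting, sharing weights across layers divides the parameter count by the depth while leaving FLOPs unchanged, so a comparison based on parameter count alone and one based on FLOPs alone would rank the two variants differently.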