Revisiting Neural Scaling Laws in Language and Vision

Authors: Ibrahim M. Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an empirical evaluation of the four scaling law estimators in several domains, including image classification (72 tasks), neural machine translation (5 tasks), language modeling (5 tasks), and other language-related evaluations (10 tasks).
Researcher Affiliation | Industry | Ibrahim Alabdulmohsin (Google Research, Brain Team, Zürich, Switzerland, ibomohsin@google.com); Behnam Neyshabur (Google Research, Blueshift Team, Mountain View, United States, neyshabur@google.com); Xiaohua Zhai (Google Research, Brain Team, Zürich, Switzerland, xzhai@google.com)
Pseudocode | No | The paper does not contain structured pseudocode, algorithm blocks, or clearly labeled algorithm sections.
Open Source Code | Yes | The code and dataset for the remaining tasks used in this evaluation are made publicly available to facilitate further research in this domain. (Footnote 3: Code and benchmark dataset will be made available at: https://github.com/google-research/google-research/tree/master/revisiting_neural_scaling_laws)
Open Datasets | No | Some of the datasets used in our experiments are proprietary and cannot be released, such as JFT-300M. We also include experiments on publicly available datasets, such as Big-Bench for reproducibility. The paper notes that JFT-300M is proprietary; while BIG-Bench is described as publicly available, the paper does not provide concrete access information (a specific link, DOI, or repository) for the exact data or version used in its experiments.
Dataset Splits | Yes | In all experiments, we divide the learning curve into two splits: (1) one split used for training the scaling law estimators, and (2) one split used for evaluating extrapolation. Setting τ = xmax/2, where xmax is the maximum value of x in the data, the first split is the domain x ∈ [0, τ] while the second split is the domain x ∈ (τ, 2τ]. (A minimal sketch of this split protocol is given after the table.)
Hardware Specification | No | All experiments are executed on Tensor Processing Units (TPUs). The paper specifies 'TPUs' but does not provide specific model numbers or detailed specifications for the hardware used.
Software Dependencies | No | The paper mentions optimizers like 'Adam optimizer [24]' and 'Adafactor optimizer [33]', but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | ...with a base learning rate of 5e-4, batch-size 4,096, and dropout rate of 0.1. Models are trained with the per-token cross-entropy loss using the Adafactor optimizer [33] with a batch-size of 500K tokens and a dropout rate of 0.1 [3]. (These quoted hyperparameters are restated in the configuration sketch after the table.)
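
For reference, the split protocol quoted under "Dataset Splits" can be expressed as a short sketch. This is an illustrative reconstruction, not code from the paper's repository; the function name and NumPy-based interface are assumptions.

```python
# Minimal sketch (assumed interface, not the paper's released code) of the
# learning-curve split described under "Dataset Splits": with tau = xmax / 2,
# fit estimators on x in [0, tau] and evaluate extrapolation on x in (tau, 2*tau].
import numpy as np

def split_learning_curve(x, y):
    """Return (fit_split, extrapolation_split), each as an (x, y) pair."""
    x, y = np.asarray(x), np.asarray(y)
    tau = x.max() / 2.0
    fit_mask = x <= tau
    extrap_mask = (x > tau) & (x <= 2.0 * tau)
    return (x[fit_mask], y[fit_mask]), (x[extrap_mask], y[extrap_mask])
```

Here x stands for the scaled quantity on the learning curve (e.g., data or model size) and y for the corresponding performance metric.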
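
The hyperparameters quoted under "Experiment Setup" can likewise be collected into a configuration sketch. The dictionary layout and key names below are assumptions for illustration; only the numeric values come from the quoted text.

```python
# Hypothetical configuration dictionaries restating the quoted hyperparameters;
# key names and groupings are illustrative, not taken from the paper's code.
first_quoted_setup = {
    "base_learning_rate": 5e-4,
    "batch_size": 4096,
    "dropout_rate": 0.1,
}

second_quoted_setup = {
    "optimizer": "adafactor",          # Adafactor optimizer [33]
    "loss": "per_token_cross_entropy",
    "batch_size_tokens": 500_000,      # batch-size of 500K tokens
    "dropout_rate": 0.1,
}
```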