On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

Authors: Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

NeurIPS 2022

Reproducibility Assessment
Each entry below lists the variable, its result, and the supporting LLM response.

Research Type: Experimental
LLM Response: A key practical result is the derivation of a square root scaling rule for adjusting the optimization hyperparameters of RMSprop and Adam when the batch size changes, together with its empirical validation in deep learning settings.

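As an illustration of such a rule, here is a minimal sketch in Python. It assumes the commonly stated form of the rule (batch size multiplied by kappa implies learning rate multiplied by sqrt(kappa), 1 - beta1 and 1 - beta2 multiplied by kappa, and epsilon divided by sqrt(kappa)); the helper name and interface are hypothetical and not taken from the paper's released code.

    import math

    def sqrt_scale_adam_hparams(lr, beta1, beta2, eps, kappa):
        """Rescale Adam/RMSprop hyperparameters when the batch size is
        multiplied by kappa (illustrative sketch, not the paper's code).

        Assumed rule: lr -> sqrt(kappa) * lr,
        1 - beta -> kappa * (1 - beta) for both betas,
        eps -> eps / sqrt(kappa).
        """
        return {
            "lr": math.sqrt(kappa) * lr,
            "beta1": 1.0 - kappa * (1.0 - beta1),  # valid only while kappa * (1 - beta1) < 1
            "beta2": 1.0 - kappa * (1.0 - beta2),
            "eps": eps / math.sqrt(kappa),
        }

    # Example: doubling the batch size (kappa = 2) from a standard Adam config.
    print(sqrt_scale_adam_hparams(lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, kappa=2))
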
Researcher Affiliation: Academia
LLM Response: Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora; Department of Computer Science, Princeton University; {smalladi,klyu,ap34,arora}@cs.princeton.edu

Pseudocode: No
LLM Response: The paper describes algorithms such as RMSprop, Adam, and SVAG, but does not present them in a formal 'Pseudocode' or 'Algorithm' block.

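For orientation, the following sketch gives the standard textbook update rules for RMSprop and Adam in Python; it is not reproduced from the paper, and SVAG (a noise-amplified variant of SGD used to validate SDE approximations) is omitted.

    import numpy as np

    def rmsprop_step(x, grad, v, lr=1e-3, beta2=0.999, eps=1e-8):
        # Exponential moving average of squared gradients.
        v = beta2 * v + (1.0 - beta2) * grad**2
        # Preconditioned gradient step.
        x = x - lr * grad / (np.sqrt(v) + eps)
        return x, v

    def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Moving averages of the gradient and the squared gradient.
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad**2
        # Bias correction for the zero-initialized moments (t counts steps from 1).
        m_hat = m / (1.0 - beta1**t)
        v_hat = v / (1.0 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        return x, m, v
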
Open Source Code: Yes
LLM Response: "We include the code for the vision experiments in the supplementary material. For the NLP experiments, we use the code of Wettig et al. (2022)."

Open Datasets: Yes
LLM Response: Figures 1 and 2 show the square root scaling rule applied to ResNet-50 (He et al., 2016) and VGG-16 (Simonyan and Zisserman, 2014) trained on CIFAR-10 (Krizhevsky et al.), RoBERTa-large (Liu et al., 2019) trained on the Wiki+Books corpus (Zhu et al., 2015), a 12-layer GPT (Brown et al., 2020) trained on WikiText-103 (Merity et al., 2017), and ResNet-50 trained on ImageNet (Deng et al., 2009).

Dataset Splits: No
LLM Response: The paper mentions 'Test Accuracy' and 'Validation Log Perplexity' but does not explicitly state the dataset split percentages or the methodology used to create train/validation/test splits.

Hardware Specification: Yes
LLM Response: "We ran our experiments on a cluster of 34 GPUs, where 24 are RTX 2080 GPUs and 10 are A5000 GPUs. Each experiment on CIFAR-10 required a single RTX 2080 GPU, each experiment on ImageNet required a single A5000 GPU, each pretraining experiment on GPT required a set of 4 RTX 2080 GPUs, each pretraining experiment on RoBERTa required a set of 8 RTX 2080 GPUs, and each finetuning experiment on RoBERTa required a single RTX 2080 GPU."

Software Dependencies: No
LLM Response: The paper mentions using the code of Wettig et al. (2022) but does not specify versions of software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or Python.

Experiment Setup: No
LLM Response: The main text defers details to the appendix, stating only that "Appendix J contains the training details of all the experiments."