Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Authors: Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A key practical result is the derivation of a square root scaling rule to adjust the optimization hyperparameters of RMSprop and Adam when changing batch size, and its empirical validation in deep learning settings. |
| Researcher Affiliation | Academia | Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora; Department of Computer Science, Princeton University; EMAIL |
| Pseudocode | No | The paper describes algorithms like RMSprop, Adam, and SVAG, but does not present them in a formalized 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We include the code for the vision experiments in the supplementary material. For the NLP experiments, we use the code of Wettig et al. (2022). |
| Open Datasets | Yes | Figures 1 and 2 show the square root scaling rule applied to ResNet-50 (He et al., 2016) and VGG-16 (Simonyan and Zisserman, 2014) trained on CIFAR-10 (Krizhevsky et al.), RoBERTa-large (Liu et al., 2019) trained on the Wiki+Books corpus (Zhu et al., 2015), 12-layer GPT (Brown et al., 2020) on WikiText-103 (Merity et al., 2017) and ResNet-50 trained on ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions 'Test Accuracy' and 'Validation Log Perplexity' but does not explicitly state the dataset split percentages or specific methodology used for creating train/validation/test splits. |
| Hardware Specification | Yes | We ran our experiments on a cluster of 34 GPUs, where 24 are RTX 2080 GPUs and 10 are A5000 GPUs. Each experiment on CIFAR-10 required a single RTX 2080 GPU, each experiment on ImageNet required a single A5000 GPU, each pretraining experiment on GPT required a set of 4 RTX 2080 GPUs, each pretraining experiment on RoBERTa required a set of 8 RTX 2080 GPUs, and each finetuning experiment on RoBERTa required a single RTX 2080 GPU. |
| Software Dependencies | No | The paper mentions using the code of Wettig et al. (2022) but does not specify the versions of software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or Python. |
| Experiment Setup | No | Appendix J contains the training details of all the experiments. |
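The square root scaling rule named under Research Type can be illustrated with a short sketch. It is written from our reading of the rule for Adam. When the batch size is multiplied by κ, the learning rate is multiplied by √κ, the quantities 1 − β₁ and 1 − β₂ are multiplied by κ, and ε is divided by √κ. The names `AdamHyperparams` and `sqrt_scale` are hypothetical and do not come from the authors' code, and the validity check on the betas is our own assumption. Consult the paper for the exact statement of the rule and the conditions under which it is derived.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdamHyperparams:
    # Hypothetical container for the Adam hyperparameters the rule adjusts.
    lr: float
    beta1: float
    beta2: float
    eps: float


def sqrt_scale(hp: AdamHyperparams, kappa: float) -> AdamHyperparams:
    """Sketch of square root scaling when batch size goes from B to kappa * B."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    scaled = AdamHyperparams(
        lr=hp.lr * math.sqrt(kappa),              # eta -> sqrt(kappa) * eta
        beta1=1.0 - kappa * (1.0 - hp.beta1),     # (1 - beta1) -> kappa * (1 - beta1)
        beta2=1.0 - kappa * (1.0 - hp.beta2),     # (1 - beta2) -> kappa * (1 - beta2)
        eps=hp.eps / math.sqrt(kappa),            # eps -> eps / sqrt(kappa)
    )
    # Assumed sanity check: a large kappa can push a beta outside [0, 1).
    for name in ("beta1", "beta2"):
        value = getattr(scaled, name)
        if not 0.0 <= value < 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1); kappa is too large")
    return scaled


if __name__ == "__main__":
    base = AdamHyperparams(lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8)
    print(sqrt_scale(base, 4.0))  # batch size increased fourfold
```

Scaling by κ and then by 1/κ recovers the original hyperparameters, which is a quick way to check that the transformation is applied consistently.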