Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Randomized Automatic Differentiation

Authors: Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P Adams

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
Researcher Affiliation | Academia | Princeton University, Princeton, NJ EMAIL
Pseudocode | Yes | Algorithm 1 RMAD with path sampling
Open Source Code | Yes | The code is provided on GitHub: https://github.com/PrincetonLIPS/RandomizedAutomaticDifferentiation
Open Datasets | Yes | We evaluate our proposed RAD method on two feedforward architectures: a small fully connected network trained on MNIST, and a small convolutional network trained on CIFAR-10. We also evaluate our method on an RNN trained on Sequential-MNIST.
Dataset Splits | Yes | We then randomly hold out a validation dataset of size 5000 from the CIFAR-10 and MNIST training sets and train each pair on the reduced training dataset and evaluate on the validation set.
Hardware Specification | Yes | All experiments were run on a single NVIDIA K80 or V100 GPU.
Software Dependencies | No | The paper mentions
Experiment Setup | Yes | Our feedforward network full-memory baseline is trained with a minibatch size of 150... We train with the Adam optimizer... We tune the initial learning rate and ℓ2 weight decay parameters... The learning rate was fixed at 10^-4 for all gradient estimators... All recurrent models are trained with SGD without momentum.
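
The hold-out procedure quoted under Dataset Splits can be illustrated with a minimal sketch. This is not the authors' code: the function name `hold_out_validation`, the fixed seed, and the use of Python's standard `random` module are assumptions made for illustration. The only details taken from the quoted text are the validation size of 5000 and the random selection from the training set.

```python
import random


def hold_out_validation(num_train, num_val=5000, seed=0):
    """Randomly split indices 0..num_train-1 into (train, validation) lists.

    Hypothetical helper: the seed and the index-based interface are
    assumptions, not details from the paper.
    """
    if not 0 < num_val < num_train:
        raise ValueError("num_val must be between 1 and num_train - 1")
    rng = random.Random(seed)  # seeded so the split is repeatable
    indices = list(range(num_train))
    rng.shuffle(indices)
    val_idx = sorted(indices[:num_val])    # held-out validation examples
    train_idx = sorted(indices[num_val:])  # reduced training set
    return train_idx, val_idx


# The standard training sets contain 50,000 (CIFAR-10) and 60,000 (MNIST) examples.
cifar_train, cifar_val = hold_out_validation(50_000)
mnist_train, mnist_val = hold_out_validation(60_000)
print(len(cifar_train), len(cifar_val))  # 45000 5000
print(len(mnist_train), len(mnist_val))  # 55000 5000
```

Returning index lists rather than data keeps the sketch independent of any particular data-loading library; the indices could then be passed to whatever subset mechanism the training framework offers.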