Randomized Automatic Differentiation

Authors: Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P Adams

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
Researcher Affiliation | Academia | Princeton University, Princeton, NJ; {doktay,mcgreivy,jaduol,abeatson,rpa}@princeton.edu
Pseudocode | Yes | Algorithm 1: RMAD with path sampling (an illustrative sketch of the underlying sampling idea follows the table).
Open Source Code | Yes | The code is provided on GitHub: https://github.com/PrincetonLIPS/RandomizedAutomaticDifferentiation
Open Datasets | Yes | We evaluate our proposed RAD method on two feedforward architectures: a small fully connected network trained on MNIST, and a small convolutional network trained on CIFAR-10. We also evaluate our method on an RNN trained on Sequential-MNIST.
Dataset Splits | Yes | We then randomly hold out a validation dataset of size 5000 from the CIFAR-10 and MNIST training sets and train each pair on the reduced training dataset and evaluate on the validation set. (A sketch of such a hold-out split appears after the table.)
Hardware Specification | Yes | All experiments were run on a single NVIDIA K80 or V100 GPU.
Software Dependencies | No | The paper mentions
Experiment Setup | Yes | Our feedforward network full-memory baseline is trained with a minibatch size of 150... We train with the Adam optimizer... We tune the initial learning rate and ℓ2 weight decay parameters... The learning rate was fixed at 10⁻⁴ for all gradient estimators... All recurrent models are trained with SGD without momentum. (An optimizer-configuration sketch follows the table.)
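
The paper's Algorithm 1 (RMAD with path sampling) is only named in the table above, so here is a minimal NumPy sketch of the core idea behind randomized automatic differentiation as it is applied to feedforward networks: store only a random subset of the intermediate activations and rescale by the inverse keep fraction so the weight-gradient estimate stays unbiased. This is not the authors' implementation; all sizes, the function names (forward_with_sampling, rad_backward), and the choice of sampling minibatch rows are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not the paper's): batch B, input d, hidden h, classes c,
# and k = number of minibatch rows whose activations are actually stored.
B, d, h, c, k = 150, 784, 256, 10, 32

W1 = 0.01 * rng.standard_normal((d, h))
W2 = 0.01 * rng.standard_normal((h, c))

def forward_with_sampling(x):
    """Forward pass that stores only k randomly sampled rows of each activation;
    this subsampling is where the memory saving comes from."""
    a1 = np.maximum(x @ W1, 0.0)                 # ReLU hidden activation, B x h
    idx = rng.choice(B, size=k, replace=False)   # sampled minibatch rows
    cache = {
        "idx": idx,
        "x_rows": x[idx],                        # k x d stored instead of B x d
        "a1_rows": a1[idx],                      # k x h stored instead of B x h
        "relu_mask": a1 > 0.0,                   # boolean mask, cheap to keep in full
    }
    return a1 @ W2, cache                        # logits and the saved randomness

def rad_backward(grad_logits, cache):
    """Unbiased weight-gradient estimates: each full product A^T G is replaced by
    a sum over the k sampled rows, rescaled by B/k so the expectation is exact."""
    idx, scale = cache["idx"], B / k
    dW2 = scale * cache["a1_rows"].T @ grad_logits[idx]
    da1 = (grad_logits @ W2.T) * cache["relu_mask"]
    dW1 = scale * cache["x_rows"].T @ da1[idx]
    return dW1, dW2

# Example usage with random data standing in for an MNIST minibatch.
x = rng.standard_normal((B, d))
logits, cache = forward_with_sampling(x)
grad_logits = np.ones_like(logits) / (B * c)     # placeholder upstream gradient
dW1, dW2 = rad_backward(grad_logits, cache)
```

Reusing the same sampled rows at every layer is just one possible design; the sample could equally be drawn independently per operation, trading variance against how much randomness must be recorded.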
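
For the dataset-split row, a minimal sketch of randomly holding out 5000 validation examples from a training set might look as follows; the helper name, seed, and use of index arrays are hypothetical and not taken from the released code.

```python
import numpy as np

def holdout_split(num_train, val_size=5000, seed=0):
    """Randomly partition training indices into a reduced training set and a
    held-out validation set of `val_size` examples (hypothetical helper)."""
    perm = np.random.default_rng(seed).permutation(num_train)
    return perm[val_size:], perm[:val_size]

# CIFAR-10 ships 50,000 training images and MNIST 60,000.
cifar_train_idx, cifar_val_idx = holdout_split(50000)
mnist_train_idx, mnist_val_idx = holdout_split(60000)
```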
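
The experiment-setup row quotes Adam with a tuned initial learning rate and ℓ2 weight decay for the feedforward models, and SGD without momentum for the recurrent models. Below is a hedged PyTorch sketch of those optimizer choices; the model definitions are placeholders, the Adam hyperparameter values are illustrative rather than the tuned ones, and the quoted fixed learning rate of 10⁻⁴ is assumed to apply to the recurrent setting.

```python
import torch

# Placeholder models standing in for the paper's feedforward and recurrent networks.
ff_model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
rnn_model = torch.nn.RNN(input_size=1, hidden_size=128, batch_first=True)

# Feedforward: Adam with tuned initial learning rate and l2 weight decay
# (the values below are illustrative, not the tuned ones).
ff_opt = torch.optim.Adam(ff_model.parameters(), lr=1e-3, weight_decay=1e-4)

# Recurrent: SGD without momentum; the fixed learning rate of 1e-4 from the quote
# is assumed to apply here.
rnn_opt = torch.optim.SGD(rnn_model.parameters(), lr=1e-4, momentum=0.0)
```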