Randomized Automatic Differentiation
Authors: Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P Adams
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor. |
| Researcher Affiliation | Academia | Princeton University Princeton, NJ {doktay,mcgreivy,jaduol,abeatson,rpa}@princeton.edu |
| Pseudocode | Yes | Algorithm 1: RMAD with path sampling (an illustrative path-sampling sketch follows this table) |
| Open Source Code | Yes | The code is provided on GitHub: https://github.com/PrincetonLIPS/RandomizedAutomaticDifferentiation |
| Open Datasets | Yes | We evaluate our proposed RAD method on two feedforward architectures: a small fully connected network trained on MNIST, and a small convolutional network trained on CIFAR-10. We also evaluate our method on an RNN trained on Sequential-MNIST. |
| Dataset Splits | Yes | We then randomly hold out a validation dataset of size 5000 from the CIFAR-10 and MNIST training sets and train each pair on the reduced training dataset and evaluate on the validation set. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA K80 or V100 GPU. |
| Software Dependencies | No | The paper mentions |
| Experiment Setup | Yes | Our feedforward network full-memory baseline is trained with a minibatch size of 150... We train with the Adam optimizer... We tune the initial learning rate and ℓ2 weight decay parameters... The learning rate was fixed at 10⁻⁴ for all gradient estimators... All recurrent models are trained with SGD without momentum. (A setup sketch follows this table.) |
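
The Pseudocode row refers to the paper's Algorithm 1, RMAD (reverse-mode automatic differentiation) with path sampling. The snippet below is a minimal illustrative sketch of the underlying idea, not the authors' implementation: an unbiased estimate of a derivative obtained by sampling backward paths through a linearized computation graph and importance-weighting the product of local partial derivatives along each path. The graph encoding, function name `sample_path_gradient`, and uniform parent sampling are assumptions made for this example.

```python
import numpy as np

def sample_path_gradient(edges, source, sink, num_samples, rng=None):
    """Unbiased estimate of d(sink)/d(source) on a linearized computation
    graph via path sampling (a sketch, assuming a dict-based graph where
    edges[node] is a list of (parent, local_partial_derivative) pairs)."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_samples):
        node, weight, prob = sink, 1.0, 1.0
        # Walk backwards from the output, picking one parent uniformly at
        # random at each step and correcting for the sampling probability.
        while node != source:
            parents = edges.get(node, [])
            if not parents:          # dead end: this path contributes zero
                weight = 0.0
                break
            idx = rng.integers(len(parents))
            parent, partial = parents[idx]
            weight *= partial
            prob *= 1.0 / len(parents)
            node = parent
        total += weight / prob if weight != 0.0 else 0.0
    return total / num_samples

# Tiny usage example: y = exp(sin(x)) at x = 1.0.  The linearized graph is
# the chain x -> a -> y with local partials cos(x) and exp(sin(x)).
x = 1.0
a = np.sin(x)
edges = {"a": [("x", np.cos(x))], "y": [("a", np.exp(a))]}
estimate = sample_path_gradient(edges, "x", "y", num_samples=100)
print(estimate, np.exp(np.sin(x)) * np.cos(x))  # estimate vs. exact gradient
```

For a pure chain every sampled path is the full path, so the estimate is exact; the variance of the estimator only appears once the linearized graph branches and different samples traverse different paths.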
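The Dataset Splits and Experiment Setup rows quote a held-out validation set of 5000 examples and an Adam-trained feedforward baseline with minibatch size 150. The following is a minimal PyTorch/torchvision sketch of that kind of setup; the data root, transform, random seed, network, and the concrete learning-rate and weight-decay values are placeholders, since the authors' tuned values are not reproduced here.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Hold out 5000 validation examples from the CIFAR-10 training set
# (data root, transform, and seed are assumptions for illustration).
full_train = datasets.CIFAR10("./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, val_set = random_split(
    full_train, [len(full_train) - 5000, 5000],
    generator=torch.Generator().manual_seed(0))

# Baseline configuration quoted above: minibatch size 150 and Adam with a
# tuned initial learning rate and L2 weight decay (the values below are
# placeholders, not the authors' tuned settings).
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(32 * 32 * 3, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
train_loader = DataLoader(train_set, batch_size=150, shuffle=True)
val_loader = DataLoader(val_set, batch_size=150)
```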