GradMax: Growing Neural Networks using Gradient Information

Authors: Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Fabian Pedregosa, Max Vladymyrov

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS: We evaluate gradient maximizing growing (GradMax) using small multilayer perceptrons (MLPs) and popular deep convolutional architectures. In Section 4.1 we focus on verifying the effectiveness of GradMax and in Section 4.2 we evaluate GradMax on common image classification benchmarks and show the effect of different hyper-parameter choices.
Researcher Affiliation | Industry | Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa. Google Research, Brain Team. {evcu,bartvm,unterthiner,mxv,pedregosa}@google.com
Pseudocode | No | The paper describes its method using equations and detailed textual explanations, but it does not include a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'.
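Since the paper gives no pseudocode, the following is a rough, illustrative Python/NumPy sketch of the growing step as the paper describes it in equations: new incoming weights are set to zero (so the network function is unchanged) and the new outgoing weights are chosen to maximize the gradient norm of the incoming weights via an SVD. All names below are hypothetical; this is not code from the paper or the growneuron repository.

    import numpy as np

    def gradmax_outgoing_init(grad_next, acts_prev, num_new, scale):
        """Illustrative sketch (not the authors' code) of the SVD-based
        initialization described in the paper.

        grad_next: (batch, d_next) gradients of the loss w.r.t. the
                   pre-activations of layer l+1.
        acts_prev: (batch, d_prev) activations of layer l-1.
        num_new:   number of neurons added to layer l.
        scale:     norm budget for the new outgoing weights.
        """
        # M = sum_i dL/dz_{l+1,i} h_{l-1,i}^T, shape (d_next, d_prev).
        m = grad_next.T @ acts_prev
        # Top singular vectors of M give the gradient-maximizing directions.
        u, _, _ = np.linalg.svd(m, full_matrices=False)
        # New outgoing weights: scaled top-k left singular vectors of M.
        w_out_new = scale / np.sqrt(num_new) * u[:, :num_new]
        # New incoming weights are zero, so the network output is unchanged.
        w_in_new = np.zeros((num_new, acts_prev.shape[1]))
        return w_in_new, w_out_new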
Open Source Code | Yes | We open source our code at https://github.com/google-research/growneuron.
Open Datasets | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet.
Dataset Splits | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet.
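As an illustration of the canonical splits implied by the quoted datasets, a minimal TensorFlow Datasets sketch (an assumption, not the paper's actual data pipeline) would be:

    import tensorflow_datasets as tfds

    # CIFAR-10 ships with a canonical 50,000/10,000 train/test split in TFDS;
    # the quoted text does not say whether a further validation split was held out.
    train_ds, test_ds = tfds.load("cifar10", split=["train", "test"],
                                  as_supervised=True)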
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models; it only refers generally to 'compute'.
Software Dependencies | No | We implement GradMax using Tensorflow (Abadi et al., 2015). The paper cites the TensorFlow framework but does not specify a version number or list other software dependencies and their versions.
Experiment Setup | Yes | We train the networks for 200 epochs using SGD with a momentum of 0.9 and decay the learning rate using a cosine schedule. In both experiments we use a batch size of 128 and perform a small learning rate sweep to find learning rates that give best test accuracy for baseline training. We find 0.1 for Wide-ResNet and 0.05 for VGG to perform best and use the same learning rates for all different methods.
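A minimal TensorFlow/Keras sketch of the quoted optimizer settings (SGD with momentum 0.9, cosine learning-rate decay over 200 epochs, batch size 128, initial rate 0.1 for Wide-ResNet or 0.05 for VGG); the training loop and any warm-up details are assumptions, not taken from the authors' code:

    import tensorflow as tf

    EPOCHS = 200
    BATCH_SIZE = 128
    STEPS_PER_EPOCH = 50_000 // BATCH_SIZE  # CIFAR training-set size

    # Cosine decay from the swept initial rate (0.1 for Wide-ResNet, 0.05 for VGG).
    lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=0.1,
        decay_steps=EPOCHS * STEPS_PER_EPOCH)

    optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)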