GradMax: Growing Neural Networks using Gradient Information

Authors: Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Fabian Pedregosa, Max Vladymyrov

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS: We evaluate gradient maximizing growing (GradMax) using small multilayer perceptrons (MLPs) and popular deep convolutional architectures. In Section 4.1 we focus on verifying the effectiveness of GradMax and in Section 4.2 we evaluate GradMax on common image classification benchmarks and show the effect of different hyper-parameter choices.
Researcher Affiliation | Industry | Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa. Google Research, Brain Team. {evcu,bartvm,unterthiner,mxv,pedregosa}@google.com
Pseudocode | No | The paper describes its method using equations and detailed textual explanations, but it does not include a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'.
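Since the paper gives no pseudocode, the following is a rough, illustrative Python/NumPy sketch of the growing step as the paper describes it in equations: new incoming weights are set to zero (so the network function is unchanged) and the new outgoing weights are chosen to maximize the gradient norm of the incoming weights via an SVD. All names below are hypothetical; this is not code from the paper or the growneuron repository.

    import numpy as np

    def gradmax_outgoing_init(grad_next, acts_prev, num_new, scale):
        """Illustrative sketch (not the authors' code) of the SVD-based
        initialization described in the paper.

        grad_next: (batch, d_next) gradients of the loss w.r.t. the
                   pre-activations of layer l+1.
        acts_prev: (batch, d_prev) activations of layer l-1.
        num_new:   number of neurons added to layer l.
        scale:     norm budget for the new outgoing weights.
        """
        # M = sum_i dL/dz_{l+1,i} h_{l-1,i}^T, shape (d_next, d_prev).
        m = grad_next.T @ acts_prev
        # Top singular vectors of M give the gradient-maximizing directions.
        u, _, _ = np.linalg.svd(m, full_matrices=False)
        # New outgoing weights: scaled top-k left singular vectors of M.
        w_out_new = scale / np.sqrt(num_new) * u[:, :num_new]
        # New incoming weights are zero, so the network output is unchanged.
        w_in_new = np.zeros((num_new, acts_prev.shape[1]))
        return w_in_new, w_out_new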
Open Source Code | Yes | We open source our code at https://github.com/google-research/growneuron.
Open Datasets | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet.
Dataset Splits | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet.
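As an illustration of the canonical splits implied by the quoted datasets, a minimal TensorFlow Datasets sketch (an assumption, not the paper's actual data pipeline) would be:

    import tensorflow_datasets as tfds

    # CIFAR-10 ships with a canonical 50,000/10,000 train/test split in TFDS;
    # the quoted text does not say whether a further validation split was held out.
    train_ds, test_ds = tfds.load("cifar10", split=["train", "test"],
                                  as_supervised=True)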
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU or CPU models; it only refers generally to 'compute'.
Software Dependencies | No | We implement GradMax using Tensorflow (Abadi et al., 2015). The paper cites the TensorFlow framework but does not specify a version number or list other software dependencies and their versions.
Experiment Setup | Yes | We train the networks for 200 epochs using SGD with a momentum of 0.9 and decay the learning rate using a cosine schedule. In both experiments we use a batch size of 128 and perform a small learning rate sweep to find learning rates that give best test accuracy for baseline training. We find 0.1 for Wide-ResNet and 0.05 for VGG to perform best and use the same learning rates for all different methods.
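A minimal TensorFlow/Keras sketch of the quoted optimizer settings (SGD with momentum 0.9, cosine learning-rate decay over 200 epochs, batch size 128, initial rate 0.1 for Wide-ResNet or 0.05 for VGG); the training loop and any warm-up details are assumptions, not taken from the authors' code:

    import tensorflow as tf

    EPOCHS = 200
    BATCH_SIZE = 128
    STEPS_PER_EPOCH = 50_000 // BATCH_SIZE  # CIFAR training-set size

    # Cosine decay from the swept initial rate (0.1 for Wide-ResNet, 0.05 for VGG).
    lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
        initial_learning_rate=0.1,
        decay_steps=EPOCHS * STEPS_PER_EPOCH)

    optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)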