GradMax: Growing Neural Networks using Gradient Information
Authors: Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Fabian Pedregosa, Max Vladymyrov
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS: We evaluate gradient maximizing growing (GradMax) using small multilayer perceptrons (MLPs) and popular deep convolutional architectures. In Section 4.1 we focus on verifying the effectiveness of GradMax and in Section 4.2 we evaluate GradMax on common image classification benchmarks and show the effect of different hyper-parameter choices. |
| Researcher Affiliation | Industry | Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa; Google Research, Brain Team; {evcu,bartvm,unterthiner,mxv,pedregosa}@google.com |
| Pseudocode | No | The paper describes its method using equations and detailed textual explanations, but it does not include a dedicated section or figure labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We open source our code at https://github.com/google-research/growneuron. |
| Open Datasets | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet. |
| Dataset Splits | Yes | We benchmark GradMax using various growing schedules and architectures on CIFAR-10, CIFAR-100, and ImageNet. |
| Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU or CPU models); it refers to 'compute' only in general terms. |
| Software Dependencies | No | We implement GradMax using TensorFlow (Abadi et al., 2015). The paper cites the TensorFlow framework but does not provide a specific version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We train the networks for 200 epochs using SGD with a momentum of 0.9 and decay the learning rate using a cosine schedule. In both experiments we use a batch size of 128 and perform a small learning rate sweep to find learning rates that give best test accuracy for baseline training. We find 0.1 for WideResNet and 0.05 for VGG to perform best and use the same learning rates for all different methods. (A hedged configuration sketch follows the table.) |
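
The quoted experiment setup maps onto a standard TensorFlow/Keras training configuration. The sketch below is a minimal illustration of those hyper-parameters only (200 epochs, SGD with momentum 0.9, cosine learning-rate decay, batch size 128, base learning rate 0.1 or 0.05); the CIFAR-10 loading and the small placeholder model are assumptions for illustration, not the WideResNet/VGG architectures or the growing logic from the open-sourced growneuron code.

```python
import tensorflow as tf

# Hedged sketch of the reported training configuration:
# 200 epochs, SGD with momentum 0.9, cosine LR decay, batch size 128.
EPOCHS = 200
BATCH_SIZE = 128
BASE_LR = 0.1  # 0.1 reported for WideResNet, 0.05 for VGG

# CIFAR-10 as an example dataset (one of the benchmarks named in the paper).
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
steps_per_epoch = len(x_train) // BATCH_SIZE

# Cosine learning-rate schedule decayed over the full training run.
lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=BASE_LR,
    decay_steps=EPOCHS * steps_per_epoch,
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# Placeholder model for illustration only; the paper trains and grows
# WideResNet and VGG networks instead of this small MLP.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS)
```

Per the quoted setup, only the base learning rate is swept per architecture for the baseline, and the same learning rate is then reused across all compared methods.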