Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features

Authors: Greg Yang, Michael Santacroce, Edward J. Hu

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate it on CIFAR10 and Omniglot against NTK as well as finite networks, finding the π-limit outperforms finite-width models trained normally (without projection) in both settings, closing the performance gap between finite- and infinite-width neural networks previously left by NTK. |
| Researcher Affiliation | Industry | Greg Yang (Microsoft), Michael Santacroce (Microsoft), Edward J. Hu (Microsoft) |
| Pseudocode | No | The paper includes mathematical theorems and descriptions of processes, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code for this work is available at github.com/santacml/pilim. |
| Open Datasets | Yes | Here we compare the performance of the relu π-limit on CIFAR10 (Krizhevsky, 2009) and Omniglot (Lake et al., 2015)... |
| Dataset Splits | Yes | In each epoch, we validate on 500 batches from the validation set. |
| Hardware Specification | Yes | All of our experiments are done on V100 GPUs. |
| Software Dependencies | No | The paper mentions 'relu activation' and 'half precision' but does not specify software dependencies such as libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We adopt a step learning rate schedule, with a learning rate drop of 0.15 at a certain milestone, which is a hyperparameter. We sweep over a variety of hyperparameters such as the learning rate, gradient clipping, weight decay, the LR drop milestone, etc., as well as width, r, and depth. |
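
The Experiment Setup row above describes a step learning rate schedule (a multiplicative drop of 0.15 at a milestone) together with a sweep over learning rate, gradient clipping, weight decay, and architecture settings. Below is a minimal sketch of such a setup, assuming PyTorch (the paper does not name its framework, per the Software Dependencies row); the toy model, synthetic data, and all hyperparameter values other than the 0.15 drop factor are illustrative placeholders rather than the authors' actual configuration.

```python
# A minimal sketch of the training setup described in the "Experiment Setup" row:
# a step learning-rate schedule that multiplies the LR by 0.15 at a chosen milestone,
# with weight decay and gradient clipping as swept hyperparameters.
# Assumptions: PyTorch, a toy MLP standing in for the pi-limit model, synthetic
# CIFAR10-shaped data, and placeholder hyperparameter values.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import clip_grad_norm_
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR10 batches (3x32x32 images, 10 classes).
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)  # swept values
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30], gamma=0.15  # 0.15 LR drop; the milestone is itself a hyperparameter
)

for epoch in range(40):  # placeholder number of epochs
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping (swept)
        optimizer.step()
    scheduler.step()  # apply the step LR schedule once per epoch
```

MultiStepLR with a single milestone matches the quoted "LR drop at a certain milestone"; in an actual sweep the milestone, clipping norm, weight decay, width, r, and depth would each be varied as the paper describes.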