Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features
Authors: Greg Yang, Michael Santacroce, Edward J Hu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate it on CIFAR10 and Omniglot against NTK as well as finite networks, finding the π-limit outperforms finite-width models trained normally (without projection) in both settings, closing the performance gap between finite- and infinite-width neural networks previously left by NTK. |
| Researcher Affiliation | Industry | Greg Yang (Microsoft), Michael Santacroce (Microsoft), Edward J. Hu (Microsoft) |
| Pseudocode | No | The paper includes mathematical theorems and descriptions of processes, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code for this work is available at github.com/santacml/pilim. |
| Open Datasets | Yes | Here we compare the performance of the relu π-limit on CIFAR10 (Krizhevsky, 2009) and Omniglot (Lake et al., 2015)... |
| Dataset Splits | Yes | In each epoch, we validate on 500 batches from the validation set. |
| Hardware Specification | Yes | All of our experiments are done on V100 GPUs. |
| Software Dependencies | No | The paper mentions 'relu activation' and 'half precision' but does not specify software dependencies like libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We adopt a step learning rate schedule, with a learning rate drop of 0.15 at a certain milestone, which is a hyperparameter. We sweep over a variety of hyperparameters such as the learning rate, gradient clipping, weight decay, the LR drop milestone, etc., as well as width, r, and depth. (A minimal sketch of this kind of training configuration appears after this table.) |
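
The experiment-setup and dataset-split rows above describe a step learning-rate schedule with a 0.15 drop at a milestone, gradient clipping and weight decay among the swept hyperparameters, and per-epoch validation on 500 batches. The sketch below shows one way such a configuration might look in PyTorch. It is an illustrative assumption, not the authors' implementation: the model architecture, optimizer choice, and every numeric value except the 0.15 drop factor (learning rate, weight decay, clipping norm, milestone epoch, batch size) are placeholders for values the paper says were swept, and the synthetic tensors merely stand in for CIFAR10-shaped data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic CIFAR10-shaped data so the sketch runs end to end (stand-in only).
train_data = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
val_data = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
val_loader = DataLoader(val_data, batch_size=64)

# Placeholder finite-width network; width and depth were swept in the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1024), nn.ReLU(), nn.Linear(1024, 10))

# lr, weight_decay, max_norm, and the milestone are stand-ins for swept values;
# gamma=0.15 reflects the stated learning-rate drop factor.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40], gamma=0.15)

for epoch in range(60):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
        optimizer.step()
    scheduler.step()  # LR multiplied by 0.15 once the milestone epoch is passed

    # Per-epoch validation capped at a fixed number of batches
    # (the paper reports validating on 500 batches from the validation set).
    model.eval()
    correct, seen = 0, 0
    with torch.no_grad():
        for i, (x, y) in enumerate(val_loader):
            if i >= 500:
                break
            correct += (model(x).argmax(dim=1) == y).sum().item()
            seen += y.numel()
    print(f"epoch {epoch}: val acc {correct / max(seen, 1):.3f}")
```

Using `MultiStepLR` with a single milestone and `gamma=0.15` is one natural reading of "a learning rate drop of 0.15 at a certain milestone, which is a hyperparameter"; in a sweep, the milestone list and drop factor would simply be swept alongside the other hyperparameters listed in the table.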