Finite Versus Infinite Neural Networks: an Empirical Study

Authors: Jaehoon Lee, Samuel Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods.
Researcher Affiliation | Industry | Google Brain {jaehlee, schsam, jpennin, adlam, xlc, romann, jaschasd}@google.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that "All experiments use the Neural Tangents library [15]" and provides a URL for this library: "https://github.com/google/neural-tangents". However, this is a third-party library used by the authors, not the specific source code for the methodology or implementation described in this paper. (A hedged usage sketch of this library appears after the table.)
Open Datasets | Yes | we evaluated every intervention for every architecture and focused on a single dataset, CIFAR-10 [70]. However, to ensure robustness of our results across datasets, we evaluate several key claims on CIFAR-100 and Fashion-MNIST [71].
Dataset Splits | No | Figures 3 and 7 show "Validation MSE" and "Validation Accuracy" plots, indicating that a validation set was used. However, the paper does not explicitly provide specific split percentages, sample counts, or citations to predefined validation splits.
Hardware Specification | No | Typically this takes around 1200 GPU hours with double precision. This indicates the type of hardware (GPU) and usage, but does not provide specific model numbers or detailed specifications.
Software Dependencies | No | All experiments use the Neural Tangents library [15], built on top of JAX [69]. We acknowledge the Python community [127] for developing the core set of tools that enabled this work, including NumPy [128], SciPy [129], Matplotlib [130], Pandas [131], Jupyter [132], JAX [133], Neural Tangents [15], Apache Beam [68], TensorFlow Datasets [134] and Google Colaboratory [135]. While software is listed, no specific version numbers are provided for any of the dependencies.
Experiment Setup | Yes | We use MSE loss... In all cases we use ReLU nonlinearities with critical initialization with small bias variance (σ_w² = 2.0, σ_b² = 0.01). Except if otherwise stated, we consider FCNs with 3-layers of width 2048 and CNNs with 8-layers of 512 channels per layer. In the finite-width settings, the base case uses mini-batch gradient descent at a constant small learning rate. (A hedged sketch of this base configuration also appears after the table.)
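
As context for the Open Source Code row, here is a minimal, hedged sketch of how the cited Neural Tangents library is typically used for infinite-width NNGP/NTK inference under MSE loss. This is not the authors' experiment code: the layer widths and initialization follow the Experiment Setup row, while the placeholder data shapes and the diagonal regularizer are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width analogue of the base FCN from the Experiment Setup row:
# 3 ReLU layers of width 2048 with critical initialization and small bias
# variance (sigma_w^2 = 2.0, sigma_b^2 = 0.01). The width argument does not
# affect kernel_fn; it is kept only to mirror the finite-width description.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(2048, W_std=2.0 ** 0.5, b_std=0.01 ** 0.5), stax.Relu(),
    stax.Dense(2048, W_std=2.0 ** 0.5, b_std=0.01 ** 0.5), stax.Relu(),
    stax.Dense(10, W_std=2.0 ** 0.5, b_std=0.01 ** 0.5),
)

# Toy stand-ins for CIFAR-10-shaped data (flattened 32x32x3 images, 10 classes).
x_train = jax.random.normal(jax.random.PRNGKey(0), (8, 3072))
y_train = jax.nn.one_hot(jnp.arange(8) % 10, 10)
x_test = jax.random.normal(jax.random.PRNGKey(1), (4, 3072))

# Closed-form NNGP / NTK inference under MSE loss, trained to convergence (t=None).
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)
nngp_mean, ntk_mean = predict_fn(x_test=x_test, get=('nngp', 'ntk'))
```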
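
For the Experiment Setup row, the following is a minimal JAX sketch, under stated assumptions, of the finite-width base case: a width-2048 ReLU FCN with critical initialization (σ_w² = 2.0, σ_b² = 0.01), MSE loss, and constant-learning-rate mini-batch gradient descent. Reading "3-layers" as three hidden layers, and the learning-rate and batch-handling values, are assumptions of this sketch rather than details confirmed by the paper.

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(3072, 2048, 2048, 2048, 10), w_var=2.0, b_var=0.01):
    """Critical initialization: W ~ N(0, w_var / fan_in), b ~ N(0, b_var)."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, w_key, b_key = jax.random.split(key, 3)
        w = jax.random.normal(w_key, (d_in, d_out)) * jnp.sqrt(w_var / d_in)
        b = jax.random.normal(b_key, (d_out,)) * jnp.sqrt(b_var)
        params.append((w, b))
    return params

def forward(params, x):
    """Width-2048 ReLU FCN; the final layer is a linear readout to 10 classes."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def mse_loss(params, x, y):
    return 0.5 * jnp.mean(jnp.sum((forward(params, x) - y) ** 2, axis=-1))

@jax.jit
def sgd_step(params, x, y, lr=1e-3):  # constant small learning rate; 1e-3 is an assumed value
    grads = jax.grad(mse_loss)(params, x, y)
    return [(w - lr * gw, b - lr * gb) for (w, b), (gw, gb) in zip(params, grads)]

# Usage: params = init_params(jax.random.PRNGKey(0)); then repeatedly call
# sgd_step on mini-batches of flattened CIFAR-10 images and one-hot targets.
```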