Finite Versus Infinite Neural Networks: an Empirical Study
Authors: Jaehoon Lee, Samuel Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. |
| Researcher Affiliation | Industry | Google Brain {jaehlee, schsam, jpennin, adlam, xlc, romann, jaschasd}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that "All experiments use the Neural Tangents library [15]" and provides a URL for this library: "https://github.com/google/neural-tangents". However, this is a general-purpose library used by the authors, not the specific source code for the methodology or experiments described in this paper. (A hedged usage sketch of this library follows the table.) |
| Open Datasets | Yes | we evaluated every intervention for every architecture and focused on a single dataset, CIFAR-10 [70]. However, to ensure robustness of our results across datasets, we evaluate several key claims on CIFAR-100 and Fashion-MNIST [71]. |
| Dataset Splits | No | Figures 3 and 7 show "Validation MSE" and "Validation Accuracy" plots, indicating that a validation set was used. However, the paper does not explicitly provide specific split percentages, sample counts, or citations to predefined validation splits. |
| Hardware Specification | No | Typically this takes around 1200 GPU hours with double precision. This indicates the type of hardware (GPU) and usage, but does not provide specific model numbers or detailed specifications. |
| Software Dependencies | No | All experiments use the Neural Tangents library [15], built on top of JAX [69]. We acknowledge the Python community [127] for developing the core set of tools that enabled this work, including NumPy [128], SciPy [129], Matplotlib [130], Pandas [131], Jupyter [132], JAX [133], Neural Tangents [15], Apache Beam [68], TensorFlow Datasets [134] and Google Colaboratory [135]. While software is listed, no specific version numbers are provided for any of the dependencies. |
| Experiment Setup | Yes | We use MSE loss... In all cases we use ReLU nonlinearities with critical initialization with small bias variance (σ²_w = 2.0, σ²_b = 0.01). Except if otherwise stated, we consider FCNs with 3 layers of width 2048 and CNNs with 8 layers of 512 channels per layer. In the finite-width settings, the base case uses mini-batch gradient descent at a constant small learning rate. (A sketch of this base case appears after the table.) |
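For context on the Neural Tangents dependency noted in the table, below is a minimal sketch (not the authors' released code) of how the paper's infinite-width NNGP/NTK predictions can be computed with that library. The architecture follows the stated FCN base case (3 hidden layers of width 2048, ReLU, σ²_w = 2.0, σ²_b = 0.01); the data shapes and sample counts are illustrative placeholders, not the paper's actual CIFAR-10 pipeline.

```python
# Hedged sketch: infinite-width NNGP/NTK predictions with Neural Tangents.
# Architecture matches the paper's FCN base case; data below is a toy stand-in.
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# 3 hidden layers of width 2048 with ReLU and critical initialization:
# W_std = sqrt(sigma_w^2) = sqrt(2.0), b_std = sqrt(sigma_b^2) = 0.1.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(2048, W_std=jnp.sqrt(2.0), b_std=0.1),
    stax.Relu(),
    stax.Dense(2048, W_std=jnp.sqrt(2.0), b_std=0.1),
    stax.Relu(),
    stax.Dense(2048, W_std=jnp.sqrt(2.0), b_std=0.1),
    stax.Relu(),
    stax.Dense(10, W_std=jnp.sqrt(2.0), b_std=0.1),  # 10-class readout
)

# Toy stand-ins for flattened 32x32x3 images and (centered) one-hot labels.
key = random.PRNGKey(0)
x_train = random.normal(key, (128, 32 * 32 * 3))
y_train = random.normal(key, (128, 10))
x_test = random.normal(random.PRNGKey(1), (16, 32 * 32 * 3))

# Closed-form predictions of the infinitely wide network trained to
# convergence on MSE loss (t=None corresponds to infinite training time).
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
nngp_mean, ntk_mean = predict_fn(x_test=x_test, get=('nngp', 'ntk'))
```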
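Continuing the sketch above, the following is a hedged version of the finite-width base case described in the Experiment Setup row: mini-batch gradient descent with a constant small learning rate on MSE loss. It reuses `init_fn`, `apply_fn`, `x_train`, and `y_train` from the previous block; the learning rate, batch size, and step count are assumed values, not the paper's exact settings.

```python
# Hedged sketch: finite-width base case, constant-learning-rate mini-batch SGD
# on MSE loss. Reuses init_fn / apply_fn / x_train / y_train from the block above.
import jax
import jax.numpy as jnp
from jax import random

# Mean-squared-error loss between network outputs and targets.
loss = lambda params, x, y: 0.5 * jnp.mean((apply_fn(params, x) - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=0.01):  # lr is an assumed value
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# init_fn returns (output_shape, params); we only need the parameters.
_, params = init_fn(random.PRNGKey(2), x_train.shape)

batch_size = 32  # assumed value
for step in range(1000):  # assumed step count
    idx = random.permutation(random.PRNGKey(step), x_train.shape[0])[:batch_size]
    params = sgd_step(params, x_train[idx], y_train[idx])
```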