Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Finite Versus Infinite Neural Networks: an Empirical Study

Authors: Jaehoon Lee, Samuel Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods."
Researcher Affiliation | Industry | Google Brain EMAIL
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that "All experiments use the Neural Tangents library [15]" and provides a URL for this library: "https://github.com/google/neural-tangents". However, this is a pre-existing library used by the authors, not the specific source code for the methodology or implementation described in this paper.
Open Datasets | Yes | "we evaluated every intervention for every architecture and focused on a single dataset, CIFAR-10 [70]. However, to ensure robustness of our results across datasets, we evaluate several key claims on CIFAR-100 and Fashion-MNIST [71]."
Dataset Splits | No | Figures 3 and 7 show "Validation MSE" and "Validation Accuracy" plots, indicating that a validation set was used. However, the paper does not explicitly provide split percentages, sample counts, or citations to predefined validation splits.
Hardware Specification | No | "Typically this takes around 1200 GPU hours with double precision." This indicates the type of hardware (GPU) and its usage, but does not provide specific model numbers or detailed specifications.
Software Dependencies | No | "All experiments use the Neural Tangents library [15], built on top of JAX [69]." The acknowledgments also list NumPy [128], SciPy [129], Matplotlib [130], Pandas [131], Jupyter [132], JAX [133], Neural Tangents [15], Apache Beam [68], TensorFlow Datasets [134], and Google Colaboratory [135]. While software is listed, no version numbers are provided for any of the dependencies.
Experiment Setup | Yes | "We use MSE loss... In all cases we use ReLU nonlinearities with critical initialization with small bias variance (σ²_w = 2.0, σ²_b = 0.01). Except if otherwise stated, we consider FCNs with 3 layers of width 2048 and CNNs with 8 layers of 512 channels per layer. In the finite-width settings, the base case uses mini-batch gradient descent at a constant small learning rate."
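The base FCN configuration quoted above (3-layer ReLU network of width 2048, critically initialized with σ²_w = 2.0 and σ²_b = 0.01) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' code (the paper's experiments use the Neural Tangents library on JAX); the function names and the input dimension of 128 are assumptions for the example. Sampling weights as W ~ N(0, σ²_w / fan_in) is the standard way to realize the stated weight variance at criticality:

```python
import numpy as np

# Hyperparameters quoted in the paper's experiment setup.
SIGMA_W2 = 2.0    # weight variance (critical initialization for ReLU)
SIGMA_B2 = 0.01   # small bias variance
WIDTH, DEPTH = 2048, 3

rng = np.random.default_rng(0)

def init_fcn(d_in, width=WIDTH, depth=DEPTH):
    """Initialize a ReLU FCN at criticality: W ~ N(0, sigma_w^2 / fan_in)."""
    params, fan_in = [], d_in
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(SIGMA_W2 / fan_in), size=(fan_in, width))
        b = rng.normal(0.0, np.sqrt(SIGMA_B2), size=(width,))
        params.append((W, b))
        fan_in = width
    return params

def forward(params, x):
    """Forward pass with ReLU nonlinearities after each layer."""
    for W, b in params:
        x = np.maximum(x @ W + b, 0.0)
    return x

x = rng.normal(size=(16, 128))   # assumed: batch of 16 inputs of dimension 128
h = forward(init_fcn(128), x)
print(h.shape)                   # (16, 2048)
```

At this critical point the pre-activation variance stays O(1) from layer to layer (for ReLU, σ²_w = 2 exactly compensates the factor of 1/2 lost to the nonlinearity), which is why activations neither explode nor vanish as depth grows.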