Deep Neural Networks as Gaussian Processes
Authors: Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We conduct experiments making Bayesian predictions on MNIST and CIFAR-10 (Section 3) and compare against NNs trained with standard gradient-based approaches. |
| Researcher Affiliation | Industry | Google Brain {jaehlee, yasamanb, romann, schsam, jpennin, jaschasd}@google.com |
| Pseudocode | No | The paper describes computational steps but does not include structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | An open source implementation of the algorithm is available at https://github.com/brain-research/nngp. |
| Open Datasets | Yes | We compare NNGPs with SGD trained neural networks on the permutation invariant MNIST and CIFAR-10 datasets. |
| Dataset Splits | Yes | For MNIST we use a 50k/10k/10k split of the training/validation/test dataset. For CIFAR-10, we used a 45k/5k/10k split. |
| Hardware Specification | No | The paper mentions "6 CPUs" and "64 CPUs" for computation time but does not provide specific CPU models, GPU models, or other detailed hardware specifications for running experiments. |
| Software Dependencies | No | The paper mentions tools like "Adam optimizer" and "Google Vizier hyperparameter tuner" but does not provide specific software dependencies with version numbers (e.g., library names with versions like Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Random search range: the learning rate was sampled from (10^-4, 0.2) in log scale, the weight decay constant was sampled from (10^-8, 1.0) in log scale, σ_w ∈ [0.01, 2.5] and σ_b ∈ [0, 1.5] were sampled uniformly, and the mini-batch size was chosen uniformly from [16, 32, 64, 128, 256]. For the GP with a given depth and nonlinearity, a grid of 30 points evenly spaced from 0.1 to 5.0 (for σ_w^2) and 30 points evenly spaced from 0 to 2.0 (for σ_b^2) was evaluated to generate the heatmap. |
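
As a pointer for reproducing the Dataset Splits row, the sketch below builds the 50k/10k/10k MNIST and 45k/5k/10k CIFAR-10 splits. The use of `tf.keras.datasets` and of a simple head/tail split of the official training set are assumptions; the paper does not state how the validation indices were chosen.

```python
# Minimal sketch of the train/validation/test splits reported in the paper
# (50k/10k/10k for MNIST, 45k/5k/10k for CIFAR-10). The data loader and the
# head/tail split are assumptions, not the authors' exact procedure.
import tensorflow as tf


def split_train_valid(x, y, num_train):
    """Hold out everything after `num_train` as the validation set."""
    return (x[:num_train], y[:num_train]), (x[num_train:], y[num_train:])


# MNIST: 60k official training images -> 50k train / 10k validation, plus 10k test.
(x_all, y_all), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
mnist_train, mnist_valid = split_train_valid(x_all, y_all, 50_000)

# CIFAR-10: 50k official training images -> 45k train / 5k validation, plus 10k test.
(x_all, y_all), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
cifar_train, cifar_valid = split_train_valid(x_all, y_all, 45_000)
```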
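The Experiment Setup row can likewise be sketched in code. The snippet below draws one random-search configuration for the SGD-trained networks and builds the 30×30 (σ_w², σ_b²) grid used for the GP heatmaps; the variable names and the use of NumPy are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of the hyperparameter sampling described in the Experiment
# Setup row: learning rate and weight decay are drawn log-uniformly, sigma_w
# and sigma_b uniformly, the batch size from a fixed list, and the GP variance
# grid is a 30x30 mesh over (sigma_w^2, sigma_b^2). Names are illustrative.
import numpy as np

rng = np.random.default_rng(0)


def sample_nn_hyperparameters():
    """Draw one random-search configuration for the SGD-trained networks."""
    return {
        # Log-uniform in (1e-4, 0.2) and (1e-8, 1.0) respectively.
        "learning_rate": 10.0 ** rng.uniform(np.log10(1e-4), np.log10(0.2)),
        "weight_decay": 10.0 ** rng.uniform(np.log10(1e-8), np.log10(1.0)),
        # Uniform weight and bias standard deviations.
        "sigma_w": rng.uniform(0.01, 2.5),
        "sigma_b": rng.uniform(0.0, 1.5),
        # Mini-batch size chosen uniformly from a fixed list.
        "batch_size": int(rng.choice([16, 32, 64, 128, 256])),
    }


# 30x30 grid of (sigma_w^2, sigma_b^2) values evaluated for the GP heatmaps.
sigma_w_sq_grid = np.linspace(0.1, 5.0, 30)
sigma_b_sq_grid = np.linspace(0.0, 2.0, 30)
grid = [(sw2, sb2) for sw2 in sigma_w_sq_grid for sb2 in sigma_b_sq_grid]
```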