A Bayesian Perspective on Training Speed and Model Selection

Authors: Clare Lyle, Lisa Schut, Robin Ru, Yarin Gal, Mark van der Wilk

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent."
Researcher Affiliation | Academia | "OATML Group, University of Oxford (correspondence to clare.lyle@cs.ox.ac.uk); Imperial College London."
Pseudocode | Yes | "Algorithm 1: Marginal Likelihood Estimation for Linear Models." A hedged sketch of this estimator is given below the table.
Open Source Code | No | No explicit statement provides access to open-source code for the methodology described in this paper.
Open Datasets | Yes | "We construct a synthetic dataset inspired by Wilson and Izmailov [46]... Here we evaluate the relative change in the log ML of a Gaussian Process induced by a fully-connected MLP (MLP-NTK-GP) and a convolutional neural network (Conv-NTK-GP) which performs regression on the MNIST dataset... In this section, we evaluate whether this conjecture holds for a simple convolutional neural network trained on the Fashion MNIST dataset... We find the same trend holds for CIFAR-10, which is shown in Appendix B.3."
Dataset Splits | No | No explicit percentages, sample counts, or splitting methodology (e.g., an 80/10/10 split) for the training, validation, and test sets are provided in the main text. Appendix B.2 mentions the Fashion MNIST dataset and 20 epochs, but gives no specific splits.
Hardware Specification | Yes | "All models are trained using PyTorch (Paszke et al., 2019) on NVIDIA GeForce GTX TITAN X GPUs."
Software Dependencies | No | No specific version numbers are provided for software dependencies; the paper cites "PyTorch (Paszke et al., 2019)" without stating a version.
Experiment Setup | Yes | "For all networks, we used the Adam optimizer (Kingma and Ba, 2014) with a batch size of 128 and a learning rate of 1e-4. The models were trained for 20 epochs." An illustrative sketch of this configuration also appears below the table.
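The paper's Algorithm 1 is not reproduced on this page, but its underlying idea is that for a Bayesian linear model the log marginal likelihood decomposes as log p(D) = Σ_i log p(d_i | d_<i), so summing the one-step-ahead posterior-predictive log-likelihoods over a single ordered pass through the data recovers the log ML. The sketch below is a minimal illustration of that decomposition under assumed prior precision, noise variance, and synthetic data; it is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): for a Bayesian linear model,
# accumulating the one-step-ahead predictive log-likelihoods during a
# sequential pass over the data recovers the exact log marginal likelihood.
# Prior precision and noise variance below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

alpha, sigma2 = 1.0, 0.1 ** 2        # prior precision, noise variance

# Sequential estimate: sum of posterior-predictive log densities.
log_ml_seq = 0.0
S = np.eye(d) / alpha                # prior covariance of the weights
m = np.zeros(d)                      # prior mean of the weights
for x_i, y_i in zip(X, y):
    # Predictive distribution of y_i given the points seen so far.
    pred_mean = m @ x_i
    pred_var = sigma2 + x_i @ S @ x_i
    log_ml_seq += -0.5 * (np.log(2 * np.pi * pred_var)
                          + (y_i - pred_mean) ** 2 / pred_var)
    # Rank-one Bayesian update of the weight posterior.
    k = S @ x_i / pred_var
    m = m + k * (y_i - pred_mean)
    S = S - np.outer(k, x_i @ S)

# Closed-form log ML for comparison: y ~ N(0, sigma2*I + X X^T / alpha).
K = sigma2 * np.eye(n) + X @ X.T / alpha
_, logdet = np.linalg.slogdet(K)
log_ml_exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))

print(log_ml_seq, log_ml_exact)      # the two values should agree
```

The two printed values agreeing illustrates why a running sum of "training losses" (predictive log-likelihoods) can serve as a marginal-likelihood estimator in the linear setting.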
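For the reported experiment setup, the following is a minimal PyTorch sketch of the quoted configuration (Adam, batch size 128, learning rate 1e-4, 20 epochs) on Fashion MNIST. The CNN architecture is an illustrative assumption; the quoted text does not specify the paper's exact model.

```python
# Sketch of the reported training configuration, assuming a small CNN
# on Fashion MNIST (the architecture is not given in the quoted text).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the paper
loss_fn = nn.CrossEntropyLoss()

train_set = datasets.FashionMNIST("data", train=True, download=True,
                                  transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128

for epoch in range(20):              # 20 epochs, as reported
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```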