A Bayesian Perspective on Training Speed and Model Selection

Authors: Clare Lyle, Lisa Schut, Robin Ru, Yarin Gal, Mark van der Wilk

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent."
Researcher Affiliation | Academia | "OATML Group, University of Oxford (correspondence to clare.lyle@cs.ox.ac.uk); Imperial College London."
Pseudocode | Yes | "Algorithm 1: Marginal Likelihood Estimation for Linear Models." A hedged sketch of this estimator is given below the table.
Open Source Code | No | No explicit statement provides access to open-source code for the methodology described in this paper.
Open Datasets | Yes | "We construct a synthetic dataset inspired by Wilson and Izmailov [46]... Here we evaluate the relative change in the log ML of a Gaussian Process induced by a fully-connected MLP (MLP-NTK-GP) and a convolutional neural network (Conv-NTK-GP) which performs regression on the MNIST dataset... In this section, we evaluate whether this conjecture holds for a simple convolutional neural network trained on the Fashion MNIST dataset... We find the same trend holds for CIFAR-10, which is shown in Appendix B.3."
Dataset Splits | No | No explicit percentages, sample counts, or splitting methodology (e.g., an 80/10/10 split) for the training, validation, and test sets are provided in the main text. Appendix B.2 mentions the Fashion MNIST dataset and 20 epochs, but gives no specific splits.
Hardware Specification | Yes | "All models are trained using PyTorch (Paszke et al., 2019) on NVIDIA GeForce GTX TITAN X GPUs."
Software Dependencies | No | No specific version numbers are provided for software dependencies; the paper cites "PyTorch (Paszke et al., 2019)" without stating a version.
Experiment Setup | Yes | "For all networks, we used the Adam optimizer (Kingma and Ba, 2014) with a batch size of 128 and a learning rate of 1e-4. The models were trained for 20 epochs." An illustrative sketch of this configuration also appears below the table.
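The paper's Algorithm 1 is not reproduced on this page, but its underlying idea is that for a Bayesian linear model the log marginal likelihood decomposes as log p(D) = Σ_i log p(d_i | d_<i), so summing the one-step-ahead posterior-predictive log-likelihoods over a single ordered pass through the data recovers the log ML. The sketch below is a minimal illustration of that decomposition under assumed prior precision, noise variance, and synthetic data; it is not the authors' implementation.

```python
# Minimal sketch (not the authors' code): for a Bayesian linear model,
# accumulating the one-step-ahead predictive log-likelihoods during a
# sequential pass over the data recovers the exact log marginal likelihood.
# Prior precision and noise variance below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

alpha, sigma2 = 1.0, 0.1 ** 2        # prior precision, noise variance

# Sequential estimate: sum of posterior-predictive log densities.
log_ml_seq = 0.0
S = np.eye(d) / alpha                # prior covariance of the weights
m = np.zeros(d)                      # prior mean of the weights
for x_i, y_i in zip(X, y):
    # Predictive distribution of y_i given the points seen so far.
    pred_mean = m @ x_i
    pred_var = sigma2 + x_i @ S @ x_i
    log_ml_seq += -0.5 * (np.log(2 * np.pi * pred_var)
                          + (y_i - pred_mean) ** 2 / pred_var)
    # Rank-one Bayesian update of the weight posterior.
    k = S @ x_i / pred_var
    m = m + k * (y_i - pred_mean)
    S = S - np.outer(k, x_i @ S)

# Closed-form log ML for comparison: y ~ N(0, sigma2*I + X X^T / alpha).
K = sigma2 * np.eye(n) + X @ X.T / alpha
_, logdet = np.linalg.slogdet(K)
log_ml_exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))

print(log_ml_seq, log_ml_exact)      # the two values should agree
```

The two printed values agreeing illustrates why a running sum of "training losses" (predictive log-likelihoods) can serve as a marginal-likelihood estimator in the linear setting.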
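For the reported experiment setup, the following is a minimal PyTorch sketch of the quoted configuration (Adam, batch size 128, learning rate 1e-4, 20 epochs) on Fashion MNIST. The CNN architecture is an illustrative assumption; the quoted text does not specify the paper's exact model.

```python
# Sketch of the reported training configuration, assuming a small CNN
# on Fashion MNIST (the architecture is not given in the quoted text).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr from the paper
loss_fn = nn.CrossEntropyLoss()

train_set = datasets.FashionMNIST("data", train=True, download=True,
                                  transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128

for epoch in range(20):              # 20 epochs, as reported
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```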