A Bayesian Perspective on Training Speed and Model Selection
Authors: Clare Lyle, Lisa Schut, Robin Ru, Yarin Gal, Mark van der Wilk
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. |
| Researcher Affiliation | Academia | OATML Group, University of Oxford; Imperial College London. Correspondence to clare.lyle@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1: Marginal Likelihood Estimation for Linear Models *(an illustrative sketch of the underlying idea follows the table)* |
| Open Source Code | No | No explicit statement providing access to open-source code for the methodology described in this paper. |
| Open Datasets | Yes | We construct a synthetic dataset inspired by Wilson and Izmailov [46]... Here we evaluate the relative change in the log ML of a Gaussian Process induced by a fully-connected MLP (MLP-NTK-GP) and a convolutional neural network (Conv-NTK-GP) which performs regression on the MNIST dataset... In this section, we evaluate whether this conjecture holds for a simple convolutional neural network trained on the Fashion MNIST dataset... We find the same trend holds for CIFAR-10, which is shown in Appendix B.3. |
| Dataset Splits | No | No explicit percentages, sample counts, or detailed splitting methodology (e.g., '80/10/10 split') for training, validation, and test sets are provided in the main text. Appendix B.2 mentions 'Fashion MNIST dataset' and '20 epochs' but no specific splits. |
| Hardware Specification | Yes | All models are trained using PyTorch (Paszke et al., 2019) on NVIDIA GeForce GTX TITAN X GPUs. |
| Software Dependencies | No | No specific version numbers are provided for software dependencies. The paper mentions 'PyTorch (Paszke et al., 2019)' but without a version number. |
| Experiment Setup | Yes | For all networks, we used the Adam optimizer (Kingma and Ba, 2014) with a batch size of 128 and a learning rate of 1e-4. The models were trained for 20 epochs. *(a hedged training-loop sketch using these settings follows the table)* |
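
The Pseudocode row points to Algorithm 1, which estimates the log marginal likelihood of a linear model from quantities accumulated while the model is trained on the data sequentially. The snippet below is a minimal illustrative sketch of the underlying chain-rule decomposition, log p(y) = Σ_i log p(y_i | y_{<i}), for Bayesian linear regression with a Gaussian prior and Gaussian noise. The function name `sequential_log_ml` and the hyperparameter values are our own choices for illustration; this is not the paper's Algorithm 1 verbatim.

```python
import numpy as np
from scipy.stats import multivariate_normal


def sequential_log_ml(X, y, prior_var=1.0, noise_var=0.1):
    """Exact log marginal likelihood of Bayesian linear regression,
    accumulated one observation at a time via the chain rule:
        log p(y) = sum_i log p(y_i | y_{<i}).
    Each term is the Gaussian posterior-predictive density of the next point,
    so the total is a sum of per-step log-likelihoods over a data ordering."""
    d = X.shape[1]
    mu = np.zeros(d)               # current posterior mean of the weights
    Sigma = prior_var * np.eye(d)  # current posterior covariance of the weights
    log_ml = 0.0
    for x, t in zip(X, y):
        # Posterior predictive for the next target given the data seen so far.
        pred_mean = x @ mu
        pred_var = x @ Sigma @ x + noise_var
        log_ml += -0.5 * (np.log(2 * np.pi * pred_var)
                          + (t - pred_mean) ** 2 / pred_var)
        # Conjugate Gaussian update of the weight posterior.
        prec_old = np.linalg.inv(Sigma)
        prec_new = prec_old + np.outer(x, x) / noise_var
        Sigma = np.linalg.inv(prec_new)
        mu = Sigma @ (prec_old @ mu + x * t / noise_var)
    return log_ml


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=50)
    # Cross-check against the closed form: y ~ N(0, prior_var * X X^T + noise_var * I).
    closed_form = multivariate_normal.logpdf(
        y, mean=np.zeros(len(y)), cov=X @ X.T + 0.1 * np.eye(len(y)))
    print(sequential_log_ml(X, y), closed_form)  # the two values agree up to numerical error
```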
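
The Experiment Setup row fixes the optimizer, batch size, learning rate, and epoch count, and other rows name PyTorch and the Fashion MNIST dataset. The sketch below wires those reported settings into a standard PyTorch training loop; the small CNN architecture is a placeholder assumption, since the table does not specify the network used in the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Settings reported in the table; the CNN below is an assumed placeholder
# architecture, not the one used in the paper.
BATCH_SIZE, LR, EPOCHS = 128, 1e-4, 20
device = "cuda" if torch.cuda.is_available() else "cpu"

train_set = datasets.FashionMNIST("data", train=True, download=True,
                                  transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)  # Adam, lr = 1e-4
loss_fn = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):           # 20 epochs
    for images, labels in loader:     # batches of 128
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```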