Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

Authors: Roman Novak, Lechao Xiao, Yasaman Bahri, Jaehoon Lee, Greg Yang, Jiri Hron, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. We confirm experimentally, that while in some scenarios the performance of SGD-trained finite CNNs approaches that of the corresponding GPs as the channel count increases, with careful tuning SGD-trained CNNs can significantly outperform their corresponding GPs, suggesting advantages from SGD training compared to fully Bayesian parameter estimation. (A minimal Monte Carlo sketch is given after the table.)
Researcher Affiliation | Collaboration | Google Brain, Microsoft Research AI, Department of Engineering, University of Cambridge
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper makes no explicit statement about releasing source code for the described methodology and provides no repository links.
Open Datasets | Yes | We use full training, validation, and test sets of sizes 50000, 10000, and 10000 respectively for MNIST (Le Cun et al., 1998) and Fashion-MNIST (Xiao et al., 2017a), 45000, 5000, and 10000 for CIFAR10 (Krizhevsky, 2009).
Dataset Splits | Yes | We use a training and validation subsets of CIFAR10 of sizes 500 and 4000 respectively. All images are bilinearly downsampled to 8 × 8 pixels. (...) We use full training, validation, and test sets of sizes 50000, 10000, and 10000 respectively for MNIST (Le Cun et al., 1998) and Fashion-MNIST (Xiao et al., 2017a), 45000, 5000, and 10000 for CIFAR10 (Krizhevsky, 2009). (A data-split sketch is given after the table.)
Hardware Specification | No | The paper states 'All experiments were implemented in Tensorflow (Abadi et al., 2016) and executed with the help of Vizier (Golovin et al., 2017)' and mentions '32-bit precision' and '64-bit precision', but does not specify any particular CPU, GPU, or other hardware models used for computation.
Software Dependencies | No | The paper mentions 'Tensorflow (Abadi et al., 2016)' and 'Adam (Kingma & Ba, 2015)' as software used. However, it does not provide specific version numbers for these or any other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | The following NN parameters are considered: (...) 3. Number of channels: 2^k for k from 0 to 12. 4. Initial learning rate: 10^-k for k from 0 to 15. 5. Weight decay: 0 and 10^-k for k from 0 to 8. 6. Batch size: 10, 25, 50, 100, 200. (G.1) (A hyperparameter-grid sketch is given after the table.)
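The abstract quoted in the Research Type row mentions a Monte Carlo method for estimating the GP corresponding to a given architecture. Below is a minimal sketch of that idea, not the authors' implementation: sample many randomly initialized finite networks, collect their outputs on the inputs of interest, and use the empirical second moment across initializations as a kernel estimate. The tiny fully-connected tanh network, the width/depth values, and the function names are illustrative assumptions; the paper applies the idea to CNN architectures.

```python
# Sketch of Monte Carlo NN-GP kernel estimation (illustrative, not the paper's code).
import numpy as np

def random_network_outputs(x, width=256, depth=3, sigma_w=1.5, sigma_b=0.1, rng=None):
    """Forward pass of one randomly initialized network; returns one scalar output per input."""
    rng = rng or np.random.default_rng()
    h = x  # shape (n_inputs, d)
    for _ in range(depth):
        w = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], width))
        b = rng.normal(0.0, sigma_b, size=(width,))
        h = np.tanh(h @ w + b)  # nonlinearity choice is an assumption
    w_out = rng.normal(0.0, sigma_w / np.sqrt(h.shape[1]), size=(h.shape[1], 1))
    return (h @ w_out)[:, 0]  # shape (n_inputs,)

def monte_carlo_kernel(x, n_samples=2000, **kwargs):
    """Empirical second moment of outputs over random initializations
    (the prior output mean is zero), used as an estimate of K(x, x)."""
    rng = np.random.default_rng(0)
    outs = np.stack([random_network_outputs(x, rng=rng, **kwargs)
                     for _ in range(n_samples)])  # (n_samples, n_inputs)
    return outs.T @ outs / n_samples

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=(5, 8))  # 5 toy inputs of dimension 8
    print(monte_carlo_kernel(x))
```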
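The Dataset Splits row quotes 45000/5000/10000 CIFAR10 splits and bilinear downsampling to 8 × 8 for the small-scale experiments. The sketch below reproduces those splits under stated assumptions: it uses the tf.keras dataset loader rather than the authors' pipeline, and because the quote does not say which 5000 training images form the validation set, holding out the last 5000 is an assumption.

```python
# Sketch of the CIFAR10 splits quoted above (assumed tf.keras loader, not the authors' pipeline).
import tensorflow as tf

def cifar10_splits(downsample_8x8=False):
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    # Assumption: hold out the last 5000 training images as the validation set.
    x_val, y_val = x_train[45000:], y_train[45000:]
    x_train, y_train = x_train[:45000], y_train[:45000]
    if downsample_8x8:
        # Bilinear downsampling to 8x8, as described for the small-scale experiments.
        x_train = tf.image.resize(x_train, [8, 8], method="bilinear").numpy()
        x_val = tf.image.resize(x_val, [8, 8], method="bilinear").numpy()
        x_test = tf.image.resize(x_test, [8, 8], method="bilinear").numpy()
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```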
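The Experiment Setup row quotes the hyperparameter ranges from section G.1. The sketch below writes that search space out explicitly; the values come from the quote, while the dictionary keys and the enumeration helper are illustrative, and the paper reports tuning with Vizier rather than exhaustive enumeration.

```python
# Sketch of the hyperparameter grid quoted from section G.1 (names are illustrative).
import itertools

GRID = {
    "num_channels": [2 ** k for k in range(0, 13)],       # 1 ... 4096
    "learning_rate": [10.0 ** -k for k in range(0, 16)],  # 1 ... 1e-15
    "weight_decay": [0.0] + [10.0 ** -k for k in range(0, 9)],
    "batch_size": [10, 25, 50, 100, 200],
}

def iter_configs(grid=GRID):
    """Yield every configuration in the Cartesian product of the grid."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

print(sum(1 for _ in iter_configs()))  # 13 * 16 * 10 * 5 = 10400 combinations
```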