Deep Convolutional Networks as shallow Gaussian Processes

Authors: Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically demonstrate the performance increase coming from adding translation-invariant structure to the GP prior. Without computing any gradients, and without augmenting the training set (e.g. using translations), we obtain 0.84% error rate on the MNIST classification benchmark, setting a new record for nonparametric GP-based methods.
Researcher Affiliation | Academia | Adrià Garriga-Alonso, Department of Engineering, University of Cambridge, ag919@cam.ac.uk; Carl Edward Rasmussen, Department of Engineering, University of Cambridge, cer54@cam.ac.uk; Laurence Aitchison, Department of Engineering, University of Cambridge, laurence.aitchison@gmail.com
Pseudocode | Yes | Algorithm 1: The ConvNet kernel k(X, X′). (A rough sketch of the recursion this algorithm performs is given after the table.)
Open Source Code | Yes | Code to replicate this paper is available at https://github.com/convnets-as-gps/convnets-as-gps
Open Datasets | Yes | We evaluate our kernel on the MNIST handwritten digit classification task.
Dataset Splits | Yes | The training set is split into N = 50000 training and 10000 validation examples.
Hardware Specification | Yes | For the ResNet kernel, the most expensive, computing Kxx, and Kx*x for validation and test, took 3h 40min on two Tesla P100 GPUs. In contrast, inverting Kxx and computing validation and test performance took 43.25 ± 8.8 seconds on a single Tesla P100 GPU. (The split and inference steps this refers to are sketched after the table.)
Software Dependencies | No | The paper mentions using "GPflow (Matthews et al., 2017)" but does not specify version numbers for this or any other software dependencies.
Experiment Setup | Yes | For the ConvNet GP and Residual CNN GP (Table 1), we optimise the kernel hyperparameters by random search. We draw M random hyperparameter samples, compute the resulting kernel's performance on the validation set, and pick the highest-performing run. The kernel hyperparameters are: σ_b², σ_w²; the number of layers; the convolution stride, filter sizes and edge behaviour; the nonlinearity (we consider the error function and ReLU); and the frequency of residual skip connections (for Residual CNN GPs). (This random search is sketched after the table.)
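
The pseudocode row points to Algorithm 1, which computes the ConvNet kernel by propagating second moments through the network layer by layer. The following is a minimal NumPy/SciPy sketch of that kind of recursion for two single-channel images, written from the paper's description rather than from its released code: it assumes ReLU nonlinearities, stride 1, "same" zero padding and a mean-pooled dense readout, and it omits the fan-in scaling and the edge-behaviour options the authors search over. The function names (convnet_gp_kernel, relu_ee) and defaults are illustrative, not the authors'.

# Minimal sketch of a ConvNet GP kernel recursion (assumptions noted above).
import numpy as np
from scipy.signal import convolve2d

def relu_ee(k_xx, k_pp, k_xp):
    """Elementwise E[relu(a) relu(a')] for a zero-mean bivariate Gaussian with
    variances k_xx, k_pp and covariance k_xp (the arc-cosine kernel)."""
    norm = np.sqrt(k_xx * k_pp)
    cos_t = np.clip(k_xp / np.maximum(norm, 1e-12), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)

def convnet_gp_kernel(x, xp, n_layers=3, filter_size=3, sigma_b2=1.0, sigma_w2=1.0):
    """Scalar kernel value k(x, xp) for two single-channel H x W images."""
    ones = np.ones((filter_size, filter_size))
    patch_sum = lambda m: convolve2d(m, ones, mode="same")  # sum over each patch
    # Layer-1 second-moment maps (one entry per spatial location).
    k_xx = sigma_b2 + sigma_w2 * patch_sum(x * x)
    k_pp = sigma_b2 + sigma_w2 * patch_sum(xp * xp)
    k_xp = sigma_b2 + sigma_w2 * patch_sum(x * xp)
    for _ in range(n_layers - 1):
        # Push the Gaussian pre-activations through the ReLU in expectation...
        v_xx = relu_ee(k_xx, k_xx, k_xx)
        v_pp = relu_ee(k_pp, k_pp, k_pp)
        v_xp = relu_ee(k_xx, k_pp, k_xp)
        # ...then apply the next convolutional layer to the resulting moments.
        k_xx = sigma_b2 + sigma_w2 * patch_sum(v_xx)
        k_pp = sigma_b2 + sigma_w2 * patch_sum(v_pp)
        k_xp = sigma_b2 + sigma_w2 * patch_sum(v_xp)
    # Dense readout: average the last nonlinearity's covariance over locations.
    return sigma_b2 + sigma_w2 * np.mean(relu_ee(k_xx, k_pp, k_xp))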
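
The dataset-split and hardware rows describe a two-stage workflow: the kernel matrices are the expensive GPU computation, after which prediction is a single linear solve. The sketch below illustrates that second stage under common assumptions (a shuffled 50000/10000 split of the 60000 MNIST training images, and classification treated as regression onto one-hot targets); the helper names, the jitter value, and the shuffling are ours, not taken from the paper.

# Minimal sketch: train/validation split and GP prediction from precomputed kernels.
import numpy as np

def split_train_valid(images, labels, n_train=50000, seed=0):
    """Split the MNIST training set into n_train training and the rest validation."""
    idx = np.random.default_rng(seed).permutation(len(images))
    tr, va = idx[:n_train], idx[n_train:]
    return (images[tr], labels[tr]), (images[va], labels[va])

def gp_predict_labels(Kxx, Ksx, y_train, n_classes=10, jitter=1e-6):
    """Predict class indices for the rows of Ksx (validation/test vs. train kernel)."""
    Y = np.eye(n_classes)[y_train]                       # N x 10 one-hot targets
    L = np.linalg.cholesky(Kxx + jitter * np.eye(len(Kxx)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # Kxx^{-1} Y via Cholesky
    return np.argmax(Ksx @ alpha, axis=1)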
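
The experiment-setup row describes picking hyperparameters by random search over the validation split. The sketch below shows that loop in its simplest form: draw M random configurations, score each, keep the best. The search ranges and the evaluate_kernel callback are assumptions for illustration; the paper lists which hyperparameters are searched but not their exact ranges.

# Minimal sketch of random hyperparameter search over the ConvNet GP kernel.
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters():
    """Draw one random configuration of the kernel hyperparameters named in the paper."""
    return {
        "sigma_b2": rng.uniform(0.0, 2.0),
        "sigma_w2": rng.uniform(0.5, 2.0),
        "n_layers": int(rng.integers(2, 17)),
        "filter_size": int(rng.choice([3, 5, 7])),
        "stride": int(rng.choice([1, 2])),
        "nonlinearity": str(rng.choice(["relu", "erf"])),
    }

def random_search(evaluate_kernel, M=100):
    """evaluate_kernel(params) -> validation accuracy; return the best configuration."""
    best_params, best_acc = None, -np.inf
    for _ in range(M):
        params = sample_hyperparameters()
        acc = evaluate_kernel(params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc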