Deep Convolutional Networks as shallow Gaussian Processes
Authors: Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically demonstrate the performance increase coming from adding translation-invariant structure to the GP prior. Without computing any gradients, and without augmenting the training set (e.g. using translations), we obtain 0.84% error rate on the MNIST classification benchmark, setting a new record for nonparametric GP-based methods. |
| Researcher Affiliation | Academia | Adrià Garriga-Alonso, Department of Engineering, University of Cambridge, ag919@cam.ac.uk; Carl Edward Rasmussen, Department of Engineering, University of Cambridge, cer54@cam.ac.uk; Laurence Aitchison, Department of Engineering, University of Cambridge, laurence.aitchison@gmail.com |
| Pseudocode | Yes | Algorithm 1: The ConvNet kernel k(X, X′). A hedged sketch of this kernel recursion appears after the table. |
| Open Source Code | Yes | Code to replicate this paper is available at https://github.com/convnets-as-gps/convnets-as-gps |
| Open Datasets | Yes | We evaluate our kernel on the MNIST handwritten digit classification task. |
| Dataset Splits | Yes | The training set is split into N = 50000 training and 10000 validation examples. |
| Hardware Specification | Yes | For the ResNet kernel, the most expensive, computing Kxx and Kx∗x for validation and test took 3h 40min on two Tesla P100 GPUs. In contrast, inverting Kxx and computing validation and test performance took 43.25 ± 8.8 seconds on a single Tesla P100 GPU. A sketch of this solve appears after the table. |
| Software Dependencies | No | The paper mentions using "GPflow (Matthews et al., 2017)" but does not specify version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | For the ConvNet GP and Residual CNN GP (Table 1), we optimise the kernel hyperparameters by random search. We draw M random hyperparameter samples, compute the resulting kernel's performance on the validation set, and pick the highest-performing run. The kernel hyperparameters are: σ_b², σ_w²; the number of layers; the convolution stride, filter sizes and edge behaviour; the nonlinearity (we consider the error function and ReLU); and the frequency of residual skip connections (for Residual CNN GPs). A sketch of this random-search loop appears after the table. |
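
The Pseudocode row points to the paper's Algorithm 1, the ConvNet kernel. Below is a minimal NumPy sketch of that style of recursion, assuming single-channel inputs, stride-1 "valid" convolutions, and the ReLU nonlinearity; the function names (`relu_ee`, `patch_sum`, `convnet_kernel`) and default hyperparameter values are illustrative, not taken from the paper's released code.

```python
import numpy as np

def relu_ee(k_xy, k_xx, k_yy):
    """E[relu(u) relu(v)] for a centred bivariate Gaussian with variances
    k_xx, k_yy and covariance k_xy (the degree-1 arccos kernel)."""
    s = np.sqrt(k_xx * k_yy)
    theta = np.arccos(np.clip(k_xy / s, -1.0, 1.0))
    return s / (2.0 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def patch_sum(a, f):
    """Sum of `a` over every f x f patch (a 'valid' convolution with ones)."""
    h, w = a.shape
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(f):
        for j in range(f):
            out += a[i:i + h - f + 1, j:j + w - f + 1]
    return out

def convnet_kernel(x, y, n_layers=3, f=3, sb2=1.0, sw2=1.0):
    """Scalar kernel k(x, y) between two single-channel images x, y.
    NOTE: the paper scales the weight variance by the filter fan-in;
    that factor is folded into sw2 here for brevity."""
    # Layer 1: covariance maps of the pre-activations from the raw pixels.
    kxy = sb2 + sw2 * patch_sum(x * y, f)
    kxx = sb2 + sw2 * patch_sum(x * x, f)
    kyy = sb2 + sw2 * patch_sum(y * y, f)
    for _ in range(n_layers - 1):
        # Push the covariances through the ReLU, elementwise per location.
        vxy = relu_ee(kxy, kxx, kyy)
        vxx = kxx / 2.0  # relu_ee(k, k, k) simplifies to k / 2
        vyy = kyy / 2.0
        # Next convolutional layer.
        kxy = sb2 + sw2 * patch_sum(vxy, f)
        kxx = sb2 + sw2 * patch_sum(vxx, f)
        kyy = sb2 + sw2 * patch_sum(vyy, f)
    # Final dense readout: sum the ReLU-transformed covariances over space.
    return sb2 + sw2 * relu_ee(kxy, kxx, kyy).sum()
```

Evaluating `convnet_kernel` over all pairs of training images would assemble the Kxx matrix that the timing row above refers to.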
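The Experiment Setup row describes random search over kernel hyperparameters. The following is a minimal sketch of that loop; the sampling ranges and the `evaluate` callback are assumptions for illustration, since the quoted excerpt does not report the exact search distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparams():
    # Illustrative ranges only; the paper does not list its exact
    # search distributions in the quoted excerpt.
    return dict(
        sigma_b2=float(rng.uniform(0.0, 2.0)),
        sigma_w2=float(rng.uniform(0.5, 4.0)),
        n_layers=int(rng.integers(2, 17)),
        filter_size=int(rng.choice([3, 5, 7])),
        nonlinearity=str(rng.choice(["relu", "erf"])),
    )

def random_search(evaluate, M=30):
    """evaluate(hp) -> validation accuracy; keep the best of M draws."""
    best_hp, best_acc = None, -np.inf
    for _ in range(M):
        hp = sample_hyperparams()
        acc = evaluate(hp)
        if acc > best_acc:
            best_hp, best_acc = hp, acc
    return best_hp, best_acc
```

Because no gradients are computed, each draw only requires building the kernel matrices and evaluating validation performance, which is what makes random search the natural fit here.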
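The Hardware Specification row mentions inverting Kxx to get validation and test performance. A common way to realise this for GP-based MNIST classification is posterior-mean regression on one-hot labels via a Cholesky solve; the sketch below assumes that setup, and the `gp_classify` name and jitter value are standard-practice choices rather than details quoted from the paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_classify(Kxx, Kzx, Y_onehot, jitter=1e-6):
    """Posterior-mean GP regression on one-hot labels, read off as a class.

    Kxx: (N, N) train-train kernel; Kzx: (M, N) test-train kernel;
    Y_onehot: (N, 10) one-hot targets. The jitter is a numerical-stability
    choice, not a value quoted from the paper."""
    n = Kxx.shape[0]
    chol = cho_factor(Kxx + jitter * np.eye(n), lower=True)
    mean = Kzx @ cho_solve(chol, Y_onehot)   # Kzx @ Kxx^{-1} @ Y
    return mean.argmax(axis=1)
```

A single factorisation of the N x N matrix dominates this step, which is consistent with the reported gap between hours of kernel computation and seconds of inversion.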