Convolutional Deep Kernel Machines

Authors: Edward Milsom, Ben Anson, Laurence Aitchison

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we introduce convolutional deep kernel machines. This required us to develop a novel inter-domain inducing point approximation, as well as introducing and experimentally assessing a number of techniques not previously seen in DKMs, including analogues to batch normalisation, different likelihoods, and different types of top-layer. The resulting model trains in roughly 77 GPU hours, achieving around 99% test accuracy on MNIST, 72% on CIFAR-100, and 92.7% on CIFAR-10, which is SOTA for kernel methods.
Researcher Affiliation | Academia | Edward Milsom, School of Mathematics, University of Bristol (edward.milsom@bristol.ac.uk); Ben Anson, School of Mathematics, University of Bristol (ben.anson@bristol.ac.uk); Laurence Aitchison, Department of Computer Science, University of Bristol (laurence.aitchison@gmail.com)
Pseudocode | No | The paper includes mathematical derivations and descriptions of processes but does not feature any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The experiments in this paper may be reproduced using the code and instructions given in the following publicly available GitHub repo: https://github.com/edwardmilsom/convdkmpaper.
Open Datasets | Yes | We tested on the MNIST (LeCun et al., 1998), CIFAR-10, and CIFAR-100 (Krizhevsky & Hinton, 2009) image classification datasets. (Footnotes: MNIST: https://yann.lecun.com/exdb/mnist/, Licence: CC BY-SA 3.0; CIFAR: https://cs.toronto.edu/~kriz/cifar.html, Licence: Unknown.)
Dataset Splits | No | The paper describes training, testing, and 'model selection experiments', but does not explicitly specify train/validation/test splits (percentages, sample counts, or a statement that standard splits were used for validation). While standard datasets like MNIST and CIFAR-10/100 have predefined train/test splits, the paper neither states that it uses these nor specifies any custom split for validation.
Hardware Specification | Yes | Training this model on CIFAR-10/100 took around 77 hours on a single NVIDIA A100, but we emphasise that we used double precision floating points for simplicity of implementation, and with some care taken to preserve numerical stability, this could be dramatically sped up using single precision arithmetic.
Software Dependencies | No | The paper mentions that the model was 'implemented in PyTorch (Paszke et al., 2019)', but it does not specify the version number for PyTorch or any other software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We optimised all parameters using Adam, with β1 = 0.8, β2 = 0.9, training for 100 epochs and dividing the learning rate by 10 at epochs 40 and 80, with an initial learning rate of 0.013... We used data augmentation (random cropping and horizontal flips), and a batch size of 256. We also used ZCA (zero-phase component analysis) whitening, which is commonly used in kernel methods (e.g. Shankar et al., 2020; Lee et al., 2020) (for the ZCA regularisation parameter, ϵ, we used 0.1).
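The two less-standard pieces of the quoted setup, the step learning-rate schedule and ZCA whitening with a regularisation parameter ϵ, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation (which is in PyTorch; see the repo above); the function names `zca_whitening_matrix` and `lr_at_epoch` are ours.

```python
import numpy as np

def zca_whitening_matrix(X, eps=0.1):
    """ZCA whitening matrix W = U diag(1/sqrt(lambda + eps)) U^T,
    built from the eigendecomposition of the covariance of X
    (shape: n_samples x n_features, e.g. flattened images).
    eps is the ZCA regularisation parameter (0.1 in the paper)."""
    Xc = X - X.mean(axis=0)                      # centre the data
    cov = Xc.T @ Xc / X.shape[0]                 # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # cov = U diag(lambda) U^T
    # Rotating back with U ("zero-phase") keeps W symmetric, so whitened
    # images stay close to the originals in pixel space.
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

def lr_at_epoch(epoch, base_lr=0.013, milestones=(40, 80), gamma=0.1):
    """Step schedule from the quoted setup: start at 0.013 and divide
    the learning rate by 10 at epochs 40 and 80 (100 epochs total)."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)
```

Whitened data is then `(X - X.mean(axis=0)) @ W`. In PyTorch terms, the quoted optimiser settings correspond to `torch.optim.Adam(params, lr=0.013, betas=(0.8, 0.9))` combined with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)`.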