Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

Authors: Alexander Immer, Tycho van der Ouderaa, Gunnar Rätsch, Vincent Fortuin, Mark van der Wilk

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We demonstrate experimentally that our method can differentiably learn useful distributions over affine invariances, which are common data augmentations, on various versions of the image classification datasets MNIST, Fashion MNIST, and CIFAR-10, without validation data.' and, from Section 5 (Experiments): 'We evaluate our method that learns invariances using Laplace approximations (LILA) by optimising affine invariances on different MNIST (LeCun and Cortes, 2010), Fashion MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky et al., 2009) classification tasks.'
Researcher Affiliation | Academia | (1) Department of Computer Science, ETH Zurich, Switzerland; (2) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (3) Department of Computing, Imperial College London, UK; (4) Department of Engineering, University of Cambridge, UK
Pseudocode | No | The paper describes the approach using textual explanations and mathematical formulations, along with diagrams, but does not include a dedicated 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The code is available at https://github.com/tychovdo/lila
Open Datasets | Yes | MNIST (LeCun and Cortes, 2010), Fashion MNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky et al., 2009) classification tasks.
Dataset Splits | Yes | 'We use the standard splits of MNIST, Fashion MNIST and CIFAR-10.' and 'For the data efficiency experiments in Sec. 5.3 we use random subsets of the training data. For these experiments, we create 3 random subsets of sizes [1000, 2000, 5000, 10000, 20000, 30000, 40000, 50000] for CIFAR-10, [1000, 2000, 5000, 10000, 20000, 30000, 40000, 50000, 60000] for MNIST and F-MNIST.' (See the data-subset sketch after this table.)
Hardware Specification | No | The paper answers '[No]' to question 3d of the 'Questions for Paper Analysis' checklist: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?'
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | 'We use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 10⁻³ for all models. We train for 500 epochs with a batch size of 256. For CIFAR-10 experiments, we additionally use an early stopping callback on the marginal likelihood that terminates training after 20 epochs of no improvement, and reduce the learning rate by a factor of 10 after 10 epochs of no improvement.' (See the training-setup sketch after this table.)
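
For the Dataset Splits row, the following is a minimal sketch of how the quoted random training subsets could be constructed. The torchvision loader, the seeding scheme, and the `Subset` bookkeeping are assumptions for illustration, not taken from the paper or its repository.

```python
# Hypothetical sketch: build 3 random subsets of each listed size from the
# standard CIFAR-10 training split (torchvision usage and seeds are assumed).
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
subset_sizes = [1000, 2000, 5000, 10000, 20000, 30000, 40000, 50000]

subsets = {}
for seed in range(3):  # '3 random subsets' per size
    perm = torch.randperm(len(train_set),
                          generator=torch.Generator().manual_seed(seed))
    for n in subset_sizes:
        # take the first n indices of an independent permutation per seed
        subsets[(seed, n)] = Subset(train_set, perm[:n].tolist())
```

For MNIST and Fashion MNIST the same construction would apply with `datasets.MNIST` / `datasets.FashionMNIST` and the additional 60000-sample size.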
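
And for the Experiment Setup row, here is a minimal sketch of the reported optimisation schedule. The model, data loader, and the marginal-likelihood estimate `estimate_marglik` are hypothetical stand-ins, and the early-stopping logic is an assumption about how the described callback behaves rather than the authors' implementation.

```python
# Hypothetical sketch of the quoted setup: Adam at 1e-3, 500 epochs,
# batch size 256, and (for CIFAR-10) early stopping and LR decay driven
# by a marginal-likelihood estimate. `estimate_marglik` is a stand-in.
import torch
import torch.nn.functional as F


def train(model, train_loader, estimate_marglik=None,
          epochs=500, lr=1e-3, lr_patience=10, stop_patience=20):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_marglik, epochs_since_best = float("-inf"), 0
    for epoch in range(epochs):
        for x, y in train_loader:  # batch size 256 is set in the DataLoader
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
        if estimate_marglik is None:  # MNIST/F-MNIST runs: no early stopping
            continue
        marglik = estimate_marglik(model)
        if marglik > best_marglik:
            best_marglik, epochs_since_best = marglik, 0
        else:
            epochs_since_best += 1
            if epochs_since_best == lr_patience:  # reduce LR by a factor of 10
                for group in optimizer.param_groups:
                    group["lr"] /= 10
            if epochs_since_best >= stop_patience:  # terminate training
                break
    return model
```

A `DataLoader` built with `batch_size=256` would supply `train_loader`, matching the quoted configuration.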