Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Authors: Javier Antorán, David Janz, James U. Allingham, Erik Daxberger, Riccardo Barbano, Eric Nalisnick, José Miguel Hernández-Lobato
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.' Section 5 (Experiments) opens: 'We proceed to provide empirical evidence for our assumptions and recommendations.' |
| Researcher Affiliation | Academia | (1) University of Cambridge; (2) University of Alberta; (3) Max Planck Institute for Intelligent Systems, Tübingen; (4) University College London; (5) University of Amsterdam. Correspondence to: Javier Antorán <ja666@cam.ac.uk>. |
| Pseudocode | Yes | Algorithm 1: Efficient evaluation of the likelihood gradient for the linearised model |
| Open Source Code | No | The paper states 'We use the recently-released laplace library...' (a footnote gives the library's URL). This indicates the use of an external library, not the release of the authors' specific implementation of the methods described in this paper. |
| Open Datasets | Yes | The paper validates its recommendations on well-known public datasets such as MNIST, KMNIST (Clanuwat et al., 2018), and CIFAR10, across MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers. |
| Dataset Splits | No | The paper mentions 'val-based early stopping' and implies training, validation, and test sets are used, but it does not provide specific percentages or counts for these splits. For example, it does not state '80/10/10 split' or similar details for any dataset. |
| Hardware Specification | Yes | 'This choice avoids confounding the effects described in Section 3 with any further approximations. In Section 5.3, we show that our recommendations yield performance improvements on the 23M parameter ResNet-50 network while employing the standard KFAC approximation to the Hessian (Martens & Grosse, 2015; Daxberger et al., 2021a).' The paper also identifies 'the largest model for which we can tractably compute the Hessian on an A100 GPU.' |
| Software Dependencies | No | The paper states 'We use the recently-released laplace library' and provides a URL in a footnote, but it does not specify a version number for this library or any other software component used in the experiments (see the illustrative snippet after this table). |
| Experiment Setup | Yes | Unless specified otherwise, NN weights θ are learnt using SGD, with an initial learning rate of 0.1, momentum of 0.9, and weight decay of 1×10⁻⁴. We trained for 90 epochs, using a multi-step LR scheduler with a decay rate of 0.1 applied at epochs 40 and 70. (A minimal sketch of this training configuration follows the table.) |
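
The last two rows point at the most directly reusable reproducibility details: the quoted SGD recipe and the (unversioned) laplace dependency. As a point of reference, the first sketch below reconstructs the stated training configuration with standard PyTorch components; the model and data are synthetic placeholders, and this is not the authors' released code.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; the paper's experiments use MNIST, KMNIST and CIFAR10.
x = torch.randn(256, 1, 28, 28)
y = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(x, y), batch_size=128, shuffle=True)

# Placeholder model; the paper covers MLPs, classic CNNs, residual networks,
# generative autoencoders and transformers.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Hyperparameters as quoted in the paper: SGD with lr 0.1, momentum 0.9,
# weight decay 1e-4; 90 epochs; LR decayed by 0.1 at epochs 40 and 70.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[40, 70], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(90):
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Likewise, a minimal sketch of fitting a Laplace approximation with a KFAC (Kronecker-factored) Hessian via the laplace library, assuming its publicly documented interface; exact argument names may vary across versions, which is precisely the versioning gap noted in the Software Dependencies row.

```python
# Hedged sketch of a KFAC Laplace fit with the `laplace` library (laplace-torch);
# keyword names follow the library's published examples, not the paper.
from laplace import Laplace

la = Laplace(model, 'classification',
             subset_of_weights='all',
             hessian_structure='kron')         # KFAC-structured Hessian
la.fit(train_loader)
la.optimize_prior_precision(method='marglik')  # tune the prior via the evidence
```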