Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting

Authors: Hippolyt Ritter, Aleksandar Botev, David Barber

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting."
Researcher Affiliation | Collaboration | Hippolyt Ritter (1), Aleksandar Botev (1), David Barber (1, 2, 3); (1) University College London, (2) Alan Turing Institute, (3) reinfer.io
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our fork with code to calculate the Kronecker factors is available at: www.github.com/BB-UCL/Lasagne" (A hedged sketch of the Kronecker-factor computation follows this table.)
Open Datasets | Yes | "As a first experiment, we test on a sequence of permutations of the MNIST dataset [19]." (A sketch of constructing such a permuted MNIST task sequence follows this table.)
Dataset Splits | No | The paper mentions that "Fig. 1 shows the mean test accuracy as new datasets are observed for the optimal hyperparameters of each method" and refers to "the value that optimizes the validation error", confirming the use of a validation set. However, it does not provide dataset split percentages, sample counts, or citations to predefined splits for the train/validation split.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models or memory amounts, used for running its experiments.
Software Dependencies | No | The paper states: "We implement our experiments using Theano [39] and Lasagne [3] software libraries." However, it does not specify version numbers for these libraries, which would be needed for a fully reproducible description.
Experiment Setup | Yes | "For the permuted MNIST dataset, we used the Adam [15] optimizer with a learning rate of 0.001 and mini-batches of size 128. For the disjoint MNIST and vision datasets, we used Nesterov momentum [28, 32] with a learning rate of 0.1, a momentum of 0.9, and mini-batches of size 250. We trained each task for 200 epochs." (An illustrative optimizer configuration with these hyperparameters follows this table.)
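The Open Source Code row points to code for "the Kronecker factors". As background, the Kronecker-factored curvature approximation used in this line of work writes the curvature block of a fully connected layer as the Kronecker product of two small matrices: the uncentred covariance of the layer's inputs and the uncentred covariance of the gradients with respect to the layer's pre-activations. The snippet below is a minimal NumPy sketch of that per-layer factor computation under those assumptions; the function and argument names are illustrative and this is not the authors' released implementation.

```python
import numpy as np

def kronecker_factors(layer_inputs, preact_grads):
    """Estimate the two Kronecker factors for one fully connected layer.

    layer_inputs:  (batch, d_in)  activations fed into the layer.
    preact_grads:  (batch, d_out) gradients of the loss w.r.t. the
                   layer's pre-activations.
    Returns (A, G); the layer's curvature block is approximated by
    np.kron(A, G).
    """
    batch = layer_inputs.shape[0]
    A = layer_inputs.T @ layer_inputs / batch    # (d_in, d_in) input factor
    G = preact_grads.T @ preact_grads / batch    # (d_out, d_out) gradient factor
    return A, G
```

Storing the two small factors A and G per layer, rather than the full weight-by-weight curvature matrix, is what keeps the structured Laplace approximation tractable for large layers.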
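The Open Datasets row refers to a sequence of permutations of MNIST. Such a benchmark is typically built by fixing one random pixel permutation per task and applying it to every image; the 50-task count comes from the quoted result. The following is a minimal NumPy sketch of that construction, with illustrative function and variable names not taken from the paper's code.

```python
import numpy as np

def make_permuted_mnist_tasks(images, labels, num_tasks=50, seed=0):
    """Build a sequence of permuted MNIST tasks.

    images: array of shape (N, 784), flattened MNIST digits.
    labels: array of shape (N,).
    Each task applies one fixed random pixel permutation to all images;
    the first task conventionally uses the identity permutation.
    """
    rng = np.random.RandomState(seed)
    num_pixels = images.shape[1]
    tasks = []
    for t in range(num_tasks):
        if t == 0:
            perm = np.arange(num_pixels)        # original MNIST
        else:
            perm = rng.permutation(num_pixels)  # new fixed permutation per task
        tasks.append((images[:, perm], labels))
    return tasks
```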
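The Experiment Setup row lists the optimization hyperparameters. The paper's experiments used Theano and Lasagne; the sketch below only restates those settings, using PyTorch optimizers as a stand-in, so the library choice and the placeholder models are assumptions and only the hyperparameter values come from the paper.

```python
import torch

# Placeholder networks; the paper's architectures are not reproduced here.
permuted_mnist_model = torch.nn.Linear(784, 10)
disjoint_model = torch.nn.Linear(784, 10)

# Permuted MNIST: Adam, learning rate 0.001, mini-batches of size 128.
adam = torch.optim.Adam(permuted_mnist_model.parameters(), lr=0.001)
permuted_batch_size = 128

# Disjoint MNIST and vision datasets: Nesterov momentum,
# learning rate 0.1, momentum 0.9, mini-batches of size 250.
nesterov = torch.optim.SGD(
    disjoint_model.parameters(), lr=0.1, momentum=0.9, nesterov=True
)
disjoint_batch_size = 250

epochs_per_task = 200  # each task trained for 200 epochs, per the setup row
```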