Continual Deep Learning by Functional Regularisation of Memorable Past

Authors: Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard Turner, Mohammad Emtiyaz Khan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined. and 4 Experiments: To identify the benefits of the functional prior (step A) and memorable past (step B), we compare FROMP to three variants: (1) FROMP-L2, where we replace the kernel in Eq. 5 by the identity matrix, similar to Eq. 1; (2) FRORP, where memorable examples are selected randomly (R stands for random); (3) FRORP-L2, which is the same as FRORP but with the kernel in Eq. 5 replaced by the identity matrix. We present comparisons on four benchmarks: a toy dataset, permuted MNIST, Split MNIST, and Split CIFAR (a split version of CIFAR-10 & CIFAR-100). Results for the toy dataset are summarised in Fig. 5 and App. G, where we also visually show the brittleness of weight-space methods. In all experiments, we use the Adam optimiser [17]. Details on hyperparameter settings are in App. F.
Researcher Affiliation | Academia | Pingbo Pan (1), Siddharth Swaroop (2), Alexander Immer (3), Runa Eschenhagen (4), Richard E. Turner (2), Mohammad Emtiyaz Khan (5). 1: University of Technology Sydney, Australia; 2: University of Cambridge, Cambridge, UK; 3: École Polytechnique Fédérale de Lausanne, Switzerland; 4: University of Tübingen, Tübingen, Germany; 5: RIKEN Center for AI Project, Tokyo, Japan
Pseudocode | Yes | Algorithm 1: FROMP for binary classification on task t, given q_{t-1}(w) := N(µ_{t-1}, diag(v_{t-1})) and memorable pasts M_{1:t-1}. Additional computations on top of Adam are highlighted in red. Function FROMP(D_t, µ_{t-1}, v_{t-1}, M_{1:t-1}): ...
Open Source Code | Yes | Code for all experiments is available at https://github.com/team-approx-bayes/fromp.
Open Datasets | Yes | We present comparisons on four benchmarks: a toy dataset, permuted MNIST, Split MNIST, and Split CIFAR (a split version of CIFAR-10 & CIFAR-100).
Dataset Splits | No | The paper uses standard benchmarks like permuted MNIST, Split MNIST, and Split CIFAR. While these datasets have conventional splits, the paper does not explicitly state the specific training, validation, and test split percentages or sample counts used for its experiments.
Hardware Specification | No | We are also thankful for the RAIDEN computing system and its support team at the RIKEN Center for AI Project, which we used extensively for our experiments. (Explanation: The paper mentions the computing system used but does not provide specific details about GPU models, CPU types, or memory.)
Software Dependencies | No | In all experiments, we use the Adam optimiser [17]. (Explanation: The paper mentions using the Adam optimizer but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks.)
Experiment Setup | Yes | Details on hyperparameter settings are in App. F. (Appendix F provides specific hyperparameter values for learning rate, batch size, epochs, and Adam epsilon.)
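The variants quoted in the Research Type row differ only in how the functional penalty at the memorable points is weighted: FROMP uses the kernel from the paper's Eq. 5, while the "-L2" variants replace that kernel with the identity matrix. A minimal NumPy sketch of such a quadratic functional penalty (the function name and shapes are hypothetical illustrations, not the authors' released code):

```python
import numpy as np

def functional_penalty(f_new, f_old, K=None):
    """Quadratic functional-regularisation penalty at memorable points.

    f_new : current model's predictions at the memorable inputs
    f_old : stored predictions from the previous task
    K     : kernel matrix over the memorable inputs; if None, the
            identity is used, which corresponds to the "-L2" variants
            (FROMP-L2 / FRORP-L2) described in the paper's Section 4.
    """
    diff = f_new - f_old
    if K is None:                      # identity-kernel ("-L2") variant
        return 0.5 * float(diff @ diff)
    # kernel-weighted variant: 0.5 * diff^T K^{-1} diff
    return 0.5 * float(diff @ np.linalg.solve(K, diff))
```

With K set to the identity the two branches coincide; the FROMP-vs-FRORP axis is orthogonal to this, differing only in whether the memorable inputs are chosen by the paper's selection criterion or at random.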