Continual Deep Learning by Functional Regularisation of Memorable Past
Authors: Pingbo Pan, Siddharth Swaroop, Alexander Immer, Runa Eschenhagen, Richard Turner, Mohammad Emtiyaz Khan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined." […] "To identify the benefits of the functional prior (step A) and memorable past (step B), we compare FROMP to three variants: (1) FROMP-L2, where we replace the kernel in Eq. 5 by the identity matrix, similar to Eq. 1; (2) FRORP, where memorable examples are selected randomly (R stands for random); (3) FRORP-L2, which is the same as FRORP but with the kernel in Eq. 5 replaced by the identity matrix. We present comparisons on four benchmarks: a toy dataset, permuted MNIST, Split MNIST, and Split CIFAR (a split version of CIFAR-10 & CIFAR-100). Results for the toy dataset are summarised in Fig. 5 and App. G, where we also visually show the brittleness of weight-space methods. In all experiments, we use the Adam optimiser [17]. Details on hyperparameter settings are in App. F." |
| Researcher Affiliation | Academia | Pingbo Pan (1), Siddharth Swaroop (2), Alexander Immer (3), Runa Eschenhagen (4), Richard E. Turner (2), Mohammad Emtiyaz Khan (5). (1) University of Technology Sydney, Australia; (2) University of Cambridge, Cambridge, UK; (3) École Polytechnique Fédérale de Lausanne, Switzerland; (4) University of Tübingen, Tübingen, Germany; (5) RIKEN Center for AI Project, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1: FROMP for binary classification on task t, given q_{t-1}(w) := N(µ_{t-1}, diag(v_{t-1})) and memorable pasts M_{1:t-1}. Additional computations on top of Adam are highlighted in red. Function FROMP(D_t, µ_{t-1}, v_{t-1}, M_{1:t-1}): ... |
| Open Source Code | Yes | 1Code for all experiments is available at https://github.com/team-approx-bayes/fromp. |
| Open Datasets | Yes | We present comparisons on four benchmarks: a toy dataset, permuted MNIST, Split MNIST, and Split CIFAR (a split version of CIFAR-10 & CIFAR-100). |
| Dataset Splits | No | The paper uses standard benchmarks like Permuted MNIST, Split MNIST, and Split CIFAR. While these datasets have conventional splits, the paper does not explicitly state the specific training, validation, and test split percentages or sample counts used for its experiments. |
| Hardware Specification | No | We are also thankful for the RAIDEN computing system and its support team at the RIKEN Center for AI Project, which we used extensively for our experiments. (Explanation: The paper mentions the computing system used but does not provide specific details about GPU models, CPU types, or memory.) |
| Software Dependencies | No | In all experiments, we use the Adam optimiser [17]. (Explanation: The paper mentions using the Adam optimizer but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks.) |
| Experiment Setup | Yes | Details on hyperparameter settings are in App. F. (Appendix F provides specific hyperparameter values for learning rate, batch size, epochs, and Adam epsilon.) |
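The variants quoted above (FROMP-L2 / FRORP-L2) reduce the kernel in the functional regulariser to the identity matrix, which amounts to penalising squared drift of the model's outputs on the memorable past examples. A minimal sketch of that identity-kernel penalty, assuming hypothetical `f_new`/`f_old` model functions and a `tau` regularisation strength (not taken from the paper's released code):

```python
import numpy as np

def functional_l2_penalty(f_new, f_old, memorable_x, tau=1.0):
    """Identity-kernel functional regulariser (FROMP-L2 / FRORP-L2 style).

    Penalises the current model's outputs for drifting from the previous
    model's outputs on memorable past inputs. With the identity matrix in
    place of the kernel, the penalty is a plain sum of squared differences.
    """
    diff = f_new(memorable_x) - f_old(memorable_x)
    return 0.5 * tau * float(np.sum(diff ** 2))

# Toy usage with hypothetical linear "models" before and after an update.
old_w = np.array([1.0, -0.5])
new_w = np.array([1.2, -0.4])
f_old = lambda x: x @ old_w
f_new = lambda x: x @ new_w

mem = np.array([[1.0, 0.0],   # two memorable past inputs
                [0.0, 1.0]])
penalty = functional_l2_penalty(f_new, f_old, mem)
```

This penalty would be added to the new task's loss at each step; the full FROMP method replaces the identity with a kernel over the memorable points, which this sketch does not implement.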