Natural continual learning: success is a journey, not (just) a destination

Authors: Ta-Chu Kao, Kristopher Jensen, Gido van de Ven, Alberto Bernacchia, Guillaume Hennequin

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our method outperforms both standard weight regularization techniques and projection-based approaches when applied to continual learning problems in feedforward and recurrent networks. We show that NCL outperforms previous continual learning algorithms in both feedforward and recurrent networks." |
| Researcher Affiliation | Collaboration | Ta-Chu Kao (1*), Kristopher T. Jensen (1*), Gido M. van de Ven (1, 2), Alberto Bernacchia (3), Guillaume Hennequin (1). 1: Department of Engineering, University of Cambridge; 2: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine; 3: MediaTek Research, Cambridge |
| Pseudocode | Yes | "The NCL algorithm is described in pseudocode in Appendix E together with additional implementation and computational details." |
| Open Source Code | Yes | "Our code is available online" (https://github.com/tachukao/ncl) |
| Open Datasets | Yes | "To verify the utility of NCL for continual learning, we first compared our algorithm to standard methods in feedforward networks across two continual learning benchmarks: split MNIST and split CIFAR-100 (see Appendix B for task details)." "We thus considered an augmented version of the stroke MNIST dataset [SMNIST; 9]." (See the split-task sketch below the table.) |
| Dataset Splits | No | "For the split MNIST and split CIFAR-100 experiments, each baseline method had a single hyperparameter (c for SI, λ for EWC and KFAC, α for OWM, and pw for NCL; Appendix E) that was optimized on a held-out seed (see Appendix I.2)." The paper's mention of "a held-out seed" for hyperparameter optimization implies a validation set, but no specific split percentages or counts are given. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization but does not give version numbers for any software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | "For the split MNIST and split CIFAR-100 experiments, each baseline method had a single hyperparameter (c for SI, λ for EWC and KFAC, α for OWM, and pw for NCL; Appendix E) that was optimized on a held-out seed (see Appendix I.2). However, for our experiments in RNNs, we instead fix pw = 1 and perform a hyperparameter optimization over α for a more direct comparison with OWM and DOWM." (See the hyperparameter-selection sketch below the table.) |
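The split MNIST benchmark referenced in the Open Datasets row follows the standard protocol of dividing the 10 digit classes into 5 two-way classification tasks. The sketch below illustrates that construction only; it is not the authors' code, and the function name `make_split_tasks` and the dummy labels are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's implementation): build a
# "split" task sequence by grouping consecutive classes into tasks.
import numpy as np

def make_split_tasks(labels, classes_per_task=2):
    """Return, for each task, its class labels and the example indices
    belonging to those classes."""
    classes = np.unique(labels)
    n_tasks = len(classes) // classes_per_task
    tasks = []
    for t in range(n_tasks):
        task_classes = classes[t * classes_per_task:(t + 1) * classes_per_task]
        idx = np.where(np.isin(labels, task_classes))[0]
        tasks.append({"classes": task_classes, "indices": idx})
    return tasks

# Dummy labels standing in for MNIST targets (digits 0-9):
labels = np.random.randint(0, 10, size=1000)
for t, task in enumerate(make_split_tasks(labels)):
    print(f"task {t}: classes {task['classes']}, {len(task['indices'])} examples")
```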
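The Dataset Splits and Experiment Setup rows describe selecting each method's single hyperparameter on a held-out random seed and reporting results on other seeds. The sketch below shows that selection loop under stated assumptions: the candidate grid, seed choices, and the stub `run_continual_learning` are placeholders, not values or code from the paper.

```python
# Hedged sketch of single-hyperparameter selection on a held-out seed.
import numpy as np

def run_continual_learning(hyperparam, seed):
    """Stub for training on the full task sequence with a given seed and
    returning average accuracy over tasks; replace with a real training run
    (the hyperparameter is unused in this stub)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.5, 1.0)

grid = [1e-2, 1e-1, 1.0, 1e1, 1e2]       # illustrative candidate values
heldout_seed, eval_seeds = 0, [1, 2, 3]  # illustrative seed assignment

# Pick the value that scores best on the held-out seed ...
best = max(grid, key=lambda h: run_continual_learning(h, heldout_seed))
# ... then evaluate that single value on fresh seeds.
scores = [run_continual_learning(best, s) for s in eval_seeds]
print(f"selected {best}; mean accuracy over evaluation seeds: {np.mean(scores):.3f}")
```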