Natural continual learning: success is a journey, not (just) a destination
Authors: Ta-Chu Kao, Kristopher Jensen, Gido van de Ven, Alberto Bernacchia, Guillaume Hennequin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method outperforms both standard weight-regularization techniques and projection-based approaches when applied to continual learning problems in feedforward and recurrent networks. We show that NCL outperforms previous continual learning algorithms in both feedforward and recurrent networks. |
| Researcher Affiliation | Collaboration | Ta-Chu Kao (1*), Kristopher T. Jensen (1*), Gido M. van de Ven (1,2), Alberto Bernacchia (3), Guillaume Hennequin (1). 1: Department of Engineering, University of Cambridge; 2: Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine; 3: MediaTek Research, Cambridge |
| Pseudocode | Yes | The NCL algorithm is described in pseudocode in Appendix E, together with additional implementation and computational details (a sketch of the core update rule appears after this table). |
| Open Source Code | Yes | Our code is available online: https://github.com/tachukao/ncl |
| Open Datasets | Yes | To verify the utility of NCL for continual learning, we first compared our algorithm to standard methods in feedforward networks across two continual learning benchmarks: split MNIST and split CIFAR-100 (see Appendix B for task details). For the RNN experiments, the paper uses an augmented version of the stroke MNIST dataset (SMNIST; [9]). A sketch of the split-MNIST task construction follows the table. |
| Dataset Splits | No | For the split MNIST and split CIFAR-100 experiments, each baseline method had a single hyperparameter (c for SI, λ for EWC and KFAC, α for OWM, and p_w for NCL; Appendix E) that was optimized on a held-out seed (see Appendix I.2). The paper mentions using 'a held-out seed' for hyperparameter optimization, which implies a validation set, but it does not provide specific split percentages or counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam for optimization but does not provide specific version numbers for any software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For the split MNIST and split CIFAR-100 experiments, each baseline method had a single hyperparameter (c for SI, λ for EWC and KFAC, α for OWM, and p_w for NCL; Appendix E) that was optimized on a held-out seed (see Appendix I.2). However, for our experiments in RNNs, we instead fix p_w = 1 and perform a hyperparameter optimization over α for a more direct comparison with OWM and DOWM. A sketch of the held-out-seed tuning loop follows the table. |
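
To make the pseudocode row concrete, below is a minimal NumPy sketch of an NCL-style update under a quadratic (Laplace) approximation to the posterior over previous tasks. The function name and the dense precision matrix are our illustration, not the paper's code; the actual algorithm (Appendix E of the paper) uses Kronecker-factored approximations of the precision Λ rather than a dense d × d matrix.

```python
import numpy as np

def ncl_step(theta, grad_task, mu_prev, prec_prev, lr=1e-2):
    """One NCL-style update (illustrative dense sketch, not the paper's code).

    theta     : current parameters, shape (d,)
    grad_task : gradient of the current task loss at theta, shape (d,)
    mu_prev   : posterior mean over previous tasks, shape (d,)
    prec_prev : posterior precision Lambda over previous tasks, shape (d, d)

    The task gradient is preconditioned with the inverse prior precision,
    plus a pull back toward the previous solution, so learning is fast in
    directions the prior leaves unconstrained and slow in directions that
    earlier tasks constrain tightly.
    """
    precond_grad = np.linalg.solve(prec_prev, grad_task)  # Lambda^{-1} g
    return theta - lr * (precond_grad + (theta - mu_prev))
```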
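The split MNIST benchmark cited in the datasets row partitions the ten digits into five binary classification tasks. Here is a sketch of that construction, assuming torchvision is available; the exact preprocessing used in the paper is specified in its Appendix B.

```python
import numpy as np
from torchvision import datasets  # assumed dependency for this sketch

# Split MNIST: five binary classification tasks over fixed digit pairs.
DIGIT_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

def make_split_mnist(root="./data", train=True):
    mnist = datasets.MNIST(root, train=train, download=True)
    x = mnist.data.numpy().reshape(-1, 784).astype(np.float32) / 255.0
    y = mnist.targets.numpy()
    tasks = []
    for a, b in DIGIT_PAIRS:
        mask = (y == a) | (y == b)
        # Relabel each digit pair to {0, 1} within its task.
        tasks.append((x[mask], (y[mask] == b).astype(np.int64)))
    return tasks
```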
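Finally, the held-out-seed protocol described in the dataset-splits and experiment-setup rows amounts to a one-dimensional grid search scored on a single seed, with the selected value then evaluated on fresh seeds. A schematic sketch follows; `run` is a hypothetical callable standing in for a full train-and-evaluate pipeline, and the seed values are placeholders.

```python
def tune_on_held_out_seed(run, grid, held_out_seed=0, eval_seeds=(1, 2, 3, 4, 5)):
    """Select a hyperparameter on one held-out seed, then evaluate it on others.

    run  : hypothetical callable (hyperparameter, seed) -> final mean accuracy,
           standing in for training a method (SI, EWC, KFAC, OWM, or NCL) on
           the full task sequence with that hyperparameter and seed.
    grid : iterable of candidate values (e.g. for c, lambda, alpha, or p_w).
    """
    best = max(grid, key=lambda h: run(h, held_out_seed))
    return best, [run(best, seed) for seed in eval_seeds]
```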