Continual learning in recurrent neural networks

Authors: Benjamin Ehret, Christian Henning, Maria Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F. Grewe

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data. To test whether the results from the synthetic Copy Task hold true for real world data we turned to a sequential digit recognition task where task difficulty can be directly controlled. We distinguish between during and final accuracies.
Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zürich and ETH Zürich, Zürich, Switzerland
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code for all experiments (including all baselines) is available at https://github.com/mariacer/cl_in_rnns.
Open Datasets | Yes | We provide a code base comprising all assessed methods as well as variants of four well known sequential datasets adapted to CL: the Copy Task (Graves et al., 2014), Sequential Stroke MNIST (Gulcehre et al., 2017), Audio Set (Gemmeke et al., 2017) and multilingual Part-of-Speech tagging (Nivre et al., 2016).
Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits in terms of percentages or sample counts in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., libraries, frameworks).
Experiment Setup | No | The paper mentions an extensive hyperparameter search but does not provide specific hyperparameter values or detailed system-level training configurations in the main text, deferring them to supplementary materials. For example: 'For all reported methods, results were obtained via an extensive hyperparameter search, where the hyperparameter configuration of the run with best final accuracy was selected and subsequently tested on multiple random seeds (experimental details in SM F).'