Continual learning in recurrent neural networks
Authors: Benjamin Ehret, Christian Henning, Maria Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F Grewe
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data. To test whether the results from the synthetic Copy Task hold true for real-world data, we turned to a sequential digit recognition task where task difficulty can be directly controlled. We distinguish between during and final accuracies. |
| Researcher Affiliation | Academia | Institute of Neuroinformatics University of Zürich and ETH Zürich Zürich, Switzerland |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code for all experiments (including all baselines) is available at https://github.com/mariacer/cl_in_rnns. |
| Open Datasets | Yes | We provide a code base comprising all assessed methods as well as variants of four well known sequential datasets adapted to CL: the Copy Task (Graves et al., 2014), Sequential Stroke MNIST (Gulcehre et al., 2017), Audio Set (Gemmeke et al., 2017) and multilingual Part-of-Speech tagging (Nivre et al., 2016). |
| Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits in terms of percentages or sample counts in the main text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., libraries, frameworks). |
| Experiment Setup | No | While mentioning 'hyperparameter search', the paper does not provide specific hyperparameter values or detailed system-level training configurations in the main text, deferring them to supplementary materials. For example: 'For all reported methods, results were obtained via an extensive hyperparameter search, where the hyperparameter configuration of the run with best final accuracy was selected and subsequently tested on multiple random seeds (experimental details in SM F).' |