Continual learning with hypernetworks

Authors: Johannes von Oswald, Christian Henning, Benjamin F. Grewe, João Sacramento

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that task-conditioned hypernetworks do not suffer from catastrophic forgetting on a set of standard CL benchmarks. Remarkably, they are capable of retaining memories with practically no decrease in performance when presented with very long sequences of tasks. We evaluate our method on a set of standard image classification benchmarks on the MNIST, CIFAR-10 and CIFAR-100 public datasets.
Researcher Affiliation | Academia | Institute of Neuroinformatics, University of Zürich and ETH Zürich, Zürich, Switzerland
Pseudocode | No | The paper includes mathematical equations and descriptions of procedures, but no explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Source code is available under https://github.com/chrhenning/hypercl. (Footnote 1, page 5)
Open Datasets | Yes | We evaluate our method on a set of standard image classification benchmarks on the MNIST, CIFAR-10 and CIFAR-100 public datasets.
Dataset Splits | No | The paper mentions using standard datasets and settings from previous work (e.g., 'For the experiments on the MNIST dataset we model the target network as a fully-connected network and set all hyperparameters after van de Ven & Tolias (2019)'), but does not explicitly state the train/validation/test splits (percentages or counts) used in its own experiments, either in the main text or in the appendices. The one mention of a 'held out validation set' concerns the HAT algorithm, not the authors' own method.
Hardware Specification | Yes | All experiments are conducted using 16 NVIDIA GeForce RTX 2080 TI graphics cards.
Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch options', but does not give version numbers for these or any other software libraries used, which would be required for full reproducibility. For example: 'We train each task for 4000 iterations using the Adam optimizer with a learning rate of 0.01 (and otherwise default PyTorch options) and a batch size of 32.'
Experiment Setup | Yes | We train each task for 4000 iterations using the Adam optimizer with a learning rate of 0.01 (and otherwise default PyTorch options) and a batch size of 32. Below, we report the specifications for our automatic hyperparameter search... Embedding sizes (for e and c): 8, 12, 24, 36, 62, 96, 128; β_output: 0.0005, 0.001, 0.005, 0.01, 0.005, 0.1, 0.5, 1.0; hypernetwork transfer functions: linear, ReLU, ELU, Leaky-ReLU.
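
The reported setup is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch of a task-conditioned hypernetwork trained with the reported settings (Adam, learning rate 0.01, batch size 32, 4000 iterations per task). It is not the authors' implementation (see https://github.com/chrhenning/hypercl for that): the class and function names (TaskConditionedHypernet, target_forward, sample_batch), the 784-100-10 target network, the embedding size of 32, the hidden width of 100, the ELU transfer function, the number of tasks, the random-data loader, and the omission of the paper's output regularizer are all assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative target-network shape (784-100-10 MLP, e.g. for MNIST); an assumption.
IN, HID, OUT = 784, 100, 10
TARGET_PARAMS = IN * HID + HID + HID * OUT + OUT

class TaskConditionedHypernet(nn.Module):
    """Maps a learned task embedding to a flat vector of target-network weights."""
    def __init__(self, emb_dim=32, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.ELU(),  # one of the transfer functions listed in the search grid
            nn.Linear(hidden, TARGET_PARAMS),
        )

    def forward(self, e):
        return self.net(e)

def target_forward(flat_w, x):
    """Run the target MLP using the hypernetwork-generated flat weight vector."""
    i = 0
    W1 = flat_w[i:i + IN * HID].view(HID, IN); i += IN * HID
    b1 = flat_w[i:i + HID];                    i += HID
    W2 = flat_w[i:i + HID * OUT].view(OUT, HID); i += HID * OUT
    b2 = flat_w[i:i + OUT]
    return F.linear(F.relu(F.linear(x, W1, b1)), W2, b2)

def sample_batch(task_id, batch_size=32):
    """Placeholder loader; a real run would draw task_id's split of MNIST/CIFAR."""
    return torch.randn(batch_size, IN), torch.randint(0, OUT, (batch_size,))

num_tasks, emb_dim = 5, 32  # embedding size is an assumption; the paper searches {8, ..., 128}
embeddings = nn.ParameterList([nn.Parameter(torch.randn(emb_dim)) for _ in range(num_tasks)])
hnet = TaskConditionedHypernet(emb_dim)
opt = torch.optim.Adam(list(hnet.parameters()) + list(embeddings.parameters()),
                       lr=0.01)  # reported learning rate, otherwise default options

for task_id in range(num_tasks):
    for _ in range(4000):  # reported: 4000 iterations per task
        x, y = sample_batch(task_id)        # reported batch size of 32
        flat_w = hnet(embeddings[task_id])  # target weights conditioned on the task
        loss = F.cross_entropy(target_forward(flat_w, x), y)
        # The paper additionally adds an output regularizer (strength beta_output) that
        # penalizes changes to the weights generated for previous tasks; omitted here.
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In the paper, forgetting is prevented by that output regularizer, which anchors the hypernetwork's outputs for stored embeddings of previous tasks to their values before training on the current task; the sketch above only illustrates the reported optimizer, batch size, and iteration count.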