Continual learning with hypernetworks
Authors: Johannes von Oswald, Christian Henning, Benjamin F. Grewe, João Sacramento
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that task-conditioned hypernetworks do not suffer from catastrophic forgetting on a set of standard CL benchmarks. Remarkably, they are capable of retaining memories with practically no decrease in performance, when presented with very long sequences of tasks. We evaluate our method on a set of standard image classification benchmarks on the MNIST, CIFAR10 and CIFAR-100 public datasets. |
| Researcher Affiliation | Academia | Institute of Neuroinformatics University of Zürich and ETH Zürich Zürich, Switzerland |
| Pseudocode | No | The paper includes mathematical equations and descriptions of procedures, but no explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Source code is available under https://github.com/chrhenning/hypercl. (Footnote 1, page 5) |
| Open Datasets | Yes | We evaluate our method on a set of standard image classification benchmarks on the MNIST, CIFAR10 and CIFAR-100 public datasets |
| Dataset Splits | No | The paper adopts standard datasets and settings from previous work (e.g., 'For the experiments on the MNIST dataset we model the target network as a fully-connected network and set all hyperparameters after van de Ven & Tolias (2019)'), but does not explicitly report the train/validation/test splits (percentages or counts) used for its own experiments in either the main text or the appendices. The only mention of a 'held out validation set' refers to the HAT baseline, not the authors' own method. |
| Hardware Specification | Yes | All experiments are conducted using 16 NVIDIA GeForce RTX 2080 TI graphics cards. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'default PyTorch options', but does not give version numbers for these or any other software libraries, which are needed for full reproducibility. For example: 'We train each task for 4000 iterations using the Adam optimizer with a learning rate of 0.01 (and otherwise default PyTorch options) and a batch size of 32.' |
| Experiment Setup | Yes | We train each task for 4000 iterations using the Adam optimizer with a learning rate of 0.01 (and otherwise default PyTorch options) and a batch size of 32. Below, we report the specifications for our automatic hyperparameter search... Embedding sizes (for e and c): 8, 12, 24, 36, 62, 96, 128. β_output: 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0. Hypernetwork transfer functions: linear, ReLU, ELU, Leaky-ReLU |
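
To make the quoted setup concrete, below is a minimal sketch of per-task training with Adam (lr 0.01, batch size 32, 4000 iterations per task) and the β_output-weighted hypernetwork output regularizer described in the paper. It assumes a two-layer fully connected target network on MNIST-sized inputs and an illustrative hypernetwork width; names such as `target_forward`, `train_task`, and `hnet` are our own. This is not the authors' reference implementation, which is available at https://github.com/chrhenning/hypercl.

```python
from itertools import cycle

import torch
import torch.nn as nn
import torch.nn.functional as F


def target_forward(x, flat_w, in_dim=784, hidden=256, out_dim=10):
    """Two-layer target MLP whose weights are sliced out of one flat vector."""
    shapes = [(hidden, in_dim), (hidden,), (out_dim, hidden), (out_dim,)]
    params, i = [], 0
    for s in shapes:
        n = int(torch.Size(s).numel())
        params.append(flat_w[i:i + n].view(s))
        i += n
    h = F.relu(F.linear(x, params[0], params[1]))
    return F.linear(h, params[2], params[3])


N_TARGET = 256 * 784 + 256 + 10 * 256 + 10  # parameter count of the target MLP
EMB_DIM, BETA_OUTPUT = 8, 0.01              # values taken from the reported grids

# Hypernetwork: maps a task embedding to the flat target-weight vector.
hnet = nn.Sequential(nn.Linear(EMB_DIM, 64), nn.ReLU(), nn.Linear(64, N_TARGET))
task_embs = []  # one trainable embedding per task, kept fixed after its task


def train_task(loader, iters=4000, lr=0.01):
    """Train one task with Adam; regularize hypernet outputs for old tasks."""
    # Snapshot the hypernet outputs for all previous embeddings before updating.
    with torch.no_grad():
        snapshots = [hnet(e).detach() for e in task_embs]

    emb = nn.Parameter(torch.randn(EMB_DIM) * 0.1)
    task_embs.append(emb)
    opt = torch.optim.Adam(list(hnet.parameters()) + [emb], lr=lr)

    batches = cycle(loader)  # loader is assumed to yield batches of size 32
    for _ in range(iters):
        x, y = next(batches)
        opt.zero_grad()
        logits = target_forward(x.flatten(1), hnet(emb))
        loss = F.cross_entropy(logits, y)
        if snapshots:  # output regularizer, weighted by beta_output / (T - 1)
            reg = sum(((hnet(e) - s) ** 2).sum()
                      for e, s in zip(task_embs[:-1], snapshots))
            loss = loss + BETA_OUTPUT / len(snapshots) * reg
        loss.backward()
        opt.step()
```

The paper's full regularizer additionally evaluates the hypernetwork at a lookahead Θ + ΔΘ of its parameters; the sketch keeps only the simpler form without the lookahead.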
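The reported hyperparameter grids can likewise be written out and enumerated directly; the dictionary keys below are illustrative names, not identifiers from the paper or its code.

```python
from itertools import product

# Search space transcribed from the grids quoted in the "Experiment Setup" row.
search_space = {
    "emb_size": [8, 12, 24, 36, 62, 96, 128],                       # for e and c
    "beta_output": [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0],
    "hnet_activation": ["linear", "relu", "elu", "leaky_relu"],
}

# Exhaustive enumeration; a random subset could be sampled instead.
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]
print(len(configs))  # 7 * 8 * 4 = 224 candidate configurations
```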