Continual Learning with Adaptive Weights (CLAW)
Authors: Tameem Adel, Han Zhao, Richard E. Turner
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that CLAW achieves state-of-the-art performance on six benchmarks, both in overall continual learning performance (measured by classification accuracy) and in addressing catastrophic forgetting. |
| Researcher Affiliation | Collaboration | Tameem Adel, Department of Engineering, University of Cambridge (tah47@cam.ac.uk); Han Zhao, Carnegie Mellon University (han.zhao@cs.cmu.edu); Richard E. Turner, Department of Engineering, University of Cambridge and Microsoft Research (ret26@cam.ac.uk) |
| Pseudocode | Yes | Algorithm 1 Continual Learning with Adaptive Weights (CLAW) |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The datasets in use are: MNIST (LeCun et al., 1998), notMNIST (Bulatov, 2011), Fashion-MNIST (Xiao et al., 2017), Omniglot (Lake et al., 2011) and CIFAR-100 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | Yes | Data is randomly split into three partitions: training, validation, and test. 60% of the data is reserved for training, 20% for validation, and 20% for testing (a sketch of this partitioning follows the table). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types. The mention of an 'Nvidia GPU grant' in acknowledgments is too general. |
| Software Dependencies | No | The paper mentions 'Adam' as an optimizer but does not specify any software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The minibatch size is 128 for Split MNIST and 256 for all the other experiments. Adam (Kingma & Ba, 2015) is the optimiser used in the 6 experiments, with η = 0.001, β1 = 0.9 and β2 = 0.999. The number of epochs per task needed to reach saturation for CLAW (and most of the compared methods) was 10 for all experiments except Omniglot and CIFAR-100 (15 epochs). The values used for ω1 and ω2 are 0.05 and 0.02, respectively. For Omniglot, a network similar to the one used in (Schwarz et al., 2018) is employed, consisting of 4 blocks of 3×3 convolutions with 64 filters, each followed by a ReLU and 2×2 max-pooling. The same CNN is used for CIFAR-100 (a hedged reconstruction of this setup is sketched below the table). |
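The following is a minimal sketch of the 60/20/20 random split described in the Dataset Splits row, assuming the data is held in NumPy arrays; the helper name `split_dataset` and the fixed seed are illustrative, not from the paper.

```python
import numpy as np

def split_dataset(data, labels, seed=0):
    """Randomly partition a dataset into 60% training, 20% validation and
    20% test, matching the split ratios reported in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(0.6 * len(data))
    n_val = int(0.2 * len(data))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]
    return ((data[train_idx], labels[train_idx]),
            (data[val_idx], labels[val_idx]),
            (data[test_idx], labels[test_idx]))
```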
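As a non-authoritative illustration of the Experiment Setup row, the sketch below reconstructs the described convolutional backbone and the reported Adam settings in PyTorch. The paper does not specify a framework; the padding, the single linear classifier head, and names such as `ConvBackbone` are assumptions made here for illustration, not details taken from the authors' code.

```python
import torch
import torch.nn as nn

class ConvBackbone(nn.Module):
    """Four blocks of 3x3 convolutions with 64 filters, each followed by a
    ReLU and 2x2 max-pooling, as described for the Omniglot and CIFAR-100
    experiments. The classifier head and padding are assumptions."""
    def __init__(self, in_channels=3, num_classes=100):
        super().__init__()
        blocks, channels = [], in_channels
        for _ in range(4):
            blocks += [
                nn.Conv2d(channels, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),
            ]
            channels = 64
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.LazyLinear(num_classes)  # infers flattened size

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

model = ConvBackbone(in_channels=3, num_classes=100)  # CIFAR-100 case
# Optimiser settings reported in the paper: Adam with eta = 0.001,
# beta1 = 0.9, beta2 = 0.999; minibatch size 256 for these experiments.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
```

For Omniglot the same backbone would be instantiated with `in_channels=1` and a task-appropriate number of classes; the instantiation above follows the CIFAR-100 case.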