Continual Learning with Adaptive Weights (CLAW)

Authors: Tameem Adel, Han Zhao, Richard E. Turner

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that CLAW achieves state-of-the-art performance on six benchmarks in terms of overall continual learning performance, as measured by classification accuracy, and in terms of addressing catastrophic forgetting.
Researcher Affiliation | Collaboration | Tameem Adel, Department of Engineering, University of Cambridge (tah47@cam.ac.uk); Han Zhao, Carnegie Mellon University (han.zhao@cs.cmu.edu); Richard E. Turner, Department of Engineering, University of Cambridge and Microsoft Research (ret26@cam.ac.uk)
Pseudocode | Yes | Algorithm 1: Continual Learning with Adaptive Weights (CLAW)
Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The datasets used are MNIST (LeCun et al., 1998), notMNIST (Bulatov, 2011), Fashion-MNIST (Xiao et al., 2017), Omniglot (Lake et al., 2011) and CIFAR-100 (Krizhevsky & Hinton, 2009).
Dataset Splits | Yes | Data is randomly split into three partitions: training, validation and test, with 60% of the data reserved for training, 20% for validation and 20% for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types; the mention of an 'Nvidia GPU grant' in the acknowledgments is too general.
Software Dependencies | No | The paper mentions Adam as the optimiser but does not specify any software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The minibatch size is 128 for Split MNIST and 256 for all other experiments. Adam (Kingma & Ba, 2015) is the optimiser used in all six experiments, with η = 0.001, β1 = 0.9 and β2 = 0.999. The number of epochs per task needed to reach saturation for CLAW (and most of the compared methods) was 10 for all experiments except Omniglot and CIFAR-100 (15 epochs). The values used for ω1 and ω2 are 0.05 and 0.02, respectively. For Omniglot, the network is similar to the one used in Schwarz et al. (2018): 4 blocks of 3×3 convolutions with 64 filters, each followed by a ReLU and 2×2 max-pooling. The same CNN is used for CIFAR-100. (A minimal configuration sketch follows below.)
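
As a rough illustration of the reported setup, the sketch below assembles the described convolutional backbone and Adam configuration. PyTorch is an assumption (the paper does not name a framework), as are the input resolution, padding, and the 100-way classifier head; only the 4-block architecture (3×3 convolutions, 64 filters, ReLU, 2×2 max-pooling) and the Adam hyperparameters (η = 0.001, β1 = 0.9, β2 = 0.999) come from the reported setup.

```python
import torch
import torch.nn as nn

# Convolutional backbone described for Omniglot and CIFAR-100: four blocks of
# 3x3 convolutions with 64 filters, each followed by ReLU and 2x2 max-pooling.
# padding=1 (size-preserving convolutions) is an assumption, not stated in the paper.
def make_backbone(in_channels: int = 3) -> nn.Sequential:
    blocks = []
    for i in range(4):
        blocks += [
            nn.Conv2d(in_channels if i == 0 else 64, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        ]
    return nn.Sequential(*blocks)

# Assumed CIFAR-100 setting: 3-channel 32x32 inputs and a 100-way linear head.
# After four 2x2 poolings a 32x32 image becomes 2x2, i.e. 64 * 2 * 2 features.
model = nn.Sequential(make_backbone(3), nn.Flatten(), nn.Linear(64 * 2 * 2, 100))

# Optimiser hyperparameters as reported: Adam with eta = 0.001, beta1 = 0.9,
# beta2 = 0.999; minibatch size 256 (128 only for Split MNIST).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Sanity check on a dummy minibatch of the reported size.
logits = model(torch.randn(256, 3, 32, 32))
assert logits.shape == (256, 100)
```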