Optimizing Reusable Knowledge for Continual Learning via Metalearning

Authors: Julio Hurtado, Alain Raymond-Saez, Alvaro Soto

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to analyze the inner workings of MARK. In particular, we demonstrate that MARK is indeed able to accumulate knowledge incrementally as it learns new tasks. Furthermore, regarding the pieces of knowledge used to solve each task, our experiments confirm that the use of task-dependent feature masks is critical to the success of the method (a minimal sketch of such masking follows the table). In our experiments, we compare MARK with recent state-of-the-art methods: Hard Attention to the Task (HAT) [7], A-GEM [28], Adversarial Continual Learning (ACL) [18], GPM [29], Experience Replay, and SupSup [30].
Researcher Affiliation | Academia | Julio Hurtado, Department of Computer Science, Pontificia Universidad Católica de Chile (jahurtado@uc.cl); Alain Raymond-Saez, Department of Computer Science, Pontificia Universidad Católica de Chile (afraymon@uc.cl); Alvaro Soto, Department of Computer Science, Pontificia Universidad Católica de Chile (asoto@ing.puc.cl)
Pseudocode | Yes | Algorithm 1 describes the training process behind MARK; Algorithm 2 describes the knowledge-base update (KB-Update).
Open Source Code | Yes | Code is released at: https://github.com/JuliousHurtado/meta-training-setup
Open Datasets | Yes | For our experiments, we use two benchmarks from previous works [18, 8, 26]. The first is 20-Split CIFAR-100 [27], which splits CIFAR-100 into 20 tasks of 5 classes each. The second is 20-Split Mini-ImageNet [22], which divides its 100 classes into 20 tasks (a sketch of such a split construction follows the table).
Dataset Splits | No | The paper mentions using a 'validation batch' in Eq. 3 to compute weights for the KB update, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for CIFAR-100 or Mini-ImageNet.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, CUDA versions). It only mentions the use of SGD as an optimizer.
Experiment Setup | Yes | In terms of hyperparameters, we use SGD with a learning rate of 0.01 and a batch size of 128. Each task is trained for 50 epochs. To update the KB, we use K = 10 meta-tasks, each trained for E_inner = 40 epochs with a learning rate of 0.001. We repeat this training stage E_outer = 20 times (a schematic loop with these values follows the table).
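
The task-dependent feature masks mentioned under "Research Type" can be illustrated with a minimal sketch. The module names, layer sizes, and the MLP mask generator below are illustrative assumptions, not the authors' architecture: a small per-task generator produces a sigmoid gate that modulates the features extracted from the shared knowledge base (KB) before they reach a classifier.

```python
import torch
import torch.nn as nn

class MaskedKBFeatures(nn.Module):
    """Sketch of task-dependent masking over shared KB features.

    The feature dimension and the mask-generator design are assumptions
    made for illustration; they do not reproduce the exact MARK model.
    """
    def __init__(self, num_tasks: int, feat_dim: int = 256):
        super().__init__()
        # Shared knowledge base: a feature extractor reused across tasks.
        self.kb = nn.Sequential(nn.Linear(32 * 32 * 3, feat_dim), nn.ReLU())
        # One small mask generator per task (hypothetical design).
        self.mask_generators = nn.ModuleList(
            nn.Sequential(nn.Linear(32 * 32 * 3, feat_dim), nn.Sigmoid())
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        x = x.flatten(1)
        features = self.kb(x)                    # shared KB features
        mask = self.mask_generators[task_id](x)  # task-conditioned gate in (0, 1)
        return features * mask                   # only masked features are used downstream
```

In MARK-style setups the masked features would then feed a task-specific classification head; the sketch omits the heads and the KB-update step.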
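The 20-Split CIFAR-100 protocol listed under "Open Datasets" can be built with standard tooling. The snippet below is a sketch using torchvision; grouping the 100 classes into consecutive blocks of 5 is an assumed ordering, since the paper does not state how classes are assigned to tasks.

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Download CIFAR-100 and group its 100 classes into 20 tasks of 5 classes each.
# Assigning classes 0-4 to task 0, 5-9 to task 1, etc. is an assumed ordering.
cifar = datasets.CIFAR100(root="./data", train=True, download=True,
                          transform=transforms.ToTensor())

num_tasks, classes_per_task = 20, 5
tasks = []
for t in range(num_tasks):
    task_classes = set(range(t * classes_per_task, (t + 1) * classes_per_task))
    indices = [i for i, y in enumerate(cifar.targets) if y in task_classes]
    tasks.append(Subset(cifar, indices))

print([len(ts) for ts in tasks])  # 2500 training images per task (500 per class)
```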
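The hyperparameters reported under "Experiment Setup" translate into the schematic schedule below. This is a sketch of the schedule only: the three callables are hypothetical placeholders for the per-task training, meta-task sampling, and KB-update steps described in Algorithms 1 and 2, not the authors' implementation.

```python
from typing import Callable, Sequence

def continual_training_schedule(
    tasks: Sequence,
    train_task: Callable,          # per-task supervised training (Algorithm 1 step)
    sample_meta_tasks: Callable,   # draws K meta-tasks from the current task's data
    meta_update_kb: Callable,      # KB-Update (Algorithm 2)
) -> None:
    """Schedule built from the reported hyperparameters (placeholders are hypothetical)."""
    TASK_LR, TASK_BATCH, TASK_EPOCHS = 0.01, 128, 50        # per-task SGD settings
    META_K, E_INNER, E_OUTER, META_LR = 10, 40, 20, 0.001   # KB-update settings

    for task_id, task_data in enumerate(tasks):
        # Learn the current task with SGD (lr=0.01, batch size 128, 50 epochs).
        train_task(task_id, task_data, lr=TASK_LR,
                   batch_size=TASK_BATCH, epochs=TASK_EPOCHS)
        # Update the shared KB: E_outer=20 repetitions, each sampling K=10
        # meta-tasks trained for E_inner=40 epochs at lr=0.001.
        for _ in range(E_OUTER):
            meta_tasks = sample_meta_tasks(task_data, k=META_K)
            meta_update_kb(meta_tasks, epochs=E_INNER, lr=META_LR)
```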