Optimizing Reusable Knowledge for Continual Learning via Metalearning

Authors: Julio Hurtado, Alain Raymond-Saez, Alvaro Soto

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments to analyze the inner workings of MARK. In particular, we demonstrate that MARK is indeed able to accumulate knowledge incrementally as it learns new tasks. Furthermore, regarding the pieces of knowledge used to solve each task, our experiments confirm that the use of task-dependent feature masks is critical to the success of the method (a minimal sketch of such masking follows the table). In our experiments, we compare MARK with recent state-of-the-art methods: Hard Attention to the Task (HAT) [7], A-GEM [28], Adversarial Continual Learning (ACL) [18], GPM [29], Experience Replay, and SupSup [30].
Researcher Affiliation | Academia | Julio Hurtado, Department of Computer Science, Pontificia Universidad Católica de Chile (jahurtado@uc.cl); Alain Raymond-Saez, Department of Computer Science, Pontificia Universidad Católica de Chile (afraymon@uc.cl); Alvaro Soto, Department of Computer Science, Pontificia Universidad Católica de Chile (asoto@ing.puc.cl)
Pseudocode | Yes | Algorithm 1 describes the training process behind MARK; Algorithm 2 describes the knowledge-base update (KB-Update).
Open Source Code | Yes | Code is released at: https://github.com/JuliousHurtado/meta-training-setup
Open Datasets | Yes | For our experiments, we use two benchmarks from previous works [18, 8, 26]. The first is 20-Split CIFAR-100 [27], which splits CIFAR-100 into 20 tasks of 5 classes each. The second is 20-Split Mini-ImageNet [22], which divides its 100 classes into 20 tasks (a sketch of such a split construction follows the table).
Dataset Splits | No | The paper mentions using a 'validation batch' in Eq. 3 to compute weights for the KB update, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for CIFAR-100 or Mini-ImageNet.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, CUDA versions). It only mentions the use of SGD as an optimizer.
Experiment Setup | Yes | In terms of hyperparameters, we use SGD with a learning rate of 0.01 and a batch size of 128. Each task is trained for 50 epochs. To update the KB, we use K = 10 meta-tasks, each trained for E_inner = 40 epochs with a learning rate of 0.001. We repeat this training stage E_outer = 20 times (a schematic loop with these values follows the table).
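
The task-dependent feature masks mentioned under "Research Type" can be illustrated with a minimal sketch. The module names, layer sizes, and the MLP mask generator below are illustrative assumptions, not the authors' architecture: a small per-task generator produces a sigmoid gate that modulates the features extracted from the shared knowledge base (KB) before they reach a classifier.

```python
import torch
import torch.nn as nn

class MaskedKBFeatures(nn.Module):
    """Sketch of task-dependent masking over shared KB features.

    The feature dimension and the mask-generator design are assumptions
    made for illustration; they do not reproduce the exact MARK model.
    """
    def __init__(self, num_tasks: int, feat_dim: int = 256):
        super().__init__()
        # Shared knowledge base: a feature extractor reused across tasks.
        self.kb = nn.Sequential(nn.Linear(32 * 32 * 3, feat_dim), nn.ReLU())
        # One small mask generator per task (hypothetical design).
        self.mask_generators = nn.ModuleList(
            nn.Sequential(nn.Linear(32 * 32 * 3, feat_dim), nn.Sigmoid())
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        x = x.flatten(1)
        features = self.kb(x)                    # shared KB features
        mask = self.mask_generators[task_id](x)  # task-conditioned gate in (0, 1)
        return features * mask                   # only masked features are used downstream
```

In MARK-style setups the masked features would then feed a task-specific classification head; the sketch omits the heads and the KB-update step.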
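The 20-Split CIFAR-100 protocol listed under "Open Datasets" can be built with standard tooling. The snippet below is a sketch using torchvision; grouping the 100 classes into consecutive blocks of 5 is an assumed ordering, since the paper does not state how classes are assigned to tasks.

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Download CIFAR-100 and group its 100 classes into 20 tasks of 5 classes each.
# Assigning classes 0-4 to task 0, 5-9 to task 1, etc. is an assumed ordering.
cifar = datasets.CIFAR100(root="./data", train=True, download=True,
                          transform=transforms.ToTensor())

num_tasks, classes_per_task = 20, 5
tasks = []
for t in range(num_tasks):
    task_classes = set(range(t * classes_per_task, (t + 1) * classes_per_task))
    indices = [i for i, y in enumerate(cifar.targets) if y in task_classes]
    tasks.append(Subset(cifar, indices))

print([len(ts) for ts in tasks])  # 2500 training images per task (500 per class)
```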
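The hyperparameters reported under "Experiment Setup" translate into the schematic schedule below. This is a sketch of the schedule only: the three callables are hypothetical placeholders for the per-task training, meta-task sampling, and KB-update steps described in Algorithms 1 and 2, not the authors' implementation.

```python
from typing import Callable, Sequence

def continual_training_schedule(
    tasks: Sequence,
    train_task: Callable,          # per-task supervised training (Algorithm 1 step)
    sample_meta_tasks: Callable,   # draws K meta-tasks from the current task's data
    meta_update_kb: Callable,      # KB-Update (Algorithm 2)
) -> None:
    """Schedule built from the reported hyperparameters (placeholders are hypothetical)."""
    TASK_LR, TASK_BATCH, TASK_EPOCHS = 0.01, 128, 50        # per-task SGD settings
    META_K, E_INNER, E_OUTER, META_LR = 10, 40, 20, 0.001   # KB-update settings

    for task_id, task_data in enumerate(tasks):
        # Learn the current task with SGD (lr=0.01, batch size 128, 50 epochs).
        train_task(task_id, task_data, lr=TASK_LR,
                   batch_size=TASK_BATCH, epochs=TASK_EPOCHS)
        # Update the shared KB: E_outer=20 repetitions, each sampling K=10
        # meta-tasks trained for E_inner=40 epochs at lr=0.001.
        for _ in range(E_OUTER):
            meta_tasks = sample_meta_tasks(task_data, k=META_K)
            meta_update_kb(meta_tasks, epochs=E_INNER, lr=META_LR)
```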