Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction

Authors: Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, Ying Wei

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments across three datasets and various settings, we consistently observe that VR-MCL outperforms other SOTA methods, which further validates the effectiveness of VR-MCL.
Researcher Affiliation | Collaboration | ¹City University of Hong Kong, ²Tencent AI Lab, ³Xi'an Jiaotong University, ⁴Pazhou Laboratory (Huangpu), ⁵Nanyang Technological University
Pseudocode | Yes | For clarity, we present the pseudo-codes of the algorithm in Appendix E. [...] Algorithm 1: The Algorithm of the proposed VR-MCL.
Open Source Code | Yes | Code is available at https://github.com/WuYichen-97/Meta-CL-Revised
Open Datasets | Yes | To verify the effectiveness of the proposed VR-MCL, we conduct comprehensive experiments on the commonly used dataset Seq-CIFAR10, as well as the longer task sequences Seq-CIFAR100 and Seq-TinyImageNet (Buzzega et al., 2020).
Dataset Splits | Yes | Specifically, the Seq-CIFAR10 dataset comprises 5 tasks, with each task containing 2 classes. In contrast, Seq-CIFAR100 consists of 10 tasks, each with 10 classes, while Seq-TinyImageNet includes 20 tasks, each encompassing 10 classes. [...] Our evaluation includes the metrics and experimental settings following the previous works on online CL with a single head (Caccia et al., 2021; Ji et al., 2020; Shim et al., 2021). We choose the final Averaged Accuracy (Acc) across all tasks after sequential training on each task as the main metric for comparing approaches. Moreover, under online CL, we use the Averaged Anytime Accuracy (AAA) (Caccia et al., 2021) to evaluate the model through the stream of tasks.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, number of machines) used for running the experiments. It mentions using 'reduced ResNet-18' and 'PcCNN' backbones and discusses training time, which implies nontrivial compute, but gives no explicit hardware specifications.
Software Dependencies | No | The paper mentions using a 'Stochastic Gradient Descent (SGD) optimizer' but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers that would be necessary for reproduction.
Experiment Setup | Yes | For the reduced ResNet-18, we set both the batch size and the replay batch size (i.e., the batch size sampled from the memory buffer M) as 32. For the smaller network PcCNN, we set both the batch size and the replay batch size as 10. The momentum ratio r and the learning rate α of VR-MCL are both set as 0.25 for all experiments. [...] The hyperparameters used for each experimental setting are listed in Table 7.
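
The Experiment Setup row above amounts to a small hyperparameter configuration. As a hedged sketch, the values quoted there can be collected as follows; the key names are illustrative placeholders and do not come from the authors' repository.

```python
# Hyperparameters quoted in the Experiment Setup row (illustrative key names,
# not identifiers from the VR-MCL codebase).
VR_MCL_CONFIG = {
    "reduced_resnet18": {
        "batch_size": 32,          # stream batch size
        "replay_batch_size": 32,   # batch size sampled from the memory buffer M
    },
    "pccnn": {
        "batch_size": 10,
        "replay_batch_size": 10,
    },
    "momentum_ratio_r": 0.25,      # momentum ratio r, all experiments
    "learning_rate_alpha": 0.25,   # learning rate alpha, all experiments
}
```

Per-setting values beyond these (e.g., those in Table 7 of the paper) are not reproduced here.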
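The Dataset Splits row cites two evaluation metrics: the final Averaged Accuracy (Acc) after sequential training and the Averaged Anytime Accuracy (AAA) of Caccia et al. (2021). Below is a minimal sketch of how such metrics are commonly computed from a per-task accuracy matrix; the function names and matrix layout are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def final_average_accuracy(acc_matrix: np.ndarray) -> float:
    """Acc: mean accuracy over all tasks, measured after training on the last task.

    acc_matrix[i, j] is assumed to hold the accuracy on task j evaluated right
    after training on task i (a common continual-learning convention).
    """
    return float(acc_matrix[-1].mean())

def averaged_anytime_accuracy(acc_matrix: np.ndarray) -> float:
    """AAA: average, over the task stream, of the mean accuracy on the tasks
    seen so far (in the spirit of Caccia et al., 2021)."""
    num_tasks = acc_matrix.shape[0]
    running = [acc_matrix[i, : i + 1].mean() for i in range(num_tasks)]
    return float(np.mean(running))
```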
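Finally, the Pseudocode row points to Algorithm 1 in Appendix E for the actual VR-MCL update, which is not reproduced here. Purely as orientation, the snippet below sketches a generic momentum-style variance-reduced gradient estimate of the kind the quoted momentum ratio r suggests (an exponential moving average of stochastic meta-gradients); this is an assumption about the general technique, not the authors' algorithm.

```python
import torch

def momentum_variance_reduced_step(params, grads, state, r=0.25, lr=0.25):
    """Generic momentum-style variance-reduced SGD step (illustrative only).

    Maintains u_t = (1 - r) * u_{t-1} + r * g_t per parameter and updates the
    parameters along u_t with learning rate lr (alpha in the paper's notation).
    """
    with torch.no_grad():
        for i, (p, g) in enumerate(zip(params, grads)):
            if i not in state:
                state[i] = g.clone()                 # initialize the gradient estimate
            state[i].mul_(1 - r).add_(g, alpha=r)    # variance-reduced estimate
            p.sub_(state[i], alpha=lr)               # SGD step
```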