Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction
Authors: Yichen Wu, Long-Kai Huang, Renzhen Wang, Deyu Meng, Ying Wei
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments across three datasets and various settings, we consistently observe that VR-MCL outperforms other SOTA methods, which further validates the effectiveness of VR-MCL. |
| Researcher Affiliation | Collaboration | City University of Hong Kong; Tencent AI Lab; Xi'an Jiaotong University; Pazhou Laboratory (Huangpu); Nanyang Technological University |
| Pseudocode | Yes | For clarity, we present the pseudo-codes of the algorithm in Appendix E. [...] Algorithm 1 The Algorithm of the proposed VR-MCL. |
| Open Source Code | Yes | Code is available at https://github.com/WuYichen-97/Meta-CL-Revised |
| Open Datasets | Yes | To verify the effectiveness of the proposed VR-MCL, we conduct comprehensive experiments on the commonly used dataset Seq-CIFAR10, as well as the longer task sequences Seq-CIFAR100 and Seq-TinyImageNet (Buzzega et al., 2020). |
| Dataset Splits | Yes | Specifically, the Seq-CIFAR10 dataset comprises 5 tasks, with each task containing 2 classes. In contrast, Seq-CIFAR100 consists of 10 tasks, each with 10 classes, while Seq-TinyImageNet includes 20 tasks, each encompassing 10 classes. [...] Our evaluation includes the metrics and experimental settings following the previous works on online CL with a single head (Caccia et al., 2021; Ji et al., 2020; Shim et al., 2021). We choose the final Averaged Accuracy (Acc) across all tasks after sequential training on each task as the main metric for comparing approaches. Moreover, under online CL, we use the Averaged Anytime Accuracy (AAA) (Caccia et al., 2021) to evaluate the model across the stream of tasks. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, number of machines) used for running the experiments. It mentions using 'reduced ResNet-18' and 'PcCNN' backbones and discusses training time, implying computational resources, but no explicit hardware specifications. |
| Software Dependencies | No | The paper mentions using 'Stochastic Gradient Descent (SGD) optimizer' but does not specify any software frameworks (e.g., PyTorch, TensorFlow) or their version numbers that would be necessary for reproduction. |
| Experiment Setup | Yes | For the reduced ResNet-18, we set both the batch size and the replay batch size (i.e., the batch size sampled from the memory buffer M) as 32. For the smaller network PcCNN, we set both the batch size and the replay batch size as 10. The momentum ratio r and the learning rate α of VR-MCL are both set as 0.25 for all experiments. [...] The hyperparameters used for each experimental setting are listed in Table 7. |
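The two metrics quoted in the Dataset Splits row can be made concrete. The sketch below is illustrative, not from the paper: it assumes an accuracy matrix `acc[t][j]` holding the accuracy on task `j` measured after training on task `t`, computes the final Averaged Accuracy (Acc) as the mean of the last row, and computes the Averaged Anytime Accuracy (AAA, following the definition in Caccia et al., 2021) as the average, over the task stream, of the mean accuracy on all tasks seen so far. The function names and the toy matrix are hypothetical.

```python
def final_avg_accuracy(acc):
    """Acc: mean accuracy over all tasks, evaluated after the final task."""
    last = acc[-1]  # accuracies after sequential training on every task
    return sum(last) / len(last)

def averaged_anytime_accuracy(acc):
    """AAA: average over time of the mean accuracy on tasks seen so far."""
    running = []
    for t, row in enumerate(acc):
        seen = row[: t + 1]  # only tasks encountered up to step t
        running.append(sum(seen) / len(seen))
    return sum(running) / len(running)

# Toy 3-task example (rows: after training task t; cols: accuracy on task j).
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.88],
]
print(final_avg_accuracy(acc))        # mean of the last row
print(averaged_anytime_accuracy(acc)) # mean of the per-step averages
```

AAA rewards methods that stay accurate throughout the stream rather than only at the end, which is why it is the preferred metric under online CL.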