Continual Learning with Recursive Gradient Optimization

Authors: Hao Liu, Huaping Liu

Venue: ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that RGO has significantly better performance on popular continual classification benchmarks when compared to the baselines and achieves new state-of-the-art performance on 20-split-CIFAR100 (82.22%) and 20-split-miniImageNet (72.63%). |
| Researcher Affiliation | Academia | Hao Liu, Department of Computer Science, Tsinghua University, Beijing, China (hao-liu20@mails.tsinghua.edu.cn); Huaping Liu, Department of Computer Science, Tsinghua University, Beijing, China (hpliu@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Learning Algorithm of Recursive Gradient Optimization |
| Open Source Code | Yes | We give the reproducible source code in the supplementary materials, and introduce the implementation of the baseline method in Appendix C.1. |
| Open Datasets | Yes | Permuted MNIST (Goodfellow et al., 2014; Kirkpatrick et al., 2017) and Rotated MNIST (Chaudhry et al., 2020) are variants of the MNIST dataset of handwritten digits (LeCun, 1998)... Split-CIFAR100 (Zenke et al., 2017)... Split miniImageNet, introduced by Chaudhry et al. (2020), applies a similar division to a subset of the original ImageNet (Russakovsky et al., 2015) dataset. (Task construction is sketched below the table.) |
| Dataset Splits | No | The paper describes training, testing, and task divisions but does not explicitly detail a separate validation split. |
| Hardware Specification | Yes | All experiments of our method are completed in several hours on four NVIDIA 2080 Ti GPUs. |
| Software Dependencies | Yes | All results can be reproduced with Python 3.6 and TensorFlow 1.4. |
| Experiment Setup | Yes | Architectures and training details: MNIST variants are trained for 1000 steps, while CIFAR and miniImageNet are trained for 2000 steps; batch size is set at 10 for all tasks. The learning rates of all baselines are chosen by hyperparameter search over [0.003, 0.01, 0.03, 0.1, 0.3, 1]. Recursive Gradient Optimization (ours) uses learning rate 0.1 (MNIST), 0.03 (CIFAR100, miniImageNet 2000 steps), and 0.01 (miniImageNet 20 epochs). All experiments are trained 5 times with 20 epochs; the learning rate is set at 0.03 and 0.01 for CIFAR and miniImageNet, respectively. (The reported values are collected in the configuration sketch below the table.) |
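The Open Datasets row quotes how Permuted MNIST, Rotated MNIST, and the split benchmarks are derived from standard datasets. As a rough orientation only, the sketch below shows the usual way such task sequences are built in the continual-learning literature (one fixed random pixel permutation, or one fixed rotation angle, per task); the function names and arguments are illustrative assumptions and are not taken from the paper's supplementary code.

```python
import numpy as np
from scipy.ndimage import rotate

def permuted_mnist_task(images: np.ndarray, seed: int) -> np.ndarray:
    """One Permuted-MNIST task: apply a single fixed random pixel permutation
    (shared across the whole task) to every flattened image."""
    flat = images.reshape(len(images), -1)                      # (N, 784)
    perm = np.random.default_rng(seed).permutation(flat.shape[1])
    return flat[:, perm]

def rotated_mnist_task(images: np.ndarray, angle_deg: float) -> np.ndarray:
    """One Rotated-MNIST task: rotate every 28x28 image by the same fixed angle."""
    return rotate(images, angle_deg, axes=(1, 2), reshape=False, order=1)

# Example: a sequence of tasks built from the same base training images.
# tasks = [permuted_mnist_task(train_x, seed=t) for t in range(20)]
```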
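The Experiment Setup row reports per-benchmark learning rates, step counts, and a batch size of 10, plus the baseline learning-rate search grid. Below is a minimal, hypothetical sketch of how a reproduction might collect those reported values into one configuration table; the dictionary keys and the helper name are assumptions, not identifiers from the paper's released code.

```python
# Baseline learning rates were searched over this grid (per the paper's quote).
BASELINE_LR_GRID = [0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

# RGO settings as reported: batch size 10 everywhere, 1000 steps per task for
# the MNIST variants and 2000 steps for CIFAR / miniImageNet; separate 20-epoch
# runs reportedly use 0.03 (CIFAR) and 0.01 (miniImageNet).
RGO_CONFIGS = {
    "permuted_mnist":          {"lr": 0.10, "steps_per_task": 1000, "batch_size": 10},
    "rotated_mnist":           {"lr": 0.10, "steps_per_task": 1000, "batch_size": 10},
    "split_cifar100":          {"lr": 0.03, "steps_per_task": 2000, "batch_size": 10},
    "split_miniimagenet":      {"lr": 0.03, "steps_per_task": 2000, "batch_size": 10},
    "split_cifar100_20ep":     {"lr": 0.03, "epochs": 20, "batch_size": 10},
    "split_miniimagenet_20ep": {"lr": 0.01, "epochs": 20, "batch_size": 10},
}

def get_config(benchmark: str) -> dict:
    """Return a copy of the reported training configuration for one benchmark."""
    return dict(RGO_CONFIGS[benchmark])

if __name__ == "__main__":
    print(get_config("split_cifar100"))
    # {'lr': 0.03, 'steps_per_task': 2000, 'batch_size': 10}
```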