AFEC: Active Forgetting of Negative Transfer in Continual Learning

Authors: Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, Yi Zhong

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is followed by the assessed result and the LLM's supporting response.
Research Type: Experimental
LLM Response: "We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks and Atari reinforcement tasks, where AFEC effectively improves the learning of new tasks and achieves the state-of-the-art performance in a plug-and-play way."
Researcher Affiliation: Academia
LLM Response: "1 School of Life Sciences, IDG/McGovern Institute for Brain Research, Tsinghua University. 2 Tsinghua-Peking Center for Life Sciences. 3 Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University. 4 AI Center, University College London. 5 IIIS, Tsinghua University. 6 Yau Mathematical Sciences Center, Tsinghua University."
Pseudocode: Yes
LLM Response: "We discuss it in Appendix B.4 with a pseudocode." (from the main text) and "Algorithm 1: AFEC for Continual Learning" (from Appendix B.4).
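To make the reported pseudocode concrete, below is a minimal PyTorch-style sketch of one AFEC task step under our reading of Algorithm 1 (Appendix B.4): train an expanded copy of the network on the new task alone, estimate its diagonal Fisher information, then train the main network with an EWC-style penalty toward the old-task parameters plus a pull toward the expanded parameters. All names here (`fisher`, `learn_task`, `lam`, `lam_e`) are ours for illustration, not from the paper's released code.

```python
# Hedged sketch of one AFEC task step; not the authors' implementation.
import copy
import torch

def fisher(model, loader, loss_fn):
    """Diagonal Fisher estimate: mean squared gradient per parameter."""
    f = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                f[n] += p.grad.detach() ** 2
    return {n: v / max(len(loader), 1) for n, v in f.items()}

def learn_task(model, old_params, old_fisher, loader, loss_fn, opt_fn,
               lam=100.0, lam_e=1.0, epochs=10):
    # (1) Train an "expanded" copy on the new task alone.
    expanded = copy.deepcopy(model)
    opt_e = opt_fn(expanded.parameters())
    for _ in range(epochs):
        for x, y in loader:
            opt_e.zero_grad()
            loss_fn(expanded(x), y).backward()
            opt_e.step()
    fisher_e = fisher(expanded, loader, loss_fn)
    theta_e = {n: p.detach().clone() for n, p in expanded.named_parameters()}

    # (2) Train the main network: new-task loss + EWC-style penalty toward
    #     old-task weights + AFEC pull toward the expanded (new-task) weights.
    opt = opt_fn(model.parameters())
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            for n, p in model.named_parameters():
                if n in old_fisher:  # empty on the first task
                    loss = loss + 0.5 * lam * (old_fisher[n] * (p - old_params[n]) ** 2).sum()
                loss = loss + 0.5 * lam_e * (fisher_e[n] * (p - theta_e[n]) ** 2).sum()
            loss.backward()
            opt.step()
    return model
```

The second penalty is what distinguishes this from plain EWC: besides pulling the weights back toward the old-task solution, it lets them converge toward weights fit to the new task alone, which is the "active forgetting" of negative transfer.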
Open Source Code: Yes
LLM Response: "Our code is included in supplementary materials."
Open Datasets: Yes
LLM Response: "Dataset: We evaluate continual learning on a variety of benchmark datasets for visual classification, including CIFAR-100, CUB-200-2011 and ImageNet-100. CIFAR-100 [13] contains 100-class colored images of the size 32×32... CUB-200-2011 [31] is a large-scale dataset... ImageNet-100 [9] is a subset of ILSVRC-2012 [23]..."
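As an illustration of how such open benchmarks are typically arranged into a task sequence, here is a hedged sketch that splits CIFAR-100 into 10 sequential 10-class tasks. The 10-task grouping and the random seed are our assumptions for illustration; the paper's exact protocol is given in its Section 4.

```python
# Hedged sketch: arranging CIFAR-100 into sequential continual learning tasks.
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import Subset

train = datasets.CIFAR100(root="./data", train=True, download=True,
                          transform=transforms.ToTensor())
targets = np.array(train.targets)

rng = np.random.default_rng(0)
class_order = rng.permutation(100)  # random class order (assumed seed)
tasks = [
    Subset(train, np.where(np.isin(targets, class_order[i * 10:(i + 1) * 10]))[0])
    for i in range(10)              # 10 tasks x 10 classes
]
```

Each element of `tasks` can then be wrapped in a DataLoader and fed to a routine like the `learn_task` sketch above.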
Dataset Splits: No
LLM Response: The paper reports train/test sizes for CIFAR-10 ("50,000 training samples and 10,000 testing samples") and a split rule for CUB-200-2011 ("30 images per class for training while the rest for testing"), but does not explicitly specify validation splits.
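The quoted CUB-200-2011 rule is simple enough to state in code. The sketch below is our illustration of that rule only; the per-class shuffling seed is an assumption, and this is not the authors' split script.

```python
# Hedged sketch of the quoted CUB-200-2011 rule:
# 30 images per class for training, the rest for testing.
import numpy as np

def split_by_class(labels, n_train_per_class=30, seed=0):
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:])
    return np.array(train_idx), np.array(test_idx)
```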
Hardware Specification: Yes
LLM Response: "All the experiments are conducted on NVIDIA GeForce RTX 2080 Ti GPUs."
Software Dependencies: No
LLM Response: The paper mentions algorithms and frameworks (e.g., PPO) but does not explicitly list software libraries or dependencies with version numbers (e.g., "PyTorch 1.9", "CUDA 11.1").
Experiment Setup: Yes
LLM Response: "We follow the implementation of [10] to sequentially learn eight randomly selected Atari games. Specifically, we apply a CNN architecture consisting of 3 convolution layers with 2 fully connected layers and identical PPO [25] for all the methods (detailed in Appendix G)." (from Section 4.3) and "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sec. 4 and Appendix C, F and G." (from the NeurIPS paper checklist, item 3b). Appendix C and G indeed provide hyperparameter details.
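For orientation, here is a hedged sketch of a backbone matching the quoted description (3 convolution layers followed by 2 fully connected layers). The channel counts, kernel sizes, and 84×84 four-frame input follow the common Atari CNN convention and are assumptions on our part; the exact values are in the paper's Appendix G.

```python
# Hedged sketch of the quoted Atari backbone; dimensions are assumed, not quoted.
import torch.nn as nn

class AtariCNN(nn.Module):
    def __init__(self, in_channels=4, n_actions=18, hidden=512):
        super().__init__()
        self.features = nn.Sequential(      # 3 convolution layers
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(            # 2 fully connected layers
            nn.Linear(64 * 7 * 7, hidden), nn.ReLU(),  # 7x7 from 84x84 input
            nn.Linear(hidden, n_actions),   # policy logits for a PPO head
        )

    def forward(self, x):
        # Expects float frames in [0, 255]; a separate PPO value head is omitted.
        return self.fc(self.features(x / 255.0))
```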