AFEC: Active Forgetting of Negative Transfer in Continual Learning
Authors: Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, Yi Zhong
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate AFEC on a variety of continual learning benchmarks, including CIFAR-10 regression tasks, visual classification tasks and Atari reinforcement tasks, where AFEC effectively improves the learning of new tasks and achieves the state-of-the-art performance in a plug-and-play way. |
| Researcher Affiliation | Academia | 1School of Life Sciences, IDG/McGovern Institute for Brain Research, Tsinghua University. 2Tsinghua-Peking Center for Life Sciences. 3Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua University. 4AI Center, University College London. 5IIIS, Tsinghua University. 6Yau Mathematical Sciences Center, Tsinghua University. |
| Pseudocode | Yes | We discuss it in Appendix B.4 with a pseudocode. (from main text) and Algorithm 1: AFEC for Continual Learning (from Appendix B.4). A hedged sketch of the algorithm's objective appears after the table. |
| Open Source Code | Yes | Our code is included in supplementary materials. |
| Open Datasets | Yes | Dataset: We evaluate continual learning on a variety of benchmark datasets for visual classification, including CIFAR-100, CUB-200-2011 and ImageNet-100. CIFAR-100 [13] contains 100-class colored images of the size 32×32... CUB-200-2011 [31] is a large-scale dataset... ImageNet-100 [9] is a subset of ILSVRC-2012 [23]... |
| Dataset Splits | No | The paper mentions training and testing samples for CIFAR-10 ('50,000 training samples and 10,000 testing samples') and CUB-200-2011 ('30 images per class for training while the rest for testing'), but does not explicitly specify a validation split. A sketch of the per-class CUB split follows the table. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions algorithms and frameworks (e.g., PPO), but does not explicitly list specific software libraries or dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1'). |
| Experiment Setup | Yes | We follow the implementation of [10] to sequentially learn eight randomly selected Atari games. Specifically, we apply a CNN architecture consisting of 3 convolution layers with 2 fully connected layers and identical PPO [25] for all the methods (detailed in Appendix G). (from Section 4.3) and Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Sec. 4 and Appendix C, F and G. (from the NeurIPS paper checklist, point 3.b). Appendix C and G indeed provide hyperparameter details. A hedged sketch of this architecture appears below. |
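The Pseudocode row points to Algorithm 1 (AFEC for Continual Learning) in Appendix B.4. As a minimal sketch of the core idea in the weight-regularization setting, the training objective combines an EWC-style penalty that pulls parameters toward the consolidated old solution with an "active forgetting" penalty that pulls them toward a temporarily expanded copy trained on the new task alone. All names below are illustrative, not the authors' released code.

```python
# Hedged sketch of the AFEC objective; `fisher_old`/`theta_old` come from the
# consolidated model after previous tasks, `fisher_new`/`theta_new` from a
# temporary "expanded" copy trained only on the current task.
import torch

def afec_loss(model, task_loss, fisher_old, theta_old,
              fisher_new, theta_new, lam=1.0, lam_e=1.0):
    """task_loss: plain loss on the current batch.
    fisher_* / theta_*: dicts mapping parameter name -> tensor."""
    reg_old = 0.0  # remember old tasks (EWC-style consolidation)
    reg_new = 0.0  # actively forget weights that conflict with the new task
    for name, p in model.named_parameters():
        if name in fisher_old:
            reg_old = reg_old + (fisher_old[name] * (p - theta_old[name]).pow(2)).sum()
        if name in fisher_new:
            reg_new = reg_new + (fisher_new[name] * (p - theta_new[name]).pow(2)).sum()
    return task_loss + 0.5 * lam * reg_old + 0.5 * lam_e * reg_new
```

Refreshing `fisher_new`/`theta_new` from the expanded copy at each new task is what would make this regularizer plug-and-play on top of existing weight-regularization methods, as the abstract claims.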
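The Dataset Splits row quotes a per-class protocol for CUB-200-2011 (30 training images per class, the rest for testing) with no validation set. A minimal sketch of such a split, with hypothetical helper names:

```python
# Hedged illustration of the quoted CUB-200-2011 protocol: 30 images per
# class for training, the remainder for testing; no validation split.
import random
from collections import defaultdict

def split_per_class(samples, per_class_train=30, seed=0):
    """samples: list of (image_path, class_id) pairs."""
    by_class = defaultdict(list)
    for path, cls in samples:
        by_class[cls].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for cls, paths in by_class.items():
        rng.shuffle(paths)
        train += [(p, cls) for p in paths[:per_class_train]]
        test += [(p, cls) for p in paths[per_class_train:]]
    return train, test
```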
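The Experiment Setup row describes a backbone of 3 convolution layers and 2 fully connected layers trained with PPO across eight Atari games. The sketch below assumes the standard Nature-DQN kernel and channel sizes for 84×84 inputs, since the paper defers exact values to Appendix G; the separate critic head is a common PPO convention, not confirmed by the quote.

```python
# Hedged sketch of the Atari backbone (3 conv + 2 FC layers); the hidden
# layer plus policy head form the two FC layers, with an assumed extra
# value head for PPO's critic.
import torch
import torch.nn as nn

class AtariCNN(nn.Module):
    def __init__(self, n_actions, in_channels=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.hidden = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU())
        self.policy = nn.Linear(512, n_actions)  # actor logits
        self.value = nn.Linear(512, 1)           # critic estimate

    def forward(self, x):
        h = self.hidden(self.features(x / 255.0))  # assumes uint8-range frames
        return self.policy(h), self.value(h)
```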