Continual Learning by Using Information of Each Class Holistically

Authors: Wenpeng Hu, Qi Qin, Mengyu Wang, Jinwen Ma, Bing Liu

AAAI 2021, pp. 7797-7805

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirical evaluation shows that PCL markedly outperforms the state-of-the-art baselines for one or more classes per task. |
| Researcher Affiliation | Academia | (1) Department of Information Science, School of Mathematical Sciences, Peking University; (2) Center for Data Science, AAIS, Peking University; (3) Wangxuan Institute of Computer Technology, Peking University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We now evaluate the proposed PCL technique (the code can be found here [3]) and compare it with both classic and the latest baselines"; footnote 3: https://github.com/morning-dews/PCL |
| Open Datasets | Yes | We use four benchmark image classification datasets and two text classification datasets in our experiments: MNIST (Le Cun, Cortes, and Burges 1998), EMNIST-47 (Cohen et al. 2017), CIFAR10 and CIFAR100 (Krizhevsky and Hinton 2009) for images; 20news and DBPedia for text. |
| Dataset Splits | Yes | We randomly select 10% of the examples from the training set of each dataset as the validation set to tune the hyper-parameters. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper mentions using SGD as the optimizer and the baselines' code, but it does not give version numbers for the software libraries required for reproduction. |
| Experiment Setup | Yes | For training, we use SGD with moment as the optimizer (learning rate = 0.1). We run each experiment five times. For each run of PCL or a baseline, we execute 500 epochs and use the maximum accuracy as the final result of the run. [...] PCL has 3 parameters that need tuning: λ and n in H-reg (Sec. 3.1) and η for transfer (Sec. 3.2). [...] After tuning, we get the best hyperparameters of λ = 0.5 and n = 12. For η, different data have different values, 0.001 for MNIST and EMNIST-47, 0.005 for CIFAR10 and DBPedia, 0.01 for CIFAR100 and 20news. |
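
As a minimal sketch of the reported setup, the snippet below wires together the quoted settings in PyTorch: a 10% validation hold-out, SGD at learning rate 0.1, 500 epochs, and the tuned hyperparameters λ, n, and η. The placeholder network, the MNIST loader, and the momentum value (the paper only says "SGD with moment") are assumptions; this is not the PCL implementation, which is available at the linked repository.

```python
# Sketch of the reported training configuration (PyTorch); not the PCL code.
import torch
import torch.nn as nn
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 10% of the training set is held out as a validation set for tuning (per the card).
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
val_size = int(0.1 * len(train_full))
train_set, val_set = random_split(train_full, [len(train_full) - val_size, val_size])

# Placeholder network; the actual PCL architecture is defined in the authors' repository.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# "SGD with moment", learning rate 0.1; the momentum value 0.9 is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Tuned hyperparameters reported in the paper.
hparams = {
    "lambda_hreg": 0.5,     # λ in H-reg (Sec. 3.1)
    "n_hreg": 12,           # n in H-reg (Sec. 3.1)
    "eta_transfer": 0.001,  # η (Sec. 3.2): 0.001 for MNIST/EMNIST-47,
                            # 0.005 for CIFAR10/DBPedia, 0.01 for CIFAR100/20news
}

# Each run executes 500 epochs; the maximum accuracy is taken as the result of the run.
NUM_EPOCHS = 500
```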