Learnability and Algorithm for Continual Learning

Authors: Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Bing Liu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate its effectiveness." and "Based on the theory, a new CIL algorithm, called ROW (Replay, OOD, and WP for CIL), is proposed. Experimental results show that it outperforms existing strong baselines."
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Illinois at Chicago; (2) work done at ByteDance; (3) KDDI Research (work done when this author was visiting Bing Liu's group).
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes the training and prediction process in text and in the diagrams of Figure 1.
Open Source Code | Yes | The code for the proposed method ROW is available at https://github.com/k-gyuhak/CLOOD.
Open Datasets | Yes | "We use three popular continual learning benchmark datasets: 1) CIFAR10 (Krizhevsky & Hinton, 2009) ... 2) CIFAR100 (Krizhevsky & Hinton, 2009) ... 3) Tiny-ImageNet (Le & Yang, 2015)."
Dataset Splits | Yes | "For all the experiments of our system, we find a good set of learning rates and the number of epochs via validation data made of 10% of the training data." (see the configuration sketch after the table)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only mentions the use of a specific transformer architecture (DeiT-S/16).
Software Dependencies | No | The paper does not provide software dependencies with version numbers. It mentions the use of the SGD optimizer, but it does not specify a programming language, libraries, or frameworks with their respective versions.
Experiment Setup | Yes | "For all the experiments, we use SGD with momentum value of 0.9 with batch size of 64. For C10-5T, we use learning rate 0.005 and train for 20 epochs. For C100-10T and 20T, we train for 40 epochs with learning rate 0.001 and 0.005 for 10T and 20T, respectively. For T-5T and 10T, we use the same learning rate 0.005, but train for 15 and 10 epochs for 5T and 10T, respectively. For fine-tuning WP and OOD head, we use batch size of 32 and use the same learning rate used for training the feature extractor. For fine-tuning WP and TP, we train for 5 epochs and 10 epochs, respectively."
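
For concreteness, the split and hyperparameter details quoted in the Dataset Splits and Experiment Setup rows can be gathered into a small configuration sketch. The snippet below is a minimal illustration assuming PyTorch; the configuration keys, the helper names make_split and build_optimizer, and the random seed are illustrative assumptions, not taken from the paper or the released ROW code (see https://github.com/k-gyuhak/CLOOD for the actual implementation).

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# Helper names and keys are illustrative, not from the ROW repository.
import torch
from torch.utils.data import random_split

# Per-experiment hyperparameters quoted in the Experiment Setup row.
# Keys follow the paper's naming, e.g. C10-5T = CIFAR10 split into 5 tasks.
CONFIGS = {
    "C10-5T":   {"lr": 0.005, "epochs": 20},
    "C100-10T": {"lr": 0.001, "epochs": 40},
    "C100-20T": {"lr": 0.005, "epochs": 40},
    "T-5T":     {"lr": 0.005, "epochs": 15},
    "T-10T":    {"lr": 0.005, "epochs": 10},
}
BATCH_SIZE = 64            # feature-extractor training
FINETUNE_BATCH_SIZE = 32   # WP / OOD-head fine-tuning
WP_FINETUNE_EPOCHS = 5
TP_FINETUNE_EPOCHS = 10

def make_split(train_set, val_fraction=0.1, seed=0):
    """Hold out 10% of the training data as validation, as described in the paper.
    The seed is an arbitrary choice for reproducibility of this sketch."""
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    generator = torch.Generator().manual_seed(seed)
    return random_split(train_set, [n_train, n_val], generator=generator)

def build_optimizer(model, experiment):
    """SGD with momentum 0.9 and the per-experiment learning rate."""
    lr = CONFIGS[experiment]["lr"]
    return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

# Example usage (assuming torchvision for data loading, which the paper
# does not specify):
#   from torchvision import datasets, transforms
#   full_train = datasets.CIFAR10("data", train=True, download=True,
#                                 transform=transforms.ToTensor())
#   train_set, val_set = make_split(full_train)
#   optimizer = build_optimizer(model, "C10-5T")
```

The sketch only records the hyperparameters stated in the paper; the task splitting, replay buffer, and the WP/OOD fine-tuning loops of ROW are left to the released code.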