Learnability and Algorithm for Continual Learning
Authors: Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Bing Liu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on the theory, a new CIL algorithm, called ROW (Replay, OOD, and WP for CIL), is proposed. Experimental results demonstrate its effectiveness and show that it outperforms existing strong baselines. |
| Researcher Affiliation | Collaboration | 1. Department of Computer Science, University of Illinois at Chicago. 2. Work done at ByteDance. 3. KDDI Research (work done when this author was visiting Bing Liu's group). |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks; it describes the training and prediction process in text and in the diagrams of Figure 1. |
| Open Source Code | Yes | The code for the proposed method ROW is available at https://github.com/k-gyuhak/CLOOD |
| Open Datasets | Yes | We use three popular continual learning benchmark datasets. 1). CIFAR10 (Krizhevsky & Hinton, 2009). ... 2). CIFAR100 (Krizhevsky & Hinton, 2009). ... 3). TinyImageNet (Le & Yang, 2015). |
| Dataset Splits | Yes | For all the experiments of our system, we find a good set of learning rates and the number of epochs via validation data made of 10% of the training data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only mentions the use of a specific transformer architecture (DeiT-S/16). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. It mentions the use of the SGD optimizer, but does not name a programming language, libraries, or frameworks with their respective versions. |
| Experiment Setup | Yes | For all the experiments, we use SGD with momentum value of 0.9 with batch size of 64. For C10-5T, we use learning rate 0.005 and train for 20 epochs. For C100-10T and 20T, we train for 40 epochs with learning rate 0.001 and 0.005 for 10T and 20T, respectively. For T-5T and 10T, we use the same learning rate 0.005, but train for 15 and 10 epochs for 5T and 10T, respectively. For fine-tuning WP and OOD head, we use batch size of 32 and use the same learning rate used for training the feature extractor. For fine-tuning WP and TP, we train for 5 epochs and 10 epochs, respectively. |
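
As a rough illustration of the reported setup, the sketch below collects the hyperparameters quoted in the table (SGD with momentum 0.9, per-benchmark learning rates and epochs, batch sizes, and the 10% validation split) into a PyTorch-style configuration. The dictionary keys, the `build_optimizer` and `split_train_val` helpers, and the benchmark names are assumptions for illustration only, not the authors' released code.

```python
# Hypothetical sketch of the reported training configuration (not the authors' code).
# Hyperparameter values are taken from the quoted experiment setup; everything else is assumed.
import torch

# Per-benchmark (learning rate, epochs) for training the feature extractor, as quoted above.
EXPERIMENT_CONFIG = {
    "C10-5T":   {"lr": 0.005, "epochs": 20},
    "C100-10T": {"lr": 0.001, "epochs": 40},
    "C100-20T": {"lr": 0.005, "epochs": 40},
    "T-5T":     {"lr": 0.005, "epochs": 15},
    "T-10T":    {"lr": 0.005, "epochs": 10},
}
BATCH_SIZE = 64           # batch size for training the feature extractor
FINETUNE_BATCH_SIZE = 32  # batch size for fine-tuning the WP and OOD heads
WP_FINETUNE_EPOCHS = 5    # epochs for fine-tuning WP
TP_FINETUNE_EPOCHS = 10   # epochs for fine-tuning TP


def build_optimizer(model: torch.nn.Module, benchmark: str) -> torch.optim.SGD:
    """Return an SGD optimizer with momentum 0.9, as described in the paper."""
    cfg = EXPERIMENT_CONFIG[benchmark]
    return torch.optim.SGD(model.parameters(), lr=cfg["lr"], momentum=0.9)


def split_train_val(train_set, val_fraction=0.1, seed=0):
    """Hold out 10% of the training data as validation, as described in the paper."""
    n_val = int(len(train_set) * val_fraction)
    generator = torch.Generator().manual_seed(seed)
    return torch.utils.data.random_split(
        train_set, [len(train_set) - n_val, n_val], generator=generator
    )
```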