Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality
Authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun Zhu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning (e.g., up to 15.01% and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). In this section, we first describe the experimental setups, and then present the experimental results. |
| Researcher Affiliation | Collaboration | Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun Zhu. Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: Training Algorithm of HiDe-Prompt |
| Open Source Code | Yes | Our code is available at https://github.com/thu-ml/HiDe-Prompt. |
| Open Datasets | Yes | Benchmark: We consider multiple CIL benchmarks that are widely used for prompt-based continual learning [41, 40, 30]. Specifically, Split CIFAR-100 [14] includes 100-class small-scale images, randomly split into 10 incremental tasks of disjoint classes. Split ImageNet-R [14] includes 200-class large-scale images that are hard examples of ImageNet [29] or newly collected examples of different styles, randomly split into 10 incremental tasks of disjoint classes. 5-Datasets [6] includes CIFAR-10 [14], MNIST [15], Fashion-MNIST [42], SVHN [24] and notMNIST [1] datasets, each treated as an incremental task to evaluate the impact of large inter-task differences. Split CUB-200 [32] includes 200-class fine-grained images of birds, randomly split into 10 incremental tasks of disjoint classes. (A hedged sketch of this class-splitting protocol follows the table.) |
| Dataset Splits | No | The paper mentions training sets (D1, ..., DT) and test sets, and uses common benchmarks like Split CIFAR-100 and Split Image Net-R. While grid search for epochs implies the use of a validation set, the paper does not explicitly provide specific percentages, sample counts, or citations to predefined validation splits, nor does it detail a splitting methodology for a validation set. |
| Hardware Specification | Yes | Compute: We run all experiments of Split CIFAR-100 on eight Tesla P100-SXM2 GPUs, Split ImageNet-R on four NVIDIA A100 GPUs, 5-Datasets on both, and Split CUB-200 on eight NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using an 'Adam optimizer' and a 'pre-trained ViT-B/16 backbone' but does not specify software dependencies like Python, PyTorch, TensorFlow, or CUDA with their respective version numbers. |
| Experiment Setup | Yes | Implementation: We follow similar implementations as previous work [41, 40, 30]. Specifically, we adopt a pre-trained ViT-B/16 backbone and train with an Adam optimizer (β1 = 0.9, β2 = 0.999), a batch size of 128, and a constant learning rate of 0.005 (except for CODA-Prompt with a cosine-decaying learning rate of 0.001), and grid search for a proper epoch number. The image inputs are resized to 224 × 224 and normalized to [0, 1]. Please refer to Appendix C for more details. (A hedged sketch of this optimization setup also follows the table.) |
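
Below is a minimal sketch of the Split CIFAR-100 protocol quoted in the Open Datasets row: 100 classes are randomly permuted and partitioned into 10 incremental tasks of 10 disjoint classes each. The helper name `make_split_cifar100`, the `torchvision` loader, and the seed handling are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision.datasets import CIFAR100

def make_split_cifar100(root="./data", num_tasks=10, seed=0):
    """Split CIFAR-100 into `num_tasks` training subsets with disjoint class sets.

    Hypothetical helper for illustration; transforms are omitted, so the
    returned Subsets yield raw PIL images.
    """
    train = CIFAR100(root=root, train=True, download=True)
    rng = np.random.default_rng(seed)
    class_order = rng.permutation(100)        # random class order, as in the benchmark
    classes_per_task = 100 // num_tasks       # 10 disjoint classes per task
    targets = np.asarray(train.targets)
    tasks = []
    for t in range(num_tasks):
        task_classes = class_order[t * classes_per_task:(t + 1) * classes_per_task]
        idx = np.flatnonzero(np.isin(targets, task_classes))
        tasks.append(Subset(train, idx.tolist()))
    return tasks
```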
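
And a hedged sketch of the optimization setup reported in the Experiment Setup row: inputs resized to 224 × 224 and scaled to [0, 1], Adam with β1 = 0.9, β2 = 0.999, a constant learning rate of 0.005, and a batch size of 128. The prompt-pool shape, the model call signature, and the per-task epoch count (grid-searched in the paper) are placeholders, not the actual HiDe-Prompt parameterization.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms

# Input pipeline as reported: resize to 224 x 224, scale pixels to [0, 1].
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # ToTensor already maps uint8 pixels to [0, 1]
])

# Placeholder learnable prompt pool (10 tasks x 5 tokens x ViT-B/16 width);
# the real per-task prompt design over a frozen backbone is more involved.
prompt_pool = nn.Parameter(torch.zeros(10, 5, 768))

# Reported optimizer settings: Adam, betas (0.9, 0.999), constant lr 0.005.
optimizer = torch.optim.Adam([prompt_pool], lr=0.005, betas=(0.9, 0.999))

def train_one_task(task_dataset, model, epochs):
    """Generic per-task loop; `epochs` was grid-searched in the paper."""
    loader = DataLoader(task_dataset, batch_size=128, shuffle=True)
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images, prompt_pool)  # hypothetical signature
            loss = nn.functional.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```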