Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality

Authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun Zhu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning (e.g., up to 15.01% and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). In this section, we first describe the experimental setups, and then present the experimental results.
Researcher Affiliation | Collaboration | Liyuan Wang1, Jingyi Xie1, Xingxing Zhang1, Mingyi Huang1, Hang Su1, Jun Zhu1. Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint Center for ML, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Training Algorithm of HiDe-Prompt. (A hedged skeleton of such a training loop is sketched after the table.)
Open Source Code | Yes | Our code is available at https://github.com/thu-ml/HiDe-Prompt.
Open Datasets | Yes | Benchmark: We consider multiple CIL benchmarks that are widely used for prompt-based continual learning [41, 40, 30]. Specifically, Split CIFAR-100 [14] includes 100-class small-scale images, randomly split into 10 incremental tasks of disjoint classes. Split ImageNet-R [14] includes 200-class large-scale images that are hard examples of ImageNet [29] or newly collected examples of different styles, randomly split into 10 incremental tasks of disjoint classes. 5-Datasets [6] includes the CIFAR-10 [14], MNIST [15], Fashion-MNIST [42], SVHN [24] and notMNIST [1] datasets, each treated as an incremental task to evaluate the impact of large inter-task differences. Split CUB-200 [32] includes 200-class fine-grained images of birds, randomly split into 10 incremental tasks of disjoint classes. (A class-incremental splitting sketch also follows the table.)
Dataset Splits | No | The paper mentions training sets (D1, ..., DT) and test sets, and uses common benchmarks such as Split CIFAR-100 and Split ImageNet-R. Although the grid search over epoch numbers implies a validation set, the paper does not provide specific percentages, sample counts, or citations to predefined validation splits, nor does it describe a methodology for constructing one.
Hardware Specification | Yes | Compute: We run all experiments of Split CIFAR-100 on eight Tesla P100-SXM2 GPUs, Split ImageNet-R on four NVIDIA A100 GPUs, 5-Datasets on both, and Split CUB-200 on eight NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using an Adam optimizer and a pre-trained ViT-B/16 backbone, but does not specify software dependencies such as Python, PyTorch, TensorFlow, or CUDA with their respective version numbers.
Experiment Setup | Yes | Implementation: We follow similar implementations as previous work [41, 40, 30]. Specifically, we adopt a pre-trained ViT-B/16 backbone and train with an Adam optimizer (β1 = 0.9, β2 = 0.999), a batch size of 128, and a constant learning rate of 0.005 (except for CODA-Prompt, which uses a cosine-decaying learning rate of 0.001), and grid search for a proper epoch number. The image inputs are resized to 224 × 224 and normalized to [0, 1]. Please refer to Appendix C for more details. (A configuration sketch follows the table.)
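
The Pseudocode row points to Algorithm 1, which this report does not reproduce. For orientation only, here is a hedged skeleton of a generic prompt-based continual learning loop consistent with the setup reported above; it is not the paper's HiDe-Prompt algorithm, and model, prompt_pool, and get_task_loader are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Hedged skeleton of a generic prompt-based continual learning loop.
# This is NOT the paper's Algorithm 1: `model`, `prompt_pool`, and
# `get_task_loader` are illustrative placeholders.
def train_sequential(model, prompt_pool, get_task_loader, num_tasks, epochs):
    for t in range(num_tasks):                       # tasks arrive one by one
        optimizer = torch.optim.Adam(
            prompt_pool.task_parameters(t),          # only prompt parameters
            lr=0.005, betas=(0.9, 0.999))            # train; backbone frozen
        loader = get_task_loader(t)                  # disjoint classes per task
        for _ in range(epochs):
            for x, y in loader:
                logits = model(x, prompts=prompt_pool.select(x, task=t))
                loss = F.cross_entropy(logits, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```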
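The Open Datasets row describes class-incremental splits such as Split CIFAR-100 (100 classes randomly divided into 10 tasks of disjoint classes). Below is a minimal sketch of such a split, assuming a fixed seed and torchvision's CIFAR-100 loader; neither the seed nor the helper structure comes from the paper.

```python
import numpy as np
from torchvision import datasets, transforms

# Minimal sketch of a class-incremental split (e.g., Split CIFAR-100:
# 100 classes -> 10 tasks of 10 disjoint classes). The seed and helper
# names are illustrative assumptions, not the paper's code.
def make_class_incremental_split(targets, num_tasks=10, seed=0):
    classes = np.unique(targets)
    rng = np.random.default_rng(seed)
    rng.shuffle(classes)                        # random class order
    tasks = np.array_split(classes, num_tasks)  # disjoint class groups
    # Map each task to the indices of its samples.
    return [np.where(np.isin(targets, task))[0] for task in tasks]

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
task_indices = make_class_incremental_split(np.array(train_set.targets))
print([len(ix) for ix in task_indices])  # 10 tasks, 5000 training images each
```

The same pattern would yield Split ImageNet-R or Split CUB-200 by swapping in the corresponding dataset while keeping num_tasks = 10.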
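The Experiment Setup row pins down the optimizer and preprocessing. Here is a minimal PyTorch sketch of that configuration; the timm checkpoint name is an assumption (the paper names no software stack), and the epoch number is left open because the paper grid-searches it.

```python
import torch
import timm  # assumed dependency; the paper does not name its software stack
from torchvision import transforms

# Preprocessing as reported: resize to 224 x 224 and scale pixels to [0, 1]
# (ToTensor already maps to [0, 1]; no mean/std normalization is applied).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Pre-trained ViT-B/16 backbone; the exact checkpoint name is an assumption.
model = timm.create_model("vit_base_patch16_224", pretrained=True)

# Adam with the reported hyperparameters and a constant learning rate of 0.005.
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, betas=(0.9, 0.999))

# For CODA-Prompt the paper instead uses a cosine-decaying rate of 0.001, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Batch size is 128 as reported; the epoch number is grid-searched, so no
# single value is given here.
```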