Prompt Gradient Projection for Continual Learning

Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yong Peng, Yuan Xie

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our method on diverse datasets and experiments demonstrate the efficiency of reducing forgetting both in class incremental, online class incremental, and task incremental settings.
Researcher Affiliation | Academia | Jingyang Qiao1, Zhizhong Zhang1, Xin Tan1, Chengwei Chen2, Yanyun Qu3, Yong Peng4, Yuan Xie1; 1East China Normal University, 2The Navy Military Medical University, 3Xiamen University, 4Central South University
Pseudocode | Yes | Algorithm 1: Prompt Gradient Projection For L2P (Training phase); Algorithm 2: Prompt Gradient Projection For L2P (Testing phase). (An illustrative sketch of the projection step appears after the table.)
Open Source Code | Yes | The code is available at https://github.com/JingyangQiao/prompt-gradient-projection.
Open Datasets | Yes | We evaluate our method on 1) 10/20-Split-CIFAR100 (Krizhevsky et al., 2009), constructed by splitting the 100 classes into 10 tasks/20 tasks. 2) 10-Split-TinyImageNet (Abai & Rajmalwar, 2019), constructed by splitting the 200 classes into 10 tasks. 3) 10-Split-ImageNet-R (Hendrycks et al., 2021), constructed by splitting the 200 classes into 10 tasks. (A minimal task-split sketch appears after the table.)
Dataset Splits | No | The paper specifies datasets like '10/20-Split-CIFAR100', which imply task-based splits, and mentions 'training' in Appendix G, but does not explicitly detail the percentages or counts for training, validation, and test splits within these tasks or for the overall datasets.
Hardware Specification | Yes | We train and test on one A6000-48GB GPU for baselines and our method.
Software Dependencies | No | The paper mentions using specific models like 'ViT-B/16' and an 'Adam optimizer', but it does not specify software versions (e.g., Python 3.x, PyTorch 1.x) for reproducibility.
Experiment Setup | Yes | We set the Adam optimizer with β1 = 0.9 and β2 = 0.999. For hyperparameters, in L2P-PGP, we set ϵ = 0.50 for extraction of prompt gradient projection matrix and ϵ = 0.97 for key gradient projection matrix. While in DualPrompt-PGP, we set ϵ = 0.50 for extraction of prompt gradient projection matrix. ... we both train the network for 5 epochs with batch size of 16 and prompt length is set at 5, while we both set epochs as 50, batch size as 16, and prompt length as 30 for 10-Split-ImageNet-R. (The quoted values are restated as a config sketch after the table.)
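
To accompany the Pseudocode row, the snippet below is a rough, hypothetical sketch of the gradient-projection idea in PyTorch: an orthonormal basis of the old-task prompt input space is extracted with an SVD energy threshold ϵ, and new-task prompt gradients are projected onto the orthogonal complement of that subspace. The helper names, tensor shapes, and plain energy-threshold rule are our assumptions for illustration; the authors' released repository is the reference implementation.

```python
# Hypothetical sketch of prompt gradient projection (illustration only;
# see the authors' repository for the actual implementation).
import torch

def get_subspace_basis(old_inputs: torch.Tensor, eps: float) -> torch.Tensor:
    """SVD of old-task prompt inputs (d x n); keep enough left singular
    vectors to cover an `eps` fraction of the total energy."""
    U, S, _ = torch.linalg.svd(old_inputs, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < eps).sum().item()) + 1
    return U[:, :k]  # d x k orthonormal basis spanning the old-task subspace

def project_gradient(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Remove the gradient component lying in the old-task subspace:
    g <- g - M (M^T g)."""
    return grad - basis @ (basis.t() @ grad)

# Illustrative use inside a training step (prompt assumed to have shape [d, L]):
# basis = get_subspace_basis(old_task_prompt_inputs, eps=0.50)
# loss.backward()
# prompt.grad = project_gradient(prompt.grad, basis)
# optimizer.step()
```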
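
For the Open Datasets row, here is a minimal sketch of how a 10-Split-CIFAR100-style task sequence can be built by partitioning the 100 class labels into 10 disjoint groups of 10. The sequential class ordering, the transform, and the use of torchvision are assumptions, not details taken from the paper.

```python
# Minimal 10-Split-CIFAR100-style task construction (assumed: torchvision
# CIFAR100, sequential class order, no separate validation split).
from torch.utils.data import Subset
from torchvision import datasets, transforms

def build_split_cifar100(root: str, num_tasks: int = 10, train: bool = True):
    full = datasets.CIFAR100(root=root, train=train, download=True,
                             transform=transforms.ToTensor())
    classes_per_task = 100 // num_tasks
    tasks = []
    for t in range(num_tasks):
        task_classes = set(range(t * classes_per_task, (t + 1) * classes_per_task))
        indices = [i for i, y in enumerate(full.targets) if y in task_classes]
        tasks.append(Subset(full, indices))
    return tasks  # one dataset per task, 10 classes each when num_tasks=10
```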
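
Finally, the quoted Experiment Setup can be restated as a small configuration dictionary. The key names and structure below are ours, values not given in the quote (e.g., learning rate) are left out, and the quote's ellipsis omits which benchmarks the 5-epoch setting covers, so that entry is labeled with a placeholder key.

```python
# Restatement of the quoted hyperparameters (structure is ours; only values
# quoted above are included).
pgp_config = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "eps_prompt_projection": 0.50,  # prompt gradient projection matrix (both variants)
    "eps_key_projection": 0.97,     # key gradient projection matrix (L2P-PGP only)
    # "default" is our placeholder label; the quote does not name the benchmarks here.
    "default": {"epochs": 5, "batch_size": 16, "prompt_length": 5},
    "10_split_imagenet_r": {"epochs": 50, "batch_size": 16, "prompt_length": 30},
}
```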