Prompt Gradient Projection for Continual Learning
Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yong Peng, Yuan Xie
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on diverse datasets and experiments demonstrate the efficiency of reducing forgetting in class-incremental, online class-incremental, and task-incremental settings. |
| Researcher Affiliation | Academia | Jingyang Qiao1, Zhizhong Zhang1, Xin Tan1, Chengwei Chen2, Yanyun Qu3, Yong Peng4, Yuan Xie1 (corresponding author); 1East China Normal University, 2The Navy Military Medical University, 3Xiamen University, 4Central South University |
| Pseudocode | Yes | Algorithm 1: Prompt Gradient Projection For L2P (Training phase); Algorithm 2: Prompt Gradient Projection For L2P (Testing phase). A minimal sketch of the projection step is given after this table. |
| Open Source Code | Yes | The code is available at https://github.com/JingyangQiao/prompt-gradient-projection. |
| Open Datasets | Yes | We evaluate our method on 1) 10/20-Split-CIFAR100 (Krizhevsky et al., 2009), constructed by splitting the 100 classes into 10 tasks/20 tasks. 2) 10-Split-TinyImageNet (Abai & Rajmalwar, 2019), constructed by splitting the 200 classes into 10 tasks. 3) 10-Split-ImageNet-R (Hendrycks et al., 2021), constructed by splitting the 200 classes into 10 tasks. (A sketch of this task-split construction follows the table.) |
| Dataset Splits | No | The paper specifies datasets such as '10/20-Split-CIFAR100', which imply task-based splits, and mentions 'training' in Appendix G, but it does not explicitly detail the percentages or counts for training, validation, and test splits within these tasks or for the overall datasets. |
| Hardware Specification | Yes | We train and test on one A6000-48GB GPU for baselines and our method. |
| Software Dependencies | No | The paper mentions using specific models like 'ViT-B/16' and an 'Adam optimizer', but it does not specify software versions (e.g., Python 3.x, PyTorch 1.x) for reproducibility. |
| Experiment Setup | Yes | We set the Adam optimizer with β1 = 0.9 and β2 = 0.999. For hyperparameters, in L2P-PGP we set ϵ = 0.50 for extraction of the prompt gradient projection matrix and ϵ = 0.97 for the key gradient projection matrix, while in DualPrompt-PGP we set ϵ = 0.50 for extraction of the prompt gradient projection matrix. ... in both cases we train the network for 5 epochs with a batch size of 16 and a prompt length of 5, while for 10-Split-ImageNet-R we set epochs to 50, batch size to 16, and prompt length to 30. (The collected configuration is sketched after the table.) |
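
Below is a minimal sketch of the gradient projection step summarized in Algorithm 1 (training phase). It assumes a GPM-style matrix of prompt-input features stored from previous tasks; the function names `extract_basis` and `project_prompt_grad`, and the tensor shapes, are illustrative rather than the authors' exact implementation.

```python
# Hedged sketch of prompt gradient projection (PGP) at training time.
# Assumption: `feature_matrix` (embed_dim x n) stacks prompt-input features
# collected from previous tasks; names and shapes are illustrative only.
import torch


def extract_basis(feature_matrix: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Keep the fewest left singular vectors whose cumulative energy reaches epsilon."""
    U, S, _ = torch.linalg.svd(feature_matrix, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < epsilon).sum().item()) + 1  # smallest k with energy[k-1] >= epsilon
    return U[:, :k]  # orthonormal basis of the old-task subspace


def project_prompt_grad(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Remove the component of grad (embed_dim x m) lying inside the old-task subspace."""
    return grad - basis @ (basis.T @ grad)


# Usage after loss.backward(): reshape the prompt gradient so its leading
# dimension is embed_dim, pass it through project_prompt_grad with the stored
# basis, reshape back, then call optimizer.step().
```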
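
For the split benchmarks in the Open Datasets row, the class-incremental construction reduces to partitioning the label set into equal-sized tasks. A minimal sketch, assuming the standard contiguous class ordering (the exact ordering used by the authors is not stated here):

```python
# Hypothetical construction of a 10-Split-CIFAR100-style task list:
# 100 classes split into 10 tasks of 10 classes each (contiguous ordering assumed).
num_classes, num_tasks = 100, 10
classes_per_task = num_classes // num_tasks
task_classes = [
    list(range(t * classes_per_task, (t + 1) * classes_per_task))
    for t in range(num_tasks)
]
# task_classes[0] -> [0, ..., 9]; task_classes[9] -> [90, ..., 99]
```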
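
The reported hyperparameters can be collected into a single configuration. A sketch assuming a PyTorch Adam optimizer and a placeholder prompt parameter; only the numeric values come from the quoted setup, and the 768-dimensional embedding simply matches ViT-B/16:

```python
# Hedged sketch of the reported L2P-PGP training setup; the surrounding
# objects (prompt parameter, optimizer construction) are placeholders.
import torch

config = {
    "betas": (0.9, 0.999),    # Adam beta1 / beta2
    "epsilon_prompt": 0.50,   # threshold for the prompt gradient projection matrix
    "epsilon_key": 0.97,      # threshold for the key gradient projection matrix (L2P-PGP)
    "epochs": 5,              # 50 for 10-Split-ImageNet-R
    "batch_size": 16,
    "prompt_length": 5,       # 30 for 10-Split-ImageNet-R
}

# Placeholder learnable prompt (ViT-B/16 embedding dimension is 768).
prompt = torch.nn.Parameter(torch.zeros(config["prompt_length"], 768))
optimizer = torch.optim.Adam([prompt], betas=config["betas"])
```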