Prompt Gradient Projection for Continual Learning
Authors: Jingyang Qiao, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yong Peng, Yuan Xie
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method on diverse datasets and experiments demonstrate the efficiency of reducing forgetting in class-incremental, online class-incremental, and task-incremental settings. |
| Researcher Affiliation | Academia | Jingyang Qiao1, Zhizhong Zhang1, Xin Tan1, Chengwei Chen2, Yanyun Qu3, Yong Peng4, Yuan Xie1 (corresponding author); 1East China Normal University, 2The Navy Military Medical University, 3Xiamen University, 4Central South University |
| Pseudocode | Yes | Algorithm 1: Prompt Gradient Projection For L2P (Training phase); Algorithm 2: Prompt Gradient Projection For L2P (Testing phase). A minimal sketch of the projection step is given after this table. |
| Open Source Code | Yes | The code is available at https://github.com/JingyangQiao/prompt-gradient-projection. |
| Open Datasets | Yes | We evaluate our method on 1) 10/20-Split-CIFAR100 (Krizhevsky et al., 2009), constructed by splitting the 100 classes into 10 tasks/20 tasks. 2) 10-Split-TinyImageNet (Abai & Rajmalwar, 2019), constructed by splitting the 200 classes into 10 tasks. 3) 10-Split-ImageNet-R (Hendrycks et al., 2021), constructed by splitting the 200 classes into 10 tasks. (A sketch of this task-split construction follows the table.) |
| Dataset Splits | No | The paper specifies datasets such as '10/20-Split-CIFAR100', which imply task-based splits, and mentions 'training' in Appendix G, but it does not explicitly detail the percentages or counts for training, validation, and test splits within these tasks or for the overall datasets. |
| Hardware Specification | Yes | We train and test on one A6000-48GB GPU for baselines and our method. |
| Software Dependencies | No | The paper mentions using specific models like 'ViT-B/16' and an 'Adam optimizer', but it does not specify software versions (e.g., Python 3.x, PyTorch 1.x) for reproducibility. |
| Experiment Setup | Yes | We set the Adam optimizer with β1 = 0.9 and β2 = 0.999. For hyperparameters, in L2P-PGP we set ϵ = 0.50 for extraction of the prompt gradient projection matrix and ϵ = 0.97 for the key gradient projection matrix, while in DualPrompt-PGP we set ϵ = 0.50 for extraction of the prompt gradient projection matrix. ... in both cases we train the network for 5 epochs with a batch size of 16 and a prompt length of 5, while for 10-Split-ImageNet-R we set epochs to 50, batch size to 16, and prompt length to 30. (The collected configuration is sketched after the table.) |
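
Below is a minimal sketch of the gradient projection step summarized in Algorithm 1 (training phase). It assumes a GPM-style matrix of prompt-input features stored from previous tasks; the function names `extract_basis` and `project_prompt_grad`, and the tensor shapes, are illustrative rather than the authors' exact implementation.

```python
# Hedged sketch of prompt gradient projection (PGP) at training time.
# Assumption: `feature_matrix` (embed_dim x n) stacks prompt-input features
# collected from previous tasks; names and shapes are illustrative only.
import torch


def extract_basis(feature_matrix: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Keep the fewest left singular vectors whose cumulative energy reaches epsilon."""
    U, S, _ = torch.linalg.svd(feature_matrix, full_matrices=False)
    energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
    k = int((energy < epsilon).sum().item()) + 1  # smallest k with energy[k-1] >= epsilon
    return U[:, :k]  # orthonormal basis of the old-task subspace


def project_prompt_grad(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Remove the component of grad (embed_dim x m) lying inside the old-task subspace."""
    return grad - basis @ (basis.T @ grad)


# Usage after loss.backward(): reshape the prompt gradient so its leading
# dimension is embed_dim, pass it through project_prompt_grad with the stored
# basis, reshape back, then call optimizer.step().
```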
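
For the split benchmarks in the Open Datasets row, the class-incremental construction reduces to partitioning the label set into equal-sized tasks. A minimal sketch, assuming the standard contiguous class ordering (the exact ordering used by the authors is not stated here):

```python
# Hypothetical construction of a 10-Split-CIFAR100-style task list:
# 100 classes split into 10 tasks of 10 classes each (contiguous ordering assumed).
num_classes, num_tasks = 100, 10
classes_per_task = num_classes // num_tasks
task_classes = [
    list(range(t * classes_per_task, (t + 1) * classes_per_task))
    for t in range(num_tasks)
]
# task_classes[0] -> [0, ..., 9]; task_classes[9] -> [90, ..., 99]
```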
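
The reported hyperparameters can be collected into a single configuration. A sketch assuming a PyTorch Adam optimizer and a placeholder prompt parameter; only the numeric values come from the quoted setup, and the 768-dimensional embedding simply matches ViT-B/16:

```python
# Hedged sketch of the reported L2P-PGP training setup; the surrounding
# objects (prompt parameter, optimizer construction) are placeholders.
import torch

config = {
    "betas": (0.9, 0.999),    # Adam beta1 / beta2
    "epsilon_prompt": 0.50,   # threshold for the prompt gradient projection matrix
    "epsilon_key": 0.97,      # threshold for the key gradient projection matrix (L2P-PGP)
    "epochs": 5,              # 50 for 10-Split-ImageNet-R
    "batch_size": 16,
    "prompt_length": 5,       # 30 for 10-Split-ImageNet-R
}

# Placeholder learnable prompt (ViT-B/16 embedding dimension is 768).
prompt = torch.nn.Parameter(torch.zeros(config["prompt_length"], 768))
optimizer = torch.optim.Adam([prompt], betas=config["betas"])
```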