Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models
Authors: Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, Chao Shen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our idea improves comparison prompt optimization methods by 1.42% for soft prompt generalization and 2.16% for hard prompt generalization in accuracy on the multi-source domain generalization setting, while maintaining satisfying in-domain performance. |
| Researcher Affiliation | Academia | 1 Faculty of Electronic and Information Engineering, Xi'an Jiaotong University; 2 Queen Mary University of London, London, UK; 3 University of Chicago |
| Pseudocode | Yes | Algorithm 1 shows the detailed process of concentrative soft prompt optimization in Section 4.1. It also reveals that our method can be widely applied to different soft prompt optimization methods to improve their domain generalization capabilities. Algorithm 2: Concentrative Hard Prompt Optimization |
| Open Source Code | Yes | Our codes are available at https://github.com/czx-li/Concentrate-Attention |
| Open Datasets | Yes | We select the SST-2 Socher et al. [2013], MR Pang and Lee [2005], and CR Hu and Liu [2004] datasets for sentiment classification, and the WNLI, QNLI, and RTE datasets from GLUE Wang et al. [2018] for NLI tasks. |
| Dataset Splits | Yes | For all tasks, we randomly select 32 samples from each source domain as the training set to simulate MFDG setting. We use the same approach to build the validation set and ensure that the number of labels in the training and validation sets is balanced. |
| Hardware Specification | Yes | All experimental results are the average results of 10 different random seeds on a single NVIDIA A100 GPU. |
| Software Dependencies | No | I could not find specific version numbers for key software components or libraries, only general mentions of models (e.g., RoBERTa-Large) and optimizers (AdamW). |
| Experiment Setup | Yes | For Soft Prompt Tuning, we replace the Manual Prompt tokens with five soft tokens in the same positions, and optimize them using the AdamW optimizer [Loshchilov and Hutter, 2017] with learning rate 2 × 10⁻⁵ and batch size 32 for 300 epochs. For Prefix Tuning and P-Tuning v2, we apply the AdamW optimizer with a learning rate of 2 × 10⁻⁴ and train for 100 epochs; the mini-batch size is 8 and the prompt length is set to 10. The setting of the hard prompt optimization baselines (In-Context Demo, DP2O, GrIPS and RLPrompt) follows Li et al. [2024]. |
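
The dataset-split procedure described above (32 training examples per source domain, with a validation set built the same way and balanced labels) can be illustrated with a short sketch. This is a minimal, hypothetical reconstruction that assumes a simple list-of-dicts schema with `text`, `label`, and `domain` fields; it is not the authors' released code (see the linked repository for that).

```python
# Sketch of the few-shot MFDG split construction: sample k examples per
# source domain with labels balanced within each domain. The data schema
# and function name are assumptions made for illustration only.
import random
from collections import defaultdict

def sample_few_shot(examples, k_per_domain=32, seed=0):
    """examples: list of dicts with 'text', 'label', 'domain' keys (assumed schema)."""
    rng = random.Random(seed)
    by_domain_label = defaultdict(list)
    for ex in examples:
        by_domain_label[(ex["domain"], ex["label"])].append(ex)

    domains = {d for d, _ in by_domain_label}
    labels = {l for _, l in by_domain_label}
    per_label = k_per_domain // len(labels)  # balance labels within each domain

    selected = []
    for d in domains:
        for l in labels:
            pool = by_domain_label[(d, l)]
            selected.extend(rng.sample(pool, min(per_label, len(pool))))
    return selected
```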
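Likewise, the Soft Prompt Tuning configuration in the experiment-setup row (five soft tokens, AdamW with learning rate 2 × 10⁻⁵, batch size 32, 300 epochs, RoBERTa-Large backbone) can be sketched as below. Freezing the backbone and the soft-prompt initialization scale are assumptions for illustration, not details confirmed by the paper; the training loop is only outlined in comments.

```python
# Minimal sketch of the reported soft prompt tuning setup. The exact prompt
# placement and loss computation are in the authors' repository; this block
# only mirrors the hyperparameters quoted above.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "roberta-large"                       # backbone mentioned in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
for p in model.parameters():                       # assumed: backbone stays frozen
    p.requires_grad = False

embed_dim = model.config.hidden_size
soft_prompt = torch.nn.Parameter(torch.randn(5, embed_dim) * 0.02)  # 5 soft tokens

optimizer = torch.optim.AdamW([soft_prompt], lr=2e-5)  # AdamW, lr 2e-5
num_epochs, batch_size = 300, 32                       # as reported

# Training loop (omitted): for each batch, prepend `soft_prompt` to the input
# embeddings, compute the masked-LM/verbalizer classification loss, and step
# the optimizer on the soft prompt parameters only.
```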