Preference-grounded Token-level Guidance for Language Model Fine-tuning
Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization. ... We evaluate our framework on two distinct representative LM tasks: generating discrete text prompts for few-shot text classification and text summarization. On both tasks, our method exhibits competitive performance. |
| Researcher Affiliation | Collaboration | 1The University of Texas at Austin 2Salesforce Research |
| Pseudocode | Yes | Algorithm 1 A learning routine for the preference-grounded token-level reward function rϕ. ... Algorithm 2 An alternate-learning process for the reward function rϕ and the LM πθ. (A hedged sketch of this alternating loop is given after the table.) |
| Open Source Code | Yes | Source codes are released at https://github.com/Shentao-YANG/Preference_Grounded_Guidance. |
| Open Datasets | Yes | We test on three popular few-shot datasets in prior work [e.g., 77, 78]: two sentiment binary-classification datasets SST-2 [79, 80] and Yelp Polarity [81], and a topic four-way-classification dataset AG News [81, 82]. |
| Dataset Splits | Yes | We also adopt the standard few-shot setting [76], where both the training and validation sets have 16 (o, y)-pairs per class. (A sketch of drawing such a 16-shot-per-class subset follows the table.) |
| Hardware Specification | Yes | The experiments are conducted on NVIDIA GeForce RTX 3090 and NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the 'Hugging Face library [73]' but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | Additionally, we list the important hyperparameters for training our reward model in Table 13, and important hyperparameters for training our LM in Table 14. |
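
The pseudocode quoted above names Algorithm 1 (a learning routine for the token-level reward function rϕ) and Algorithm 2 (an alternate-learning process for rϕ and the LM πθ). The snippet below is a minimal sketch of that alternating structure only, assuming hypothetical `reward_loss_fn` and `lm_loss_fn` callables in place of the paper's preference-grounding and guided LM objectives; it is not the released implementation (see the linked repository for that).

```python
# Minimal sketch of the alternate-learning loop (Algorithm 2): the token-level
# reward model r_phi and the LM pi_theta are updated in turns.
# `reward_loss_fn(reward_model, lm, batch)` and `lm_loss_fn(lm, reward_model, batch)`
# are hypothetical callables standing in for the paper's actual objectives.
import torch


def alternate_training(reward_model, lm, reward_loss_fn, lm_loss_fn, data_loader,
                       outer_rounds=10, reward_steps=100, lm_steps=100, lr=1e-5):
    reward_opt = torch.optim.Adam(reward_model.parameters(), lr=lr)
    lm_opt = torch.optim.Adam(lm.parameters(), lr=lr)
    batches = iter(data_loader)

    def next_batch():
        nonlocal batches
        try:
            return next(batches)
        except StopIteration:          # restart the loader when exhausted
            batches = iter(data_loader)
            return next(batches)

    for _ in range(outer_rounds):
        # Inner loop 1 (Algorithm 1): ground r_phi so that summed token-level
        # rewards respect sequence-level preferences among LM generations.
        for _ in range(reward_steps):
            loss = reward_loss_fn(reward_model, lm, next_batch())
            reward_opt.zero_grad()
            loss.backward()
            reward_opt.step()
        # Inner loop 2: fine-tune pi_theta with guidance from the learned rewards.
        for _ in range(lm_steps):
            loss = lm_loss_fn(lm, reward_model, next_batch())
            lm_opt.zero_grad()
            loss.backward()
            lm_opt.step()
    return reward_model, lm
```

Keeping the two updates in separate inner loops mirrors the alternate-learning description of Algorithm 2; the actual step counts, optimizers, and learning rates are those reported in the paper's Tables 13 and 14.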
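
The dataset rows quote the standard few-shot setting: 16 examples per class for both training and validation on SST-2, Yelp Polarity, and AG News. Below is a hedged sketch of drawing such a subset with the Hugging Face `datasets` library; the helper `few_shot_subset`, the seed handling, and the use of `ag_news` as the example identifier are illustrative assumptions, and the released code may follow a different sampling protocol.

```python
# Hedged sketch: build a 16-shot-per-class subset of a classification dataset.
import random
from collections import defaultdict

from datasets import load_dataset


def few_shot_subset(split, label_key="label", shots_per_class=16, seed=0):
    """Sample `shots_per_class` examples from each class of a datasets.Dataset split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(split[label_key]):
        by_class[label].append(idx)
    picked = []
    for indices in by_class.values():
        picked.extend(rng.sample(indices, shots_per_class))
    return split.select(sorted(picked))


# Example: AG News (4-way topic classification); SST-2 and Yelp Polarity
# follow the same pattern with their respective dataset identifiers.
ag_news = load_dataset("ag_news")
train_16 = few_shot_subset(ag_news["train"], seed=0)
# The few-shot validation set is built the same way (16 pairs per class);
# a full pipeline would also keep the two subsets disjoint.
```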