Preference-grounded Token-level Guidance for Language Model Fine-tuning

Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization. ... We evaluate our framework on two distinct representative LM tasks: generating discrete text prompts for few-shot text classification and text summarization. On both tasks, our method exhibits competitive performance.
Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Salesforce Research
Pseudocode | Yes | Algorithm 1: A learning routine for the preference-grounded token-level reward function r_ϕ. ... Algorithm 2: An alternate-learning process for the reward function r_ϕ and the LM π_θ. (A toy sketch of this alternating scheme appears below the table.)
Open Source Code | Yes | Source codes are released at https://github.com/Shentao-YANG/Preference_Grounded_Guidance.
Open Datasets | Yes | We test on three popular few-shot datasets in prior work [e.g., 77, 78]: two sentiment binary-classification datasets SST-2 [79, 80] and Yelp Polarity [81], and a topic four-way-classification dataset AG News [81, 82].
Dataset Splits | Yes | We also adopt the standard few-shot setting [76], where both the training and validation sets have 16 (o, y)-pairs per class.
Hardware Specification | Yes | The experiments are conducted on NVIDIA GeForce RTX 3090 and NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using the 'Hugging Face library [73]' but does not specify a version number for it or any other key software dependencies.
Experiment Setup | Yes | Additionally, we list the important hyperparameters for training our reward model in Table 13, and important hyperparameters for training our LM in Table 14.
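
The two algorithms cited in the Pseudocode row describe, respectively, a routine that learns the token-level reward function r_ϕ from sequence-level preferences and an alternate-learning process between r_ϕ and the LM π_θ. The toy PyTorch sketch below only illustrates the general shape of such an alternating loop; the tiny GRU models, the mean-pooling of token rewards, the KL-based listwise preference loss, the stand-in sequence metric, and the reward-weighted MLE update for the LM are illustrative assumptions, not the paper's implementation (see the released repository linked above for the actual code).

# Minimal sketch (NOT the authors' code): alternate learning of a token-level
# reward model r_phi grounded in sequence-level preferences, and an LM pi_theta
# fine-tuned with the learned token rewards. All architecture and loss choices
# here are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID, SEQ_LEN, N_CAND = 50, 64, 12, 4    # toy sizes

class TinyLM(nn.Module):                       # stand-in for the fine-tuned LM
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)
    def forward(self, tokens):                 # (B, T) -> logits (B, T, VOCAB)
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)
    @torch.no_grad()
    def sample(self, n, length=SEQ_LEN):       # ancestral sampling of n sequences
        seq = torch.zeros(n, 1, dtype=torch.long)          # token 0 acts as BOS
        for _ in range(length):
            logits = self.forward(seq)[:, -1]
            nxt = torch.multinomial(F.softmax(logits, -1), 1)
            seq = torch.cat([seq, nxt], dim=1)
        return seq[:, 1:]                      # drop the BOS column

class TokenReward(nn.Module):                  # r_phi: one scalar reward per token
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.head = nn.Linear(HID, 1)
    def forward(self, tokens):                 # (B, T) -> (B, T) token rewards
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h).squeeze(-1)

def sequence_metric(tokens):                   # toy stand-in for a task-level metric
    return (tokens == 7).float().mean(dim=1)   # e.g., frequency of a "good" token

lm, reward = TinyLM(), TokenReward()
opt_r = torch.optim.Adam(reward.parameters(), lr=1e-3)
opt_lm = torch.optim.Adam(lm.parameters(), lr=1e-4)

for step in range(200):                        # alternate the two updates
    cands = lm.sample(N_CAND)                  # several candidate generations

    # Reward step: ground the mean token reward in the metric-induced preference
    # over candidates via a listwise (softmax-matching) loss.
    agg = reward(cands).mean(dim=1)                          # (N_CAND,)
    pref = F.softmax(sequence_metric(cands) / 0.1, dim=0)    # soft preference
    loss_r = F.kl_div(F.log_softmax(agg, dim=0), pref, reduction="sum")
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # LM step: reward-weighted MLE on the sampled tokens, using the learned
    # token-level rewards (detached) as per-token weights.
    with torch.no_grad():
        w = torch.sigmoid(reward(cands))       # per-token weights in (0, 1)
    inp = torch.cat([torch.zeros(N_CAND, 1, dtype=torch.long), cands[:, :-1]], 1)
    logp = F.log_softmax(lm(inp), dim=-1).gather(-1, cands.unsqueeze(-1)).squeeze(-1)
    loss_lm = -(w * logp).mean()
    opt_lm.zero_grad(); loss_lm.backward(); opt_lm.step()

In this sketch, sequence-level preferences enter only through the listwise loss on aggregated token rewards, while the LM never sees the raw metric directly; it is guided purely by the learned token-level rewards, which is the division of labor the two cited algorithms describe.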