Preference-grounded Token-level Guidance for Language Model Fine-tuning

Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization. ... We evaluate our framework on two distinct representative LM tasks: generating discrete text prompts for few-shot text classification and text summarization. On both tasks, our method exhibits competitive performance.
Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Salesforce Research
Pseudocode | Yes | Algorithm 1: A learning routine for the preference-grounded token-level reward function r_ϕ. ... Algorithm 2: An alternate-learning process for the reward function r_ϕ and the LM π_θ. (A toy sketch of this alternating scheme appears below the table.)
Open Source Code | Yes | Source codes are released at https://github.com/Shentao-YANG/Preference_Grounded_Guidance.
Open Datasets | Yes | We test on three popular few-shot datasets in prior work [e.g., 77, 78]: two sentiment binary-classification datasets SST-2 [79, 80] and Yelp Polarity [81], and a topic four-way-classification dataset AG News [81, 82].
Dataset Splits | Yes | We also adopt the standard few-shot setting [76], where both the training and validation sets have 16 (o, y)-pairs per class.
Hardware Specification | Yes | The experiments are conducted on NVIDIA GeForce RTX 3090 and NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using the 'Hugging Face library [73]' but does not specify a version number for it or any other key software dependencies.
Experiment Setup | Yes | Additionally, we list the important hyperparameters for training our reward model in Table 13, and important hyperparameters for training our LM in Table 14.
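
The two algorithms cited in the Pseudocode row describe, respectively, a routine that learns the token-level reward function r_ϕ from sequence-level preferences and an alternate-learning process between r_ϕ and the LM π_θ. The toy PyTorch sketch below only illustrates the general shape of such an alternating loop; the tiny GRU models, the mean-pooling of token rewards, the KL-based listwise preference loss, the stand-in sequence metric, and the reward-weighted MLE update for the LM are illustrative assumptions, not the paper's implementation (see the released repository linked above for the actual code).

# Minimal sketch (NOT the authors' code): alternate learning of a token-level
# reward model r_phi grounded in sequence-level preferences, and an LM pi_theta
# fine-tuned with the learned token rewards. All architecture and loss choices
# here are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HID, SEQ_LEN, N_CAND = 50, 64, 12, 4    # toy sizes

class TinyLM(nn.Module):                       # stand-in for the fine-tuned LM
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)
    def forward(self, tokens):                 # (B, T) -> logits (B, T, VOCAB)
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)
    @torch.no_grad()
    def sample(self, n, length=SEQ_LEN):       # ancestral sampling of n sequences
        seq = torch.zeros(n, 1, dtype=torch.long)          # token 0 acts as BOS
        for _ in range(length):
            logits = self.forward(seq)[:, -1]
            nxt = torch.multinomial(F.softmax(logits, -1), 1)
            seq = torch.cat([seq, nxt], dim=1)
        return seq[:, 1:]                      # drop the BOS column

class TokenReward(nn.Module):                  # r_phi: one scalar reward per token
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HID)
        self.rnn = nn.GRU(HID, HID, batch_first=True)
        self.head = nn.Linear(HID, 1)
    def forward(self, tokens):                 # (B, T) -> (B, T) token rewards
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h).squeeze(-1)

def sequence_metric(tokens):                   # toy stand-in for a task-level metric
    return (tokens == 7).float().mean(dim=1)   # e.g., frequency of a "good" token

lm, reward = TinyLM(), TokenReward()
opt_r = torch.optim.Adam(reward.parameters(), lr=1e-3)
opt_lm = torch.optim.Adam(lm.parameters(), lr=1e-4)

for step in range(200):                        # alternate the two updates
    cands = lm.sample(N_CAND)                  # several candidate generations

    # Reward step: ground the mean token reward in the metric-induced preference
    # over candidates via a listwise (softmax-matching) loss.
    agg = reward(cands).mean(dim=1)                          # (N_CAND,)
    pref = F.softmax(sequence_metric(cands) / 0.1, dim=0)    # soft preference
    loss_r = F.kl_div(F.log_softmax(agg, dim=0), pref, reduction="sum")
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # LM step: reward-weighted MLE on the sampled tokens, using the learned
    # token-level rewards (detached) as per-token weights.
    with torch.no_grad():
        w = torch.sigmoid(reward(cands))       # per-token weights in (0, 1)
    inp = torch.cat([torch.zeros(N_CAND, 1, dtype=torch.long), cands[:, :-1]], 1)
    logp = F.log_softmax(lm(inp), dim=-1).gather(-1, cands.unsqueeze(-1)).squeeze(-1)
    loss_lm = -(w * logp).mean()
    opt_lm.zero_grad(); loss_lm.backward(); opt_lm.step()

In this sketch, sequence-level preferences enter only through the listwise loss on aggregated token rewards, while the LM never sees the raw metric directly; it is guided purely by the learned token-level rewards, which is the division of labor the two cited algorithms describe.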