Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Authors: Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, our method performs competitively on two distinct representative LM tasks: discrete-prompt generation and text summarization. ... We evaluate our framework on two distinct representative LM tasks: generating discrete text prompts for few-shot text classification and text summarization. On both tasks, our method exhibits competitive performance. |
| Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Salesforce Research |
| Pseudocode | Yes | Algorithm 1: A learning routine for the preference-grounded token-level reward function r_φ. ... Algorithm 2: An alternate-learning process for the reward function r_φ and the LM π_θ. |
| Open Source Code | Yes | Source codes are released at https://github.com/Shentao-YANG/Preference_Grounded_Guidance. |
| Open Datasets | Yes | We test on three popular few-shot datasets in prior work [e.g., 77, 78]: two sentiment binary-classification datasets SST-2 [79, 80] and Yelp Polarity [81], and a topic four-way-classification dataset AG News [81, 82]. |
| Dataset Splits | Yes | We also adopt the standard few-shot setting [76], where both the training and validation sets have 16 (o, y)-pairs per class. |
| Hardware Specification | Yes | The experiments are conducted on NVIDIA GeForce RTX 3090 and NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using the "Hugging Face library [73]" but does not specify a version number for it or any other key software dependency. |
| Experiment Setup | Yes | Additionally, we list the important hyperparameters for training our reward model in Table 13, and important hyperparameters for training our LM in Table 14. |