Improving Context-Aware Preference Modeling for Language Models

Authors: Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct experiments to benchmark the context-specific performance of various models and investigate the potential value of context-aware preference modeling.
Researcher Affiliation | Collaboration | Silviu Pitis (a,b), Ziang Xiao (c), Nicolas Le Roux (b,d), Alessandro Sordoni (b,d); affiliations: (a) University of Toronto, (b) Microsoft Research, (c) Johns Hopkins University, (d) MILA
Pseudocode | No | The following template is used for llm-as-a-judge models (Llama 3 and GPT-4 Turbo). Llama 3 uses the logit_template (see Appendix D.1 for how the score is computed). GPT-4 Turbo uses the argmax_score_template_no_cot and runs inference with temperature = 0.
Open Source Code | No | Unfortunately we are unable to release code, but are happy to clarify any details over email.
Open Datasets | Yes | We open-source high-quality context-conditioned preference datasets that disentangle context-specific preference from general preference, which we use for finetuning and evaluation. The datasets can be found at https://huggingface.co/datasets/microsoft/rpr.
Dataset Splits | No | We divide the dataset into a training set of 10,167 paired samples and a test set of 1,000 paired samples, with no overlap between train and test prompts.
Hardware Specification | Yes | Finetuning took approximately 8 hours on a single A100. Experiments were run on an internal cluster of GPUs with between 24GB and 48GB VRAM each.
Software Dependencies | No | optim = adamw_hf; lr_scheduler_type = linear; PEFT_CONFIG = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False, r=16, lora_alpha=32, lora_dropout=0.05, target_modules=['q_proj', 'k_proj', 'v_proj', 'dense'])
Experiment Setup | Yes | epochs = 1; per_device_train_batch_size = 2; gradient_accumulation_steps = 1; learning_rate = 1e-5; weight_decay = 1e-2; optim = adamw_hf; lr_scheduler_type = linear
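The Software Dependencies and Experiment Setup rows can be assembled into a runnable configuration. This is a sketch assuming the Hugging Face `peft` and `transformers` libraries; the paper does not name its training framework, but the hyperparameter names (`optim = adamw_hf`, `lr_scheduler_type = linear`) match the HF Trainer API, and the `output_dir` value here is a hypothetical placeholder.

```python
# Sketch of the reported finetuning configuration, assuming HF peft/transformers.
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# LoRA adapter config from the Software Dependencies row; SEQ_CLS matches
# a sequence-classification (reward-model-style) head.
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

# Hyperparameters from the Experiment Setup row.
training_args = TrainingArguments(
    output_dir="rpr-finetune",  # hypothetical output path, not from the paper
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    weight_decay=1e-2,
    optim="adamw_hf",
    lr_scheduler_type="linear",
)
```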
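The Pseudocode row notes that the Llama 3 judge's score is computed from logits, with the exact formula in the paper's Appendix D.1 (not reproduced here). A common way to turn llm-as-a-judge label logits into a soft preference score is a softmax over the two label tokens; treat the sketch below as an illustrative assumption rather than the paper's formula.

```python
import math

def preference_score(logit_a: float, logit_b: float) -> float:
    """Probability that response A is preferred, from the judge model's
    next-token logits for the labels 'A' and 'B' (illustrative only;
    the paper's exact computation is in its Appendix D.1)."""
    # Numerically stable softmax over the two label logits.
    m = max(logit_a, logit_b)
    ea = math.exp(logit_a - m)
    eb = math.exp(logit_b - m)
    return ea / (ea + eb)
```

For example, equal logits yield `preference_score(0.0, 0.0) == 0.5`, i.e. the judge is indifferent between the two responses.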