Optimal Design for Human Preference Elicitation

Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Anand Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate that our algorithms are practical by evaluating them on existing question-answering problems. We compare our algorithms to multiple baselines in several experiments. We observe that the algorithms achieve a lower ranking loss than the baselines.
Researcher Affiliation | Collaboration | Subhojyoti Mukherjee (University of Wisconsin-Madison, smukherjee27@wisc.edu); Anusha Lalitha (AWS AI Labs); Kousha Kalantari (AWS AI Labs); Aniket Deshmukh (AWS AI Labs); Ge Liu (UIUC); Yifei Ma (AWS AI Labs); Branislav Kveton (Adobe Research)
Pseudocode | Yes | Algorithm 1 Dope for absolute feedback. [...] Algorithm 2 Dope for ranking feedback.
Open Source Code | No | We did not get a permission to release the code.
Open Datasets | Yes | The Nectar dataset [101] is a dataset of 183k questions, each with 7 answers. [...] The Anthropic dataset [8] is a dataset of 161k questions with two answers per question. (A loading sketch follows the table.)
Dataset Splits | No | The paper focuses on data elicitation and learning from collected feedback rather than traditional model training and evaluation on predefined train/validation/test splits. Therefore, it does not specify validation dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory size) for the experiments. It mentions computation time but not the specifications of the machines used.
Software Dependencies | No | The paper mentions the use of CVXPY [21] but does not specify version numbers for this or any other software libraries, frameworks, or programming languages used in the experiments. (An optimal-design sketch follows the table.)
Experiment Setup | Yes | For each question-answer pair (i, k), the feature vector is x_{i,k} = vec(q_i a_{i,k}^T) and has length d = 36. The absolute feedback is generated as in (1). [...] We regularize both objectives with γ‖θ‖_2^2, for a small γ > 0. (A feature-construction sketch follows the table.)
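
The Open Datasets row names two public preference datasets. Below is a minimal loading sketch, assuming the Hugging Face Hub copies `berkeley-nest/Nectar` and `Anthropic/hh-rlhf` correspond to the Nectar and Anthropic datasets cited in the paper; the paper itself does not say how the authors obtained the data.

```python
# Sketch of loading the two public preference datasets named in the Open Datasets row.
# Assumption: the Hub identifiers below host the Nectar and Anthropic datasets.
from datasets import load_dataset

nectar = load_dataset("berkeley-nest/Nectar")     # ~183k questions, 7 answers each
anthropic_hh = load_dataset("Anthropic/hh-rlhf")  # ~161k questions, two answers each

print(nectar)
print(anthropic_hh)
```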
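The Software Dependencies row notes that the paper solves its objectives with CVXPY but gives no versions. As a rough illustration of the kind of convex program involved, here is a hedged sketch of a D-optimal-design relaxation over candidate question features; the feature matrix, problem size, and objective are illustrative assumptions, not the paper's exact formulation.

```python
import cvxpy as cp
import numpy as np

# Illustrative candidate features: one d-dimensional vector per question
# (random placeholders; the paper builds its own 36-dimensional features).
rng = np.random.default_rng(0)
n, d = 50, 36
X = rng.normal(size=(n, d))

# Relaxed D-optimal design: a distribution w over questions that maximizes
# the log-determinant of the weighted information matrix sum_i w_i x_i x_i^T.
w = cp.Variable(n, nonneg=True)
info_matrix = sum(w[i] * np.outer(X[i], X[i]) for i in range(n))
problem = cp.Problem(cp.Maximize(cp.log_det(info_matrix)), [cp.sum(w) == 1])
problem.solve()

# The optimal weights concentrate on the most informative questions;
# Dope-style elicitation would then sample questions according to w.
print(np.round(w.value, 3))
```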
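The Experiment Setup row states that each question-answer feature is the vectorized outer product of question and answer representations with d = 36, and that both objectives are regularized with γ‖θ‖_2^2. The minimal sketch below assumes 6-dimensional question and answer embeddings (so that 6 × 6 = 36) and a squared-error objective for the absolute-feedback case; the embedding values and loss function are placeholders, not the paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed 6-dimensional question and answer embeddings, so that the
# vectorized outer product has length d = 6 * 6 = 36 as stated in the paper.
q_i = rng.normal(size=6)   # representation of question i (placeholder values)
a_ik = rng.normal(size=6)  # representation of answer k to question i

x_ik = np.outer(q_i, a_ik).reshape(-1)  # x_{i,k} = vec(q_i a_{i,k}^T)
assert x_ik.shape == (36,)

# Ridge-style regularization gamma * ||theta||_2^2 with a small gamma > 0,
# added here to a squared-error loss for absolute feedback (illustrative only).
def regularized_loss(theta, X, y, gamma=1e-4):
    residuals = X @ theta - y
    return float(residuals @ residuals + gamma * theta @ theta)
```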