Optimal Design for Human Preference Elicitation

Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Anand Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we demonstrate that our algorithms are practical by evaluating them on existing question-answering problems. We compare our algorithms to multiple baselines in several experiments. We observe that the algorithms achieve a lower ranking loss than the baselines.
Researcher Affiliation | Collaboration | Subhojyoti Mukherjee (University of Wisconsin-Madison, smukherjee27@wisc.edu); Anusha Lalitha (AWS AI Labs); Kousha Kalantari (AWS AI Labs); Aniket Deshmukh (AWS AI Labs); Ge Liu (UIUC); Yifei Ma (AWS AI Labs); Branislav Kveton (Adobe Research)
Pseudocode | Yes | Algorithm 1 Dope for absolute feedback. [...] Algorithm 2 Dope for ranking feedback.
Open Source Code | No | We did not get a permission to release the code.
Open Datasets | Yes | The Nectar dataset [101] is a dataset of 183k questions, each with 7 answers. [...] The Anthropic dataset [8] is a dataset of 161k questions with two answers per question. (A loading sketch follows the table.)
Dataset Splits | No | The paper focuses on data elicitation and learning from collected feedback rather than traditional model training and evaluation on predefined train/validation/test splits. Therefore, it does not specify validation dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory size) for the experiments. It mentions computation time but not the specifications of the machines used.
Software Dependencies | No | The paper mentions the use of CVXPY [21] but does not specify version numbers for this or any other software libraries, frameworks, or programming languages used in the experiments. (An optimal-design sketch follows the table.)
Experiment Setup | Yes | For each question-answer pair (i, k), the feature vector is x_{i,k} = vec(q_i a_{i,k}^T) and has length d = 36. The absolute feedback is generated as in (1). [...] We regularize both objectives with γ‖θ‖_2^2, for a small γ > 0. (A feature-construction sketch follows the table.)
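
The Open Datasets row names two public preference datasets. Below is a minimal loading sketch, assuming the Hugging Face Hub copies `berkeley-nest/Nectar` and `Anthropic/hh-rlhf` correspond to the Nectar and Anthropic datasets cited in the paper; the paper itself does not say how the authors obtained the data.

```python
# Sketch of loading the two public preference datasets named in the Open Datasets row.
# Assumption: the Hub identifiers below host the Nectar and Anthropic datasets.
from datasets import load_dataset

nectar = load_dataset("berkeley-nest/Nectar")     # ~183k questions, 7 answers each
anthropic_hh = load_dataset("Anthropic/hh-rlhf")  # ~161k questions, two answers each

print(nectar)
print(anthropic_hh)
```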
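The Software Dependencies row notes that the paper solves its objectives with CVXPY but gives no versions. As a rough illustration of the kind of convex program involved, here is a hedged sketch of a D-optimal-design relaxation over candidate question features; the feature matrix, problem size, and objective are illustrative assumptions, not the paper's exact formulation.

```python
import cvxpy as cp
import numpy as np

# Illustrative candidate features: one d-dimensional vector per question
# (random placeholders; the paper builds its own 36-dimensional features).
rng = np.random.default_rng(0)
n, d = 50, 36
X = rng.normal(size=(n, d))

# Relaxed D-optimal design: a distribution w over questions that maximizes
# the log-determinant of the weighted information matrix sum_i w_i x_i x_i^T.
w = cp.Variable(n, nonneg=True)
info_matrix = sum(w[i] * np.outer(X[i], X[i]) for i in range(n))
problem = cp.Problem(cp.Maximize(cp.log_det(info_matrix)), [cp.sum(w) == 1])
problem.solve()

# The optimal weights concentrate on the most informative questions;
# Dope-style elicitation would then sample questions according to w.
print(np.round(w.value, 3))
```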
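The Experiment Setup row states that each question-answer feature is the vectorized outer product of question and answer representations with d = 36, and that both objectives are regularized with γ‖θ‖_2^2. The minimal sketch below assumes 6-dimensional question and answer embeddings (so that 6 × 6 = 36) and a squared-error objective for the absolute-feedback case; the embedding values and loss function are placeholders, not the paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed 6-dimensional question and answer embeddings, so that the
# vectorized outer product has length d = 6 * 6 = 36 as stated in the paper.
q_i = rng.normal(size=6)   # representation of question i (placeholder values)
a_ik = rng.normal(size=6)  # representation of answer k to question i

x_ik = np.outer(q_i, a_ik).reshape(-1)  # x_{i,k} = vec(q_i a_{i,k}^T)
assert x_ik.shape == (36,)

# Ridge-style regularization gamma * ||theta||_2^2 with a small gamma > 0,
# added here to a squared-error loss for absolute feedback (illustrative only).
def regularized_loss(theta, X, y, gamma=1e-4):
    residuals = X @ theta - y
    return float(residuals @ residuals + gamma * theta @ theta)
```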