reproducibilityindex.ai

Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives.
Researcher Affiliation	Collaboration	Han Shao (TTIC, han@ttic.edu); Lee Cohen (TTIC, leecohencs@gmail.com); Avrim Blum (TTIC, avrim@ttic.edu); Yishay Mansour (Tel Aviv University and Google Research, mansour.yishay@gmail.com); Aadirupa Saha (TTIC, aadirupa@ttic.edu); Matthew R. Walter (TTIC, mwalter@ttic.edu)
Pseudocode	Yes	Algorithm 1: Identification of Basis Policies
Open Source Code	No	The paper does not provide any specific links to source code or statements about its availability.
Open Datasets	No	The paper is theoretical and does not describe specific datasets used for training or provide access information for any public datasets.
Dataset Splits	No	The paper is theoretical and does not describe experimental validation, thus no training/test/validation dataset splits are specified.
Hardware Specification	No	The paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies	No	The paper is theoretical and focuses on algorithms and proofs; it does not mention specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and describes algorithms and proofs; it does not detail an experimental setup with hyperparameters or training configurations.
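
The first feedback model described in the Research Type row (a user is shown two policies and returns the preferred one) can be illustrated with a small sketch. This is not the paper's Algorithm 1. It only simulates a single comparison query, assuming the user's preference is linear in the vector-valued policy value. The names `preferred_policy`, `w`, `value_a`, and `value_b` are hypothetical, and the tie-breaking rule is an assumption.

```python
from typing import Sequence


def preferred_policy(
    w: Sequence[float],
    value_a: Sequence[float],
    value_b: Sequence[float],
) -> str:
    """Simulate one comparison query between two policies.

    w        -- the user's preference vector (hidden from the learner in the
                paper's setting; passed explicitly here only for simulation)
    value_a  -- vector-valued expected return of policy A, one entry per objective
    value_b  -- vector-valued expected return of policy B

    Returns "a" or "b". Ties go to "a" (an assumption of this sketch).
    """
    if not (len(w) == len(value_a) == len(value_b)):
        raise ValueError("all vectors must have one entry per objective")

    # Scalarize each value vector with the preference vector (inner product).
    score_a = sum(wi * vi for wi, vi in zip(w, value_a))
    score_b = sum(wi * vi for wi, vi in zip(w, value_b))
    return "a" if score_a >= score_b else "b"


if __name__ == "__main__":
    # Two objectives; policy A is strong on the first, policy B on the second.
    print(preferred_policy([0.8, 0.2], [1.0, 0.0], [0.0, 1.0]))
    print(preferred_policy([0.2, 0.8], [1.0, 0.0], [0.0, 1.0]))
```

In this sketch the learner would see only the returned label, never `w` itself, which is why the number of such queries is the quantity the paper's bound is stated in.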