Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew Walter

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives.
Researcher Affiliation | Collaboration | Han Shao, TTIC (han@ttic.edu); Lee Cohen, TTIC (leecohencs@gmail.com); Avrim Blum, TTIC (avrim@ttic.edu); Yishay Mansour, Tel Aviv University and Google Research (mansour.yishay@gmail.com); Aadirupa Saha, TTIC (aadirupa@ttic.edu); Matthew R. Walter, TTIC (mwalter@ttic.edu)
Pseudocode | Yes | Algorithm 1: Identification of Basis Policies
Open Source Code | No | The paper does not provide any specific links to source code or statements about its availability.
Open Datasets | No | The paper is theoretical and does not describe specific datasets used for training or provide access information for any public datasets.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation, thus no training/test/validation dataset splits are specified.
Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies | No | The paper is theoretical and focuses on algorithms and proofs; it does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and describes algorithms and proofs; it does not detail an experimental setup with hyperparameters or training configurations.
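Since no code accompanies the paper, the following is only a minimal sketch of the two comparison-feedback models described in the Research Type row, not the authors' Algorithm 1. It assumes that each policy is summarized by its expected vector-valued return in R^k (computable because the MDP is known) and that a user with preference vector w scores a return vector v by the inner product w · v. That linear scalarization is an assumption of this sketch. All function names are hypothetical, and ties are broken in favor of the first option.

```python
import numpy as np


def user_value(w, v):
    """Hypothetical scalarized value of return vector v for preference vector w (assumed linear)."""
    return float(np.dot(w, v))


def compare_policies(w, v1, v2):
    """Feedback model 1 (sketch): the user sees two policies' return vectors.

    Returns 0 if the first policy is preferred (ties go to the first), else 1.
    """
    return 0 if user_value(w, v1) >= user_value(w, v2) else 1


def weighted_return(traj_set):
    """Weighted sum of trajectory returns; traj_set is a non-empty list of (weight, return_vector) pairs."""
    weights = np.array([weight for weight, _ in traj_set], dtype=float)
    returns = np.array([ret for _, ret in traj_set], dtype=float)  # shape (m, k)
    return weights @ returns  # shape (k,)


def compare_trajectory_sets(w, set1, set2):
    """Feedback model 2 (sketch): the user compares two small weighted sets of trajectories."""
    return compare_policies(w, weighted_return(set1), weighted_return(set2))
```

For example, with w = [0.7, 0.3], a policy returning [1, 0] is preferred over one returning [0, 1], because 0.7 > 0.3. An equally weighted pair of trajectories with returns [1, 0] and [0, 1] scores 0.5, so it is preferred over a single trajectory returning [0, 1], which scores 0.3.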