Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
Authors: Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew Walter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives. |
| Researcher Affiliation | Collaboration | Han Shao (TTIC, han@ttic.edu); Lee Cohen (TTIC, leecohencs@gmail.com); Avrim Blum (TTIC, avrim@ttic.edu); Yishay Mansour (Tel Aviv University and Google Research, mansour.yishay@gmail.com); Aadirupa Saha (TTIC, aadirupa@ttic.edu); Matthew R. Walter (TTIC, mwalter@ttic.edu) |
| Pseudocode | Yes | Algorithm 1: Identification of Basis Policies |
| Open Source Code | No | The paper does not provide any specific links to source code or statements about its availability. |
| Open Datasets | No | The paper is theoretical and does not describe specific datasets used for training or provide access information for any public datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation, thus no training/test/validation dataset splits are specified. |
| Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms and proofs; it does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and describes algorithms and proofs; it does not detail an experimental setup with hyperparameters or training configurations. |
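
The first feedback model summarized in the Research Type row (a user is shown two policies and returns the preferred one) can be illustrated with a small sketch. This is not the paper's algorithm: it is a naive linear scan over a fixed candidate set, shown only to make the comparison oracle concrete. The names `compare` and `best_by_comparisons`, the noiseless oracle, the tie-breaking rule, and the example numbers are all assumptions introduced for illustration. Each policy is represented by its vector of expected returns, one entry per objective, and the user's hidden preference vector weights those objectives.

```python
import numpy as np


def compare(w_hidden, v_a, v_b):
    """Hypothetical noiseless comparison oracle.

    Returns 0 if the user weakly prefers policy A, else 1.
    The user's scalar value for a policy is assumed to be the
    inner product of the hidden preference vector with the
    policy's vector of per-objective expected returns.
    """
    score_a = float(np.dot(w_hidden, v_a))
    score_b = float(np.dot(w_hidden, v_b))
    return 0 if score_a >= score_b else 1


def best_by_comparisons(w_hidden, value_vectors):
    """Naive baseline: scan the candidates, keeping the current winner.

    Uses one comparison query per additional candidate. The learner
    never reads w_hidden directly; it is only passed through to the
    simulated user.
    """
    best = 0
    queries = 0
    for i in range(1, len(value_vectors)):
        queries += 1
        if compare(w_hidden, value_vectors[best], value_vectors[i]) == 1:
            best = i
    return best, queries


# Illustrative numbers only: three objectives, three candidate policies.
w_hidden = np.array([0.7, 0.2, 0.1])
value_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
])

best, queries = best_by_comparisons(w_hidden, value_vectors)
print(best, queries)
```

A scan like this needs one query per candidate policy, so it does not scale to the full policy space of an MDP. The paper's stated contribution is an algorithm whose query count scales quasilinearly in the number of objectives instead.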