Learning to Influence Human Behavior with Offline Reinforcement Learning

Authors: Joey Hong, Sergey Levine, Anca Dragan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that offline RL can solve two challenges with effective influence. First, we show that by learning from a dataset of suboptimal human-human interaction on a variety of tasks, none of which contains examples of successful influence, an agent can learn influence strategies to steer humans towards better performance even on new tasks. Second, we show that by also modeling and conditioning on human behavior, offline RL can learn to affect not just the human's actions but also their underlying strategy, and adapt to changes in their strategy.
Researcher Affiliation | Academia | Joey Hong, Sergey Levine, Anca Dragan; UC Berkeley; {joey_hong,sergey.levine,anca}@berkeley.edu
Pseudocode | No | The paper describes algorithmic approaches and modifications to CQL using text and mathematical equations, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository.
Open Datasets | No | We collected a dataset of human-human play where the human players were provided with one of several different instructions, in order to gather a diverse dataset that illustrates a variety of behaviors and human-human interactions. (No access information is provided for this collected dataset.)
Dataset Splits | No | The paper mentions data collection sizes (e.g., '20 human-human trajectories of length H = 1,200', '30 trajectories of length H = 400') for evaluation, but it does not specify explicit train/validation/test dataset splits, percentages, or methods for partitioning the data.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or cloud/cluster resources used for running the experiments.
Software Dependencies | No | The paper refers to specific algorithms like CQL [18] and mentions neural networks, but it does not provide specific version numbers for any software components, libraries, or dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | No | The paper states: 'We defer implementation details, i.e., architecture and hyperparameter choices, to Appendix A.' and 'We describe the high-level approach but defer implementation details to Appendix A.' Since Appendix A is not included in the main text, specific experimental setup details are not present.