Where to Add Actions in Human-in-the-Loop Reinforcement Learning

Authors: Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popović

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find ELI demonstrates excellent empirical performance, even in settings where the synthetic experts are quite poor.
Researcher Affiliation | Collaboration | (1) Center for Game Science, Computer Science & Engineering, University of Washington, Seattle, WA; (2) Enlearn™, Seattle, WA; (3) School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We examine performance on three environments adapted from the literature. Riverswim is a chain MDP with 6 states, 5 outcomes, and 2 ground actions per state that requires efficient exploration (Osband, Russo, and Van Roy 2013). Marblemaze (Asmuth et al. 2009; Russell and Norvig 1994) is a gridworld MDP with 36 states, 5 outcomes, and 4 ground actions per state. We use a modified version of the Large Action Task (Sallans and Hinton 2004). (An illustrative RiverSwim-style environment sketch appears after this table.)
Dataset Splits | No | The paper describes using simulated environments (Riverswim, Marblemaze, Large Action Task) and discusses running the agent within these environments, but it does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper states 'We show results using both simulated environments and simulated humans', but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations or experiments.
Software Dependencies | No | The paper mentions using Posterior Sampling Reinforcement Learning (PSRL) and other algorithmic approaches like UCRL2, MBIE, Thompson Sampling, and BOSS, but it does not provide specific software dependencies or their version numbers (e.g., programming language versions, library versions).
Experiment Setup | Yes | Each state starts with a single action... and every 20 episodes a new action is added at the state the agent selects. We transform standard domains to this new setting by sampling a new (or initial) action uniformly at random... We use Posterior Sampling Reinforcement Learning (PSRL)... We constrain all methods to only select states the agent has visited before. We set δ to 0.05... We set J = 10... For the prior we let αm = βm = 1 for all m... (An illustrative PSRL loop with periodic action addition appears after this table.)
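The Open Datasets row quotes the paper's description of the RiverSwim chain MDP. As a concrete point of reference, here is a minimal Python sketch of a RiverSwim-style environment. The class name, transition probabilities, and reward values are assumptions for illustration (published RiverSwim variants, e.g. Osband, Russo, and Van Roy 2013, differ in the exact numbers); the paper itself does not provide this code.

```python
import random


class RiverSwim:
    """Minimal RiverSwim-style chain MDP: 6 states in a row, 2 actions.

    Transition probabilities and rewards are illustrative assumptions;
    published RiverSwim variants differ in the exact numbers.
    """

    LEFT, RIGHT = 0, 1

    def __init__(self, n_states=6):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        s, reward = self.state, 0.0
        if action == self.LEFT:
            # Swimming with the current: deterministic move left,
            # small reward for lingering at the left bank.
            self.state = max(s - 1, 0)
            if s == 0:
                reward = 0.005
        else:
            # Swimming against the current: stochastic move right.
            u = random.random()
            if u < 0.6:
                self.state = min(s + 1, self.n_states - 1)
            elif u < 0.9:
                self.state = s
            else:
                self.state = max(s - 1, 0)
            # Large reward only for holding the rightmost state.
            if s == self.n_states - 1 and self.state == s:
                reward = 1.0
        return self.state, reward
```

The tension between the small, easily reached reward at the left bank and the large reward behind a stochastic swim to the right is what makes this family of domains a standard test of efficient exploration.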
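The Experiment Setup row reports a PSRL agent, priors with αm = βm = 1, and a protocol where every 20 episodes a new action is added at a state the agent selects, restricted to previously visited states. The sketch below illustrates that outer loop only, under stated assumptions: the function name, the randomly generated ground-truth MDP, the Bernoulli reward model, the Dirichlet/Beta posterior bookkeeping, and the uniform choice of which visited state receives the new action are all stand-ins, not the paper's ELI method, which chooses that state in a principled way.

```python
import numpy as np


def psrl_with_action_addition(n_states=6, horizon=20, n_episodes=200,
                              add_every=20, seed=0):
    """Illustrative PSRL loop on a random tabular MDP where each state starts
    with a single action and a new action is added every `add_every` episodes.

    The underlying MDP and the rule for choosing where to add an action are
    assumptions for this sketch; the paper's ELI method selects that state.
    """
    rng = np.random.default_rng(seed)

    # Hidden ground truth for the sketch: one action per state to start.
    true_P = [[rng.dirichlet(np.ones(n_states))] for _ in range(n_states)]
    true_R = [[rng.random()] for _ in range(n_states)]  # Bernoulli reward means

    # Posterior counts: Dirichlet(1,...,1) over next states, Beta(1,1) over rewards.
    trans_counts = [[np.ones(n_states)] for _ in range(n_states)]
    rew_counts = [[np.array([1.0, 1.0])] for _ in range(n_states)]

    visited = {0}

    for ep in range(n_episodes):
        if ep > 0 and ep % add_every == 0:
            # Add a new, randomly drawn action at a previously visited state
            # (uniform stand-in for ELI's choice of where to add).
            s_new = int(rng.choice(sorted(visited)))
            true_P[s_new].append(rng.dirichlet(np.ones(n_states)))
            true_R[s_new].append(rng.random())
            trans_counts[s_new].append(np.ones(n_states))
            rew_counts[s_new].append(np.array([1.0, 1.0]))

        # PSRL step 1: sample one MDP from the posterior.
        P = [[rng.dirichlet(c) for c in trans_counts[s]] for s in range(n_states)]
        R = [[rng.beta(a, b) for a, b in rew_counts[s]] for s in range(n_states)]

        # PSRL step 2: plan (finite-horizon value iteration) on the sample.
        V = np.zeros(n_states)
        policy = [None] * horizon
        for t in reversed(range(horizon)):
            Q = [[R[s][a] + P[s][a] @ V for a in range(len(P[s]))]
                 for s in range(n_states)]
            policy[t] = [int(np.argmax(Q[s])) for s in range(n_states)]
            V = np.array([max(Q[s]) for s in range(n_states)])

        # PSRL step 3: act in the true MDP and update posterior counts.
        s = 0
        for t in range(horizon):
            visited.add(s)
            a = policy[t][s]
            r = float(rng.random() < true_R[s][a])
            s_next = int(rng.choice(n_states, p=true_P[s][a]))
            trans_counts[s][a][s_next] += 1.0
            rew_counts[s][a] += np.array([r, 1.0 - r])
            s = s_next
```

Calling `psrl_with_action_addition()` runs the loop end to end; in the paper's setting, the interesting question is precisely which state should receive the new action, which is where ELI replaces the uniform stand-in used here.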