Where to Add Actions in Human-in-the-Loop Reinforcement Learning
Authors: Travis Mandel, Yun-En Liu, Emma Brunskill, Zoran Popović
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find ELI demonstrates excellent empirical performance, even in settings where the synthetic experts are quite poor. |
| Researcher Affiliation | Collaboration | 1Center for Game Science, Computer Science & Engineering, University of Washington, Seattle, WA 2Enlearn, Seattle, WA 3School of Computer Science, Carnegie Mellon University, Pittsburgh, PA |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We examine performance on three environments adapted from the literature. Riverswim is a chain MDP with 6 states, 5 outcomes, and 2 ground actions per state that requires efficient exploration (Osband, Russo, and Van Roy 2013). Marblemaze (Asmuth et al. 2009; Russell and Norvig 1994) is a gridworld MDP with 36 states, 5 outcomes, and 4 ground actions per state. We use a modified version of the Large Action Task (Sallans and Hinton 2004). |
| Dataset Splits | No | The paper describes using simulated environments (Riverswim, Marblemaze, Large Action Task) and discusses running the agent within these environments, but it does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper states 'We show results using both simulated environments and simulated humans', but it does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run these simulations or experiments. |
| Software Dependencies | No | The paper mentions using Posterior Sampling Reinforcement Learning (PSRL) and other algorithmic approaches like UCRL2, MBIE, Thompson Sampling, and BOSS, but it does not provide specific software dependencies or their version numbers (e.g., programming language versions, library versions). |
| Experiment Setup | Yes | Each state starts with a single action... and every 20 episodes a new action is added at the state the agent selects. We transform standard domains to this new setting by sampling a new (or initial) action uniformly at random... We use Posterior Sampling Reinforcement Learning (PSRL)... We constrain all methods to only select states the agent has visited before. We set δ to 0.05... We set J = 10... For the prior we let αm = βm = 1 for all m... |
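The "Open Datasets" row quotes the paper's description of Riverswim as a 6-state chain MDP with 2 ground actions per state. As a concrete reference point, the sketch below implements a RiverSwim-style environment; the transition probabilities, rewards, and class interface are illustrative assumptions drawn from common parameterizations in the literature, not the paper's modified domains or its "outcome" abstraction.

```python
"""Minimal RiverSwim-style chain MDP sketch (assumed parameterization)."""
import numpy as np

class RiverSwim:
    LEFT, RIGHT = 0, 1

    def __init__(self, n_states=6, seed=0):
        self.n = n_states
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        s = self.state
        if action == self.LEFT:
            # Swimming with the current: deterministic move left.
            next_s = max(s - 1, 0)
        else:
            # Swimming against the current: usually stay, sometimes advance,
            # occasionally get pushed back (illustrative probabilities).
            u = self.rng.random()
            if u < 0.3:
                next_s = min(s + 1, self.n - 1)
            elif u < 0.9:
                next_s = s
            else:
                next_s = max(s - 1, 0)
        # Small reward at the leftmost state, large reward at the rightmost.
        if next_s == 0 and action == self.LEFT:
            reward = 0.005
        elif next_s == self.n - 1 and action == self.RIGHT:
            reward = 1.0
        else:
            reward = 0.0
        self.state = next_s
        return next_s, reward

# Usage: roll out a random policy for one short episode.
env = RiverSwim()
s = env.reset()
for _ in range(20):
    a = int(env.rng.integers(2))
    s, r = env.step(a)
```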
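The "Experiment Setup" row quotes the protocol of starting each state with a single action, adding a new action every 20 episodes at an agent-selected state, and using PSRL with a Beta(1, 1) prior, δ = 0.05, and J = 10. The sketch below illustrates only the shape of that loop under stated assumptions: a toy tabular stand-in environment, Bernoulli rewards with Beta(1, 1) posteriors, Thompson-style sampling in place of full PSRL planning, and a placeholder rule for choosing where to add actions (the paper's ELI criterion is not implemented here).

```python
"""Sketch of the quoted experiment loop; all details are illustrative assumptions."""
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 6          # small chain, loosely matching RiverSwim's 6 states
EPISODE_LEN = 20      # horizon per episode (assumed, not given in the excerpt)
N_EPISODES = 200
ADD_EVERY = 20        # "every 20 episodes a new action is added"
DELTA = 0.05          # delta from the quoted setup (role not shown here)
J = 10                # J from the quoted setup (role not shown here)

# Per (state, action) Beta(1, 1) posterior over a Bernoulli reward signal.
# Each state starts with a single action, as in the quoted setup.
alpha = [[1.0] for _ in range(N_STATES)]
beta = [[1.0] for _ in range(N_STATES)]
visited = {0}  # methods are constrained to states the agent has visited

def true_reward_prob(s, a):
    """Hypothetical ground-truth Bernoulli parameter, for illustration only."""
    return 0.2 + 0.6 * ((s + a) % 3) / 2.0

def sample_and_act(s):
    """Thompson-style step: sample each action's mean from its Beta posterior
    and act greedily on the samples (a one-step stand-in for PSRL planning)."""
    samples = [rng.beta(alpha[s][a], beta[s][a]) for a in range(len(alpha[s]))]
    return int(np.argmax(samples))

for ep in range(N_EPISODES):
    if ep > 0 and ep % ADD_EVERY == 0:
        # Add a new action at a visited state; picking the visited state with
        # the fewest actions is purely a placeholder selection rule, not ELI.
        s_new = min(visited, key=lambda s: len(alpha[s]))
        alpha[s_new].append(1.0)
        beta[s_new].append(1.0)

    s = 0  # episodes start in state 0 (assumption)
    for _ in range(EPISODE_LEN):
        a = sample_and_act(s)
        r = rng.random() < true_reward_prob(s, a)
        alpha[s][a] += r
        beta[s][a] += 1 - r
        s = (s + 1) % N_STATES if a % 2 else max(s - 1, 0)  # toy dynamics
        visited.add(s)
```

The constants DELTA and J are carried over from the quoted setup only to document their values; how the paper uses them (e.g., in confidence bounds or sampling counts) is not reconstructed here.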