A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback

Authors: Robert Loftin, James MacGlashan, Bei Peng, Matthew Taylor, Michael Littman, Jeff Huang, David Roberts

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results from user studies show that humans use a variety of training strategies in practice and both algorithms can learn a contextual bandit task faster than algorithms that treat the feedback as numeric. Simulated trainers are also employed to evaluate the algorithms in both contextual bandit and sequential decision-making tasks with similar results.
Researcher Affiliation | Academia | Robert Loftin, North Carolina State University (rtloftin@ncsu.edu); James MacGlashan, Brown University (james_macglashan@brown.edu); Bei Peng, Washington State University (bei.peng@wsu.edu); Matthew E. Taylor, Washington State University (taylorm@eecs.wsu.edu); Michael L. Littman, Brown University (mlittman@cs.brown.edu); Jeff Huang, Brown University (jeff@cs.brown.edu); David L. Roberts, North Carolina State University (robertsd@csc.ncsu.edu)
Pseudocode | Yes | Algorithm 1: The SABL algorithm. ... Algorithm 2: The I-SABL algorithm.
Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository for the methodology described.
Open Datasets | No | The paper describes user studies and simulated trainer experiments but does not refer to any publicly available datasets with concrete access information (e.g., specific links, DOIs, or citations to external public datasets). The user study involved human participants interacting with a described task.
Dataset Splits | No | The paper mentions performance criteria (50%, 75%, 100% correctness) for the learning agents but does not specify any dataset splits (e.g., train/validation/test percentages or counts) for reproducibility.
Hardware Specification | No | The paper describes the experimental setup and tasks but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the algorithms and their implementation conceptually but does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | To evaluate their performance when learning from human trainers, we ran an online study in which participants trained learning agents using either SABL (with µ+ = µ− = 0.1), I-SABL, M^0, or M^+0... We tested each learning agent on tasks consisting of 2, 5, 10, 15 and 20 observations and 2, 3, or 4 actions. ... The trainer's error rate ϵ = 0.2, matching SABL and I-SABL's assumed value. ... Trainer strategies were defined by {µ+, µ−} = {0.1, 0.1} for the balanced feedback strategy, {µ+, µ−} = {0.1, 0.9} for the reward-focused strategy, and {µ+, µ−} = {0.9, 0.1} for the punishment-focused strategy. ... For all strategies, ϵ = 0.05.
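
To make the quoted setup concrete, below is a minimal Python sketch (not the authors' released code, which is unavailable) of a simulated trainer and a SABL-style maximum-likelihood learner for a contextual bandit. It assumes the intended-feedback-then-withhold reading of the strategy parameters described above: the trainer intends to reward correct actions and punish incorrect ones, errs with probability ϵ, and withholds an intended reward or punishment with probability µ+ or µ−, respectively. All class and function names are illustrative.

```python
import math
import random
from collections import defaultdict


def simulate_feedback(correct, mu_plus, mu_minus, epsilon, rng=random):
    """Sample the trainer's response: '+', '-', or '0' (no feedback).

    epsilon  : trainer error rate (0.2 for the assumed-value comparison,
               0.05 for the simulated strategies quoted above)
    mu_plus  : probability of withholding an intended reward
    mu_minus : probability of withholding an intended punishment
    """
    if correct:
        intended = '+' if rng.random() > epsilon else '-'
    else:
        intended = '-' if rng.random() > epsilon else '+'
    if intended == '+':
        return '0' if rng.random() < mu_plus else '+'
    return '0' if rng.random() < mu_minus else '-'


def feedback_likelihood(feedback, correct, mu_plus, mu_minus, epsilon):
    """p(feedback | the taken action was correct / incorrect) under the same model."""
    p_intend_reward = (1 - epsilon) if correct else epsilon
    p_intend_punish = 1 - p_intend_reward
    if feedback == '+':
        return p_intend_reward * (1 - mu_plus)
    if feedback == '-':
        return p_intend_punish * (1 - mu_minus)
    return p_intend_reward * mu_plus + p_intend_punish * mu_minus


class StrategyAwareBandit:
    """SABL-style learner: for each observation, track the log-likelihood that
    each action is the one the trainer wants, and act greedily on that estimate."""

    def __init__(self, n_actions, mu_plus=0.1, mu_minus=0.1, epsilon=0.2):
        self.n_actions = n_actions
        self.mu_plus, self.mu_minus, self.epsilon = mu_plus, mu_minus, epsilon
        self.loglik = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs, rng=random):
        ll = self.loglik[obs]
        best = max(ll)
        return rng.choice([a for a, v in enumerate(ll) if v == best])

    def update(self, obs, action, feedback):
        ll = self.loglik[obs]
        for a in range(self.n_actions):
            # Under the hypothesis "a is the target action for obs", the action
            # just taken was correct iff it equals a.
            ll[a] += math.log(feedback_likelihood(
                feedback, a == action, self.mu_plus, self.mu_minus, self.epsilon))


if __name__ == "__main__":
    # Balanced-feedback simulated trainer: {mu+, mu-} = {0.1, 0.1}, epsilon = 0.05.
    target = {obs: obs % 3 for obs in range(5)}   # hypothetical target policy
    agent = StrategyAwareBandit(n_actions=3, mu_plus=0.1, mu_minus=0.1, epsilon=0.05)
    for _ in range(500):
        obs = random.randrange(5)
        action = agent.act(obs)
        fb = simulate_feedback(action == target[obs], 0.1, 0.1, 0.05)
        agent.update(obs, action, fb)
    print(sum(agent.act(o) == target[o] for o in target), "of", len(target), "observations learned")
```

The reward-focused and punishment-focused strategies from the setup can be simulated by passing {µ+, µ−} = {0.1, 0.9} or {0.9, 0.1} to simulate_feedback; the learner above assumes the strategy parameters are known (as SABL does), whereas I-SABL would additionally infer them from the feedback history.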