Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

Authors: Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, Peter Stone

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally evaluated Deep TAMER in the context of the Atari game of BOWLING... In short, we found that, using Deep TAMER, human trainers were able to train successful BOWLING agents in just 15 minutes. Moreover, we found that agents trained using Deep TAMER outperformed agents trained using state-of-the-art deep reinforcement learning techniques as well as agents trained using the original TAMER method proposed in (Knox and Stone 2009).
Researcher Affiliation | Academia | U.S. Army Research Laboratory; Columbia University, New York; The University of Texas at Austin
Pseudocode | Yes | Algorithm 1: The Deep TAMER algorithm.
Open Source Code | No | The paper mentions using implementations of D-DQN and A3C from OpenAI (Hesse et al. 2017; Open AI 2017), but does not state that the code for Deep TAMER itself is open source or provide a link to it.
Open Datasets | Yes | For our experiments, we used the implementation provided by the Arcade Learning Environment (Bellemare et al. 2013) included as part of the OpenAI Gym suite (Brockman et al. 2016). (A minimal environment-loading sketch appears after this table.)
Dataset Splits | No | The paper describes the data collection process (human training sessions of a fixed duration) and mentions 'training states' for autoencoder pre-training, but it does not specify explicit train/validation/test dataset splits for the main experiments or a cross-validation strategy.
Hardware Specification | No | The paper mentions that training was performed using 'an experimental computer' and that the method 'enables its success with standard computing hardware (e.g., a consumer laptop)', but it does not provide specific details on GPU models, CPU types, or other hardware specifications.
Software Dependencies | No | The paper mentions using the 'Open AI Gym suite' and 'implementations of D-DQN and A3C that have been made available from Open AI', but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For the Atari BOWLING environment, we let s represent the two most recent 160 × 160 game images... encoder structure that produces a p = 100-dimensional output. For z in Ĥ(s, a) = z(f(s), a), we use a two-layer, fully-connected neural network with 16 hidden units per layer... We acquire the training states for each environment in offline simulation using a random policy... we use the continuous uniform distribution over the interval [0.2, 4]... we perform SGD updates (3) at a fixed rate that typically exceeds the rate at which humans are able to provide feedback... we select this parameter such that it results in buffer updates every 10 time steps... In practice, we perform mini-batch updates using the average gradient computed over several (x, y) samples instead of just one. (Hedged sketches of the environment setup, the reward-model network, and the weighted update step follow below.)
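
The paper's environment is the Atari BOWLING game from the Arcade Learning Environment, accessed through the OpenAI Gym suite, and the autoencoder training states are collected offline with a random policy. Below is a minimal sketch of that collection step using the classic Gym API; the environment ID, the number of collected frames, and the omission of the paper's 160 × 160 grayscale preprocessing are assumptions, not details taken from the paper.

```python
import gym
import numpy as np

# Assumed environment ID for the ALE Bowling game in the OpenAI Gym suite.
env = gym.make("Bowling-v0")

def collect_random_states(num_frames=10_000):
    """Roll out a random policy and record raw frames for autoencoder pre-training."""
    frames = []
    obs = env.reset()
    while len(frames) < num_frames:
        obs, reward, done, info = env.step(env.action_space.sample())
        frames.append(obs)  # raw RGB frame; Deep TAMER further preprocesses frames before encoding
        if done:
            obs = env.reset()
    return np.asarray(frames)

training_states = collect_random_states()
```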
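The experiment-setup row describes the predictor Ĥ(s, a) = z(f(s), a) as a pre-trained encoder f with a 100-dimensional output followed by a small fully-connected head z with 16 hidden units per layer. A PyTorch sketch of that structure is given below; reading "two-layer" as two hidden layers, the ReLU activations, the six-action output (Bowling's action set), and freezing the encoder during interaction are assumptions on top of the paper's stated dimensions.

```python
import torch
import torch.nn as nn

class DeepTamerH(nn.Module):
    """Sketch of H_hat(s, a) = z(f(s), a): a pre-trained encoder f followed by a
    small fully-connected head z with one output per discrete action."""

    def __init__(self, encoder: nn.Module, feature_dim: int = 100, num_actions: int = 6):
        super().__init__()
        self.encoder = encoder          # pre-trained autoencoder's encoder, f(s) -> R^100
        self.head = nn.Sequential(      # head z: 16 hidden units per layer (paper), ReLU assumed
            nn.Linear(feature_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():           # encoder assumed fixed after pre-training
            features = self.encoder(state)
        return self.head(features)      # predicted human reward, one value per action
```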
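Finally, the paper performs importance-weighted SGD updates (its Equation 3) at a fixed rate, drawing mini-batches from a buffer of human-feedback samples and weighting each (s, a) pair by a uniform feedback-delay distribution on [0.2, 4] seconds. The sketch below is one interpretation of that step in the TAMER credit-assignment style: the overlap-based weight, the squared-error loss, and all function names are assumptions; only the [0.2, 4] interval and the mini-batch gradient averaging come from the paper.

```python
import torch

def credit_weight(t_event, t_next, t_feedback, lo=0.2, hi=4.0):
    """Importance weight for one (s, a) pair: the probability, under a uniform
    feedback-delay distribution on [lo, hi] seconds, that feedback given at
    t_feedback refers to the event spanning [t_event, t_next]."""
    a = max(t_feedback - t_next, lo)
    b = min(t_feedback - t_event, hi)
    return max(b - a, 0.0) / (hi - lo)

def sgd_step(model, optimizer, batch):
    """One importance-weighted mini-batch update of the H_hat network, averaging
    the gradient over several (x, y) samples as the paper describes."""
    states, actions, targets, weights = batch   # tensors sampled from the feedback buffer
    preds = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # H_hat(s, a)
    loss = (weights * (preds - targets) ** 2).mean()                  # weighted squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```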