Machine versus Human Attention in Deep Reinforcement Learning Tasks

Authors: Suna (Sihang) Guo, Ruohan Zhang, Bo Liu, Yifeng Zhu, Dana Ballard, Mary Hayhoe, Peter Stone

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we shed light on the inner workings of such trained models by analyzing the pixels that they attend to during task execution, and comparing them with the pixels attended to by humans executing the same tasks. ... Specifically, we compare the saliency maps of RL agents against visual attention models of human experts when learning to play Atari games. Further, we analyze how hyperparameters of the deep RL algorithm affect the learned representations and saliency maps of the trained agents. (A hedged map-comparison sketch appears after this table.)
Researcher Affiliation | Collaboration | Sihang Guo (UT Austin) sguo19@utexas.edu; Ruohan Zhang (Stanford University) zharu@stanford.edu; Bo Liu (UT Austin) bliu@cs.utexas.edu; Yifeng Zhu (UT Austin) yifeng.zhu@utexas.edu; Dana Ballard (UT Austin) danab@utexas.edu; Mary Hayhoe (UT Austin) hayhoe@utexas.edu; Peter Stone (UT Austin, Sony AI) pstone@cs.utexas.edu
Pseudocode | No | The paper describes the calculation of a saliency score using a formula (Equation 1) and explains the steps involved, but it does not present a formal pseudocode block or algorithm. (A hedged sketch of one possible perturbation-based saliency computation appears after this table.)
Open Source Code | Yes | Our human attention models, all compiled datasets, and tools for comparing RL attention with human attention are made available for future research in this direction.
Open Datasets | Yes | We use human expert gaze data from Atari-HEAD dataset [85]. ... [85] Ruohan Zhang, Calen Walshe, Zhuode Liu, Lin Guan, Karl S. Muller, Jake A. Whritner, Luxin Zhang, Mary M. Hayhoe, and Dana H. Ballard. Atari-HEAD: Atari human eye-tracking and demonstration dataset. In Thirty-Fourth AAAI Conference on Artificial Intelligence. AAAI Press, 2020.
Dataset Splits | No | The paper states, 'We use 80% gaze data for training and 20% for testing,' but it does not explicitly mention a separate validation set or describe how such a set was used for hyperparameter tuning or model selection. (A minimal split sketch appears after this table.)
Hardware Specification | Yes | For the Atari gaming environment, we use the basic version that has no frame skipping and no stochasticity in action execution (NoFrameskip-v4 version). In order to capture variance in training and ensure reproducibility, we select six popular Atari games instead of using all Atari games and train 540 agents in total, with the same hyperparameters except for random seeds and discount factors (see Section 4; 300 GPU days on GeForce GTX 1080/1080 Ti).
Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO) [63]' and 'Stable baselines [30]', with reference [30] being a GitHub link. It also references 'Dopamine [11]' and 'Atari Model Zoo [70]'. However, it does not specify version numbers for these software components or for other dependencies such as Python or PyTorch.
Experiment Setup | Yes | We use a popular deep RL algorithm named Proximal Policy Optimization (PPO) [63] with default hyperparameters [30]. ... We train PPO agents with γ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 0.9999} and generate saliency maps on the standard image set. (A hedged training-sweep sketch appears after this table.)
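
On the Research Type row: the paper compares agent saliency maps against human attention models, but the exact comparison metric is not reproduced in this table. The sketch below uses Pearson correlation between two normalized 2-D attention maps, which is one common saliency comparison metric; treat the function name and the choice of metric as assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def attention_map_correlation(agent_saliency, human_gaze_map, eps=1e-8):
    """Hedged sketch: compare an agent saliency map with a human gaze map.

    Both inputs are assumed to be 2-D arrays defined over the same image grid.
    Pearson correlation between normalized maps is one standard comparison
    metric; the paper's exact comparison procedure is not shown here.
    """
    a = agent_saliency.astype(float)
    b = human_gaze_map.astype(float)
    # Normalize each map to zero mean and unit variance before correlating.
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    # Mean of the z-score products equals the Pearson correlation coefficient.
    return float(np.mean(a * b))
```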
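
On the Pseudocode row: the paper defines its saliency score only as a formula (Equation 1), which is not reproduced here. The sketch below shows one possible perturbation-based saliency computation in the spirit of Greydanus et al.'s Atari saliency work; the `policy_logits_fn` interface, the Gaussian-blur perturbation, and the 84x84x4 observation shape are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perturbation_saliency(policy_logits_fn, frame_stack, stride=5, sigma=3.0):
    """Hedged sketch of a perturbation-based saliency map (not the paper's Eq. 1).

    policy_logits_fn: assumed callable mapping an (84, 84, 4) frame stack to a
                      NumPy vector of action logits (illustrative interface).
    frame_stack:      preprocessed observation of shape (84, 84, 4).
    Returns an (84, 84) array scoring, on a coarse grid, how much blurring a
    local region changes the policy output.
    """
    base = policy_logits_fn(frame_stack)
    blurred = gaussian_filter(frame_stack, sigma=(sigma, sigma, 0))
    h, w, _ = frame_stack.shape
    yy, xx = np.mgrid[0:h, 0:w]
    saliency = np.zeros((h, w))
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            # Soft mask centered at (i, j): blend in the blurred frames locally.
            mask = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2.0 * sigma ** 2))[..., None]
            perturbed = frame_stack * (1.0 - mask) + blurred * mask
            out = policy_logits_fn(perturbed)
            # Saliency score: half the squared change in the policy output.
            saliency[i, j] = 0.5 * np.sum((out - base) ** 2)
    return saliency
```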
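
On the Dataset Splits row: the paper reports an 80%/20% train/test split of the gaze data with no separate validation set. A minimal split sketch is below; the structure of the gaze records is hypothetical, since the Atari-HEAD file format is not described in this table.

```python
import numpy as np

def split_gaze_records(records, train_frac=0.8, seed=0):
    """Minimal sketch: shuffle gaze records and split them 80/20 into train/test.

    `records` is assumed to be a list of per-frame gaze samples (hypothetical
    structure); the paper reports an 80/20 train/test split and no validation set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(records))
    cut = int(train_frac * len(records))
    train = [records[i] for i in idx[:cut]]
    test = [records[i] for i in idx[cut:]]
    return train, test
```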
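
On the Experiment Setup row: the paper trains PPO agents with default hyperparameters while varying only the discount factor and the random seed. The sketch below shows what such a sweep could look like using Stable-Baselines3; the paper cites the original Stable Baselines [30], so the exact API, policy choice, number of parallel environments, seed count, and timestep budget here are assumptions.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

GAMMAS = [0.1, 0.3, 0.5, 0.7, 0.9, 0.9999]  # discount factors reported in the paper
SEEDS = range(3)                             # illustrative; the paper trains 540 agents in total
GAME = "BreakoutNoFrameskip-v4"              # one of the six NoFrameskip-v4 games

for gamma in GAMMAS:
    for seed in SEEDS:
        # Standard Atari preprocessing plus 4-frame stacking.
        env = VecFrameStack(make_atari_env(GAME, n_envs=8, seed=seed), n_stack=4)
        model = PPO("CnnPolicy", env, gamma=gamma, seed=seed, verbose=0)
        model.learn(total_timesteps=10_000_000)  # training budget is an assumption
        model.save(f"ppo_{GAME}_gamma{gamma}_seed{seed}")
```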