Machine versus Human Attention in Deep Reinforcement Learning Tasks

Authors: Suna (Sihang) Guo, Ruohan Zhang, Bo Liu, Yifeng Zhu, Dana Ballard, Mary Hayhoe, Peter Stone

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we shed light on the inner workings of such trained models by analyzing the pixels that they attend to during task execution, and comparing them with the pixels attended to by humans executing the same tasks. ... Specifically, we compare the saliency maps of RL agents against visual attention models of human experts when learning to play Atari games. Further, we analyze how hyperparameters of the deep RL algorithm affect the learned representations and saliency maps of the trained agents. (A hedged map-comparison sketch appears after this table.)
Researcher Affiliation | Collaboration | Sihang Guo (UT Austin) sguo19@utexas.edu; Ruohan Zhang (Stanford University) zharu@stanford.edu; Bo Liu (UT Austin) bliu@cs.utexas.edu; Yifeng Zhu (UT Austin) yifeng.zhu@utexas.edu; Dana Ballard (UT Austin) danab@utexas.edu; Mary Hayhoe (UT Austin) hayhoe@utexas.edu; Peter Stone (UT Austin, Sony AI) pstone@cs.utexas.edu
Pseudocode | No | The paper describes the calculation of a saliency score using a formula (Equation 1) and explains the steps involved, but it does not present a formal pseudocode block or algorithm. (A hedged sketch of one possible perturbation-based saliency computation appears after this table.)
Open Source Code | Yes | Our human attention models, all compiled datasets, and tools for comparing RL attention with human attention are made available for future research in this direction.
Open Datasets | Yes | We use human expert gaze data from Atari-HEAD dataset [85]. ... [85] Ruohan Zhang, Calen Walshe, Zhuode Liu, Lin Guan, Karl S. Muller, Jake A. Whritner, Luxin Zhang, Mary M. Hayhoe, and Dana H. Ballard. Atari-HEAD: Atari human eye-tracking and demonstration dataset. In Thirty-Fourth AAAI Conference on Artificial Intelligence. AAAI Press, 2020.
Dataset Splits | No | The paper states, 'We use 80% gaze data for training and 20% for testing,' but it does not explicitly mention a separate validation set or describe how such a set was used for hyperparameter tuning or model selection. (A minimal split sketch appears after this table.)
Hardware Specification | Yes | For the Atari gaming environment, we use the basic version that has no frame skipping and no stochasticity in action execution (NoFrameskip-v4 version). In order to capture variance in training and ensure reproducibility, we select six popular Atari games instead of using all Atari games and train 540 agents in total, with the same hyperparameters except for random seeds and discount factors (see Section 4; 300 GPU days on GeForce GTX 1080/1080 Ti).
Software Dependencies | No | The paper mentions using 'Proximal Policy Optimization (PPO) [63]' and 'Stable baselines [30]', with reference [30] being a GitHub link. It also references 'Dopamine [11]' and 'Atari Model Zoo [70]'. However, it does not specify version numbers for these software components or for other dependencies such as Python or PyTorch.
Experiment Setup | Yes | We use a popular deep RL algorithm named Proximal Policy Optimization (PPO) [63] with default hyperparameters [30]. ... We train PPO agents with γ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 0.9999} and generate saliency maps on the standard image set. (A hedged training-sweep sketch appears after this table.)
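
On the Research Type row: the paper compares agent saliency maps against human attention models, but the exact comparison metric is not reproduced in this table. The sketch below uses Pearson correlation between two normalized 2-D attention maps, which is one common saliency comparison metric; treat the function name and the choice of metric as assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def attention_map_correlation(agent_saliency, human_gaze_map, eps=1e-8):
    """Hedged sketch: compare an agent saliency map with a human gaze map.

    Both inputs are assumed to be 2-D arrays defined over the same image grid.
    Pearson correlation between normalized maps is one standard comparison
    metric; the paper's exact comparison procedure is not shown here.
    """
    a = agent_saliency.astype(float)
    b = human_gaze_map.astype(float)
    # Normalize each map to zero mean and unit variance before correlating.
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    # Mean of the z-score products equals the Pearson correlation coefficient.
    return float(np.mean(a * b))
```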
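
On the Pseudocode row: the paper defines its saliency score only as a formula (Equation 1), which is not reproduced here. The sketch below shows one possible perturbation-based saliency computation in the spirit of Greydanus et al.'s Atari saliency work; the `policy_logits_fn` interface, the Gaussian-blur perturbation, and the 84x84x4 observation shape are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perturbation_saliency(policy_logits_fn, frame_stack, stride=5, sigma=3.0):
    """Hedged sketch of a perturbation-based saliency map (not the paper's Eq. 1).

    policy_logits_fn: assumed callable mapping an (84, 84, 4) frame stack to a
                      NumPy vector of action logits (illustrative interface).
    frame_stack:      preprocessed observation of shape (84, 84, 4).
    Returns an (84, 84) array scoring, on a coarse grid, how much blurring a
    local region changes the policy output.
    """
    base = policy_logits_fn(frame_stack)
    blurred = gaussian_filter(frame_stack, sigma=(sigma, sigma, 0))
    h, w, _ = frame_stack.shape
    yy, xx = np.mgrid[0:h, 0:w]
    saliency = np.zeros((h, w))
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            # Soft mask centered at (i, j): blend in the blurred frames locally.
            mask = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2.0 * sigma ** 2))[..., None]
            perturbed = frame_stack * (1.0 - mask) + blurred * mask
            out = policy_logits_fn(perturbed)
            # Saliency score: half the squared change in the policy output.
            saliency[i, j] = 0.5 * np.sum((out - base) ** 2)
    return saliency
```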
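
On the Dataset Splits row: the paper reports an 80%/20% train/test split of the gaze data with no separate validation set. A minimal split sketch is below; the structure of the gaze records is hypothetical, since the Atari-HEAD file format is not described in this table.

```python
import numpy as np

def split_gaze_records(records, train_frac=0.8, seed=0):
    """Minimal sketch: shuffle gaze records and split them 80/20 into train/test.

    `records` is assumed to be a list of per-frame gaze samples (hypothetical
    structure); the paper reports an 80/20 train/test split and no validation set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(records))
    cut = int(train_frac * len(records))
    train = [records[i] for i in idx[:cut]]
    test = [records[i] for i in idx[cut:]]
    return train, test
```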
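
On the Experiment Setup row: the paper trains PPO agents with default hyperparameters while varying only the discount factor and the random seed. The sketch below shows what such a sweep could look like using Stable-Baselines3; the paper cites the original Stable Baselines [30], so the exact API, policy choice, number of parallel environments, seed count, and timestep budget here are assumptions.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

GAMMAS = [0.1, 0.3, 0.5, 0.7, 0.9, 0.9999]  # discount factors reported in the paper
SEEDS = range(3)                             # illustrative; the paper trains 540 agents in total
GAME = "BreakoutNoFrameskip-v4"              # one of the six NoFrameskip-v4 games

for gamma in GAMMAS:
    for seed in SEEDS:
        # Standard Atari preprocessing plus 4-frame stacking.
        env = VecFrameStack(make_atari_env(GAME, n_envs=8, seed=seed), n_stack=4)
        model = PPO("CnnPolicy", env, gamma=gamma, seed=seed, verbose=0)
        model.learn(total_timesteps=10_000_000)  # training budget is an assumption
        model.save(f"ppo_{GAME}_gamma{gamma}_seed{seed}")
```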