Investigating Human Priors for Playing Video Games

Authors: Rachit Dubey, Pulkit Agrawal, Deepak Pathak, Tom Griffiths, Alexei Efros

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Given a sample game, we conduct a series of ablation studies to quantify the importance of various priors on human performance. We do this by modifying the video game environment to systematically mask different types of visual information that could be used by humans as priors.
Researcher Affiliation | Academia | University of California, Berkeley. Correspondence to: Rachit Dubey <rach0012@berkeley.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Videos and the game manipulations are available at https://rach0012.github.io/humanRL_website/.
Open Datasets | No | The paper describes creating a custom game environment and collecting data from human participants. It does not use a publicly available dataset for its primary human experiments, nor does it provide access information for the collected human play data.
Dataset Splits | No | The paper describes experiments with human participants and RL agents but does not specify dataset splits (training, validation, test) for machine learning model development. For human experiments, it states: 'Each participant was only allowed to complete a game once, and could not participate again (i.e. different 120 participants played each version of the game).'
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments or training the RL agent.
Software Dependencies | No | The paper mentions using 'A3C (Mnih et al., 2016)' and a 'curiosity-based RL algorithm specifically tailored to sparse-reward settings (Pathak et al., 2017)', but it does not specify any version numbers for these or other software libraries/dependencies.
Experiment Setup | Yes | We designed a browser-based platform game consisting of an agent sprite, platforms, ladders, an angry pink object that kills the agent, spikes that are dangerous to jump on, a key, and a door (see Figure 2 (a)). The agent sprite can be moved with the help of arrow keys. A terminal reward of +1 is provided when the agent reaches the door after having taken the key, thereby terminating the game. We quantified human performance on each version of the game by recruiting 120 participants from Amazon Mechanical Turk. Each participant was instructed to finish the game as quickly as possible using the arrow keys as controls, but no information about the goals or the reward structure of the game was communicated. Each participant was paid $1 for successfully completing the game. The maximum time allowed for playing the game was set to 30 minutes. For each game version, we report the mean performance of five random seeds that succeeded.
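
The reward rule and the seed-averaging step described in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `GameState` class, its field names, and both function names are hypothetical. A reward of 0 on every non-terminal step is an assumption based on the sparse-reward framing, and using `None` to mark a seed that did not succeed is likewise an assumption. The row does not name the performance metric, so the averaging helper accepts any numeric score.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple


@dataclass
class GameState:
    """Hypothetical minimal state; the field names are illustrative only."""
    has_key: bool = False
    at_door: bool = False


def terminal_reward(state: GameState) -> Tuple[float, bool]:
    """Return (reward, done).

    +1 and termination only when the agent is at the door while holding
    the key; 0 and no termination otherwise (assumed, not stated).
    """
    if state.at_door and state.has_key:
        return 1.0, True
    return 0.0, False


def mean_over_successes(scores: Sequence[Optional[float]]) -> float:
    """Average the scores of seeds that succeeded.

    A value of None marks a seed that did not succeed (an assumed convention).
    """
    succeeded = [s for s in scores if s is not None]
    if not succeeded:
        raise ValueError("no successful seeds to average")
    return mean(succeeded)
```

Under this sketch, reaching the door without the key yields `(0.0, False)`, so the episode continues until the key has been collected first. Seeds that did not succeed are excluded from the reported mean and are not counted as zeros.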