Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Visualizing and Understanding Atari Agents
Authors: Samuel Greydanus, Anurag Koul, Jonathan Dodge, Alan Fern
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. ... We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into an RL agent's decisions and learning behavior. ... 4. Experiments |
| Researcher Affiliation | Academia | 1Oregon State University, Corvallis, Oregon, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code and results available online¹. ¹github.com/greydanus/visualize_atari |
| Open Datasets | Yes | We trained agents on Pong, Breakout, and Space Invaders using the OpenAI Gym API (Brockman et al., 2016; Bellemare et al., 2013). |
| Dataset Splits | No | The paper does not provide specific validation dataset split information (exact percentages, sample counts, or detailed splitting methodology). |
| Hardware Specification | No | The paper mentions '20 CPU processes' but does not specify exact CPU models, types, or any GPU details used for running experiments. |
| Software Dependencies | No | The paper mentions software like 'OpenAI Gym API' and 'A3C RL algorithm' and 'Adam optimizer' but does not provide specific version numbers for these or other dependencies. |
| Experiment Setup | Yes | All of our Atari agents have the same recurrent architecture. The input at each time step is a preprocessed version of the current frame. Preprocessing consisted of gray-scaling, down-sampling by a factor of 2, cropping the game space to an 80 × 80 square and normalizing the values to [0, 1]. This input is processed by 4 convolutional layers (each with 32 filters, kernel sizes of 3, strides of 2, and paddings of 1), followed by an LSTM layer with 256 hidden units and a fully-connected layer with n + 1 units... We used the A3C RL algorithm (Mnih et al., 2016) with a learning rate of α = 10⁻⁴, a discount factor of γ = 0.99, and computed loss on the policy using Generalized Advantage Estimation with λ = 1.0 (Schulman et al., 2016). Each policy was trained asynchronously for a total of 40 million frames... |
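
The Experiment Setup row gives enough detail to check two pieces of arithmetic by hand. The sketch below is illustrative only and is not taken from the authors' repository. It assumes the standard floor formula for convolution output size and the usual recursive form of Generalized Advantage Estimation. The names `conv_out`, `feature_sizes`, and `gae`, and the sample rewards and values, are hypothetical.

```python
# Illustrative sketch, not the authors' code. Function names and the sample
# rewards/values are hypothetical; the hyperparameters come from the quoted text.

def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_sizes(input_size=80, num_layers=4):
    """Side length of the feature map after each of the stacked conv layers."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_out(sizes[-1]))
    return sizes


def gae(rewards, values, gamma=0.99, lam=1.0):
    """Generalized Advantage Estimation, computed backwards in time.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# 80 x 80 input through 4 layers of kernel 3, stride 2, padding 1.
sizes = feature_sizes()

# With 32 filters, this is the flattened feature count under the assumption
# that the last conv output is flattened before the LSTM.
flat_features = 32 * sizes[-1] ** 2

# Hypothetical 3-step rollout to exercise gae() with the quoted gamma and lambda.
sample_adv = gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.4, 0.3, 0.0])
```

With λ = 1.0, the recursion telescopes, so each advantage equals the discounted return (plus the discounted bootstrap value) minus the value estimate at that step. Whether the authors flatten the final 5 × 5 feature map before the LSTM is an assumption here; the quoted text does not say.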