Visualizing and Understanding Atari Agents
Authors: Samuel Greydanus, Anurag Koul, Jonathan Dodge, Alan Fern
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. ... We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into an RL agent's decisions and learning behavior. ... 4. Experiments |
| Researcher Affiliation | Academia | 1Oregon State University, Corvallis, Oregon, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code and results available online¹. ¹github.com/greydanus/visualize_atari |
| Open Datasets | Yes | We trained agents on Pong, Breakout, and Space Invaders using the OpenAI Gym API (Brockman et al., 2016; Bellemare et al., 2013). |
| Dataset Splits | No | The paper does not provide specific validation dataset split information (exact percentages, sample counts, or detailed splitting methodology). |
| Hardware Specification | No | The paper mentions '20 CPU processes' but does not specify CPU models or any GPU details used to run the experiments. |
| Software Dependencies | No | The paper mentions software such as the 'OpenAI Gym API', the 'A3C RL algorithm', and the 'Adam optimizer', but does not provide version numbers for these or any other dependencies. |
| Experiment Setup | Yes | All of our Atari agents have the same recurrent architecture. The input at each time step is a preprocessed version of the current frame. Preprocessing consisted of gray-scaling, down-sampling by a factor of 2, cropping the game space to an 80 × 80 square and normalizing the values to [0, 1]. This input is processed by 4 convolutional layers (each with 32 filters, kernel sizes of 3, strides of 2, and paddings of 1), followed by an LSTM layer with 256 hidden units and a fully-connected layer with n + 1 units... We used the A3C RL algorithm (Mnih et al., 2016) with a learning rate of α = 10⁻⁴, a discount factor of γ = 0.99, and computed loss on the policy using Generalized Advantage Estimation with λ = 1.0 (Schulman et al., 2016). Each policy was trained asynchronously for a total of 40 million frames... |
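
The architecture and preprocessing quoted in the Experiment Setup row map fairly directly onto a small network definition. The sketch below is a non-authoritative illustration, assuming PyTorch; the class name `AtariA3CPolicy`, the ELU activations, the crop offsets in `preprocess`, and the flattened feature size are assumptions filled in from the reported 80 × 80 input, convolution settings, and 256-unit LSTM, not details taken from the authors' code.

```python
import numpy as np
import torch
import torch.nn as nn


class AtariA3CPolicy(nn.Module):
    """Recurrent policy sketched from the paper's description: four conv
    layers (32 filters, kernel 3, stride 2, padding 1), a 256-unit LSTM,
    and n + 1 outputs (n action logits plus one value estimate)."""

    def __init__(self, num_actions):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ELU(),
        )  # ELU is an assumption; the paper does not name the activation
        # An 80x80 input halved four times leaves a 5x5 feature map.
        self.lstm = nn.LSTMCell(32 * 5 * 5, 256)
        self.policy_head = nn.Linear(256, num_actions)  # n action logits
        self.value_head = nn.Linear(256, 1)             # the "+1" value unit

    def forward(self, frame, hidden):
        # frame: (batch, 1, 80, 80); hidden: (h, c), each of shape (batch, 256)
        x = self.conv(frame).flatten(start_dim=1)
        h, c = self.lstm(x, hidden)
        return self.policy_head(h), self.value_head(h), (h, c)


def preprocess(frame):
    """Gray-scale, down-sample by 2, crop to 80x80, normalize to [0, 1].
    The crop offsets are hypothetical; the paper does not specify them."""
    gray = frame.mean(axis=2)              # (210, 160) Atari frame to gray
    small = gray[::2, ::2]                 # down-sample by a factor of 2
    crop = small[17:97, :80]               # assumed 80x80 game-space crop
    return torch.from_numpy((crop / 255.0).astype(np.float32))[None, None]
```

Under the training setup reported above, such a policy would be unrolled by asynchronous A3C workers and optimized with Adam at α = 10⁻⁴, γ = 0.99, and GAE λ = 1.0 for roughly 40 million frames; the loop itself is not shown here.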