Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
Authors: Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator. We present qualitative results for three different reinforcement learning algorithms, show examples on how the method helps finding flaws in an agent, analyze the loss contributions and compare to previous techniques. |
| Researcher Affiliation | Collaboration | Christian Rupprecht¹, Cyril Ibrahim², Christopher J. Pal²,³ ¹Visual Geometry Group, University of Oxford; ²Element AI; ³Polytechnique Montréal, Mila & Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1 Optimize x for target T |
| Open Source Code | No | In the interest of reproducibility we will make the visualization code available. |
| Open Datasets | Yes | In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator. |
| Dataset Splits | No | The paper refers to a 'validation set' in general terms and for comparison, but does not specify the dataset splits (e.g., percentages, counts) used for its own experiments. |
| Hardware Specification | Yes | Training takes approximately four hours on a Titan Xp. |
| Software Dependencies | No | The paper mentions using Adam for optimization and refers to a public repository for ACKTR code, but it does not specify software names with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | In all our experiments we use the same factors to balance the loss terms in Equation 6: λ = 10⁻⁴ for the KL divergence and η = 10⁻³ for the agent perception loss. The generator is trained on 10,000 frames (using the agent and an ϵ-greedy policy with ϵ = 0.1). Optimization is done with Adam (Kingma & Ba, 2015) with a learning rate of 10⁻³ and a batch size of 16 for 2000 epochs. (Hedged sketches of the frame collection and the weighted loss follow the table.) |
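
The Experiment Setup row describes how the generator's training data is gathered: 10,000 frames collected by running the agent under an ϵ-greedy policy with ϵ = 0.1. Below is a minimal sketch of such a collection loop, assuming the pre-0.26 gym API; the environment id, the `agent_action` stand-in, and the small frame count in the usage line are illustrative assumptions, not the authors' setup.

```python
# Hedged sketch: collect frames with an epsilon-greedy policy (epsilon = 0.1),
# as described in the Experiment Setup row. Assumes the pre-0.26 gym API.
import random
import gym

EPSILON = 0.1
NUM_FRAMES = 10_000  # number of frames quoted for generator training

def agent_action(observation, env):
    # Stand-in for the trained agent's greedy action choice (illustrative only).
    return env.action_space.sample()

def collect_frames(env_id="BreakoutNoFrameskip-v4", num_frames=NUM_FRAMES):
    env = gym.make(env_id)
    frames = []
    obs = env.reset()
    while len(frames) < num_frames:
        if random.random() < EPSILON:
            action = env.action_space.sample()   # explore
        else:
            action = agent_action(obs, env)      # exploit: the agent's policy
        obs, reward, done, info = env.step(action)
        frames.append(obs)
        if done:
            obs = env.reset()
    env.close()
    return frames

frames = collect_frames(num_frames=100)  # small run for illustration
print(len(frames), frames[0].shape)
```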
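
The same row quotes the loss weights of Equation 6 (λ = 10⁻⁴ for the KL divergence, η = 10⁻³ for the agent perception loss) and the optimizer settings (Adam, learning rate 10⁻³, batch size 16, 2000 epochs). The following is a minimal PyTorch sketch of one generator update under these weights; the VAE-style generator, the flattened frame dimension, and the agent feature extractor are placeholders, not the authors' released code.

```python
# Hedged sketch of the weighted loss and Adam settings quoted from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

LAMBDA_KL = 1e-4       # weight for the KL-divergence term (lambda in Eq. 6)
ETA_PERCEPTION = 1e-3  # weight for the agent perception loss (eta in Eq. 6)
LEARNING_RATE = 1e-3
BATCH_SIZE = 16
EPOCHS = 2000

class DummyGenerator(nn.Module):
    """Placeholder VAE-style generator: encode a frame to (mu, logvar), then decode."""
    def __init__(self, frame_dim=84 * 84, latent_dim=32):
        super().__init__()
        self.encode = nn.Linear(frame_dim, 2 * latent_dim)
        self.decode = nn.Linear(latent_dim, frame_dim)

    def forward(self, x):
        mu, logvar = self.encode(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decode(z), mu, logvar

def training_step(generator, agent_features, frames, optimizer):
    """One generator update combining the weighted loss terms (sketch of Eq. 6)."""
    recon, mu, logvar = generator(frames)
    recon_loss = F.mse_loss(recon, frames)
    kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Agent perception term: match the (frozen) agent's features on real vs. generated frames.
    perception_loss = F.mse_loss(agent_features(recon), agent_features(frames))
    loss = recon_loss + LAMBDA_KL * kl_loss + ETA_PERCEPTION * perception_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    generator = DummyGenerator()
    agent_features = nn.Linear(84 * 84, 64)  # stand-in for the agent's feature extractor
    for p in agent_features.parameters():
        p.requires_grad_(False)              # keep the agent frozen
    optimizer = torch.optim.Adam(generator.parameters(), lr=LEARNING_RATE)
    frames = torch.rand(BATCH_SIZE, 84 * 84)  # one dummy batch of flattened frames
    print(training_step(generator, agent_features, frames, optimizer))
```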