Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents

Authors: Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator. We present qualitative results for three different reinforcement learning algorithms, show examples on how the method helps finding flaws in an agent, analyze the loss contributions and compare to previous techniques."
Researcher Affiliation | Collaboration | "Christian Rupprecht (1), Cyril Ibrahim (2), Christopher J. Pal (2,3); (1) Visual Geometry Group, University of Oxford; (2) Element AI; (3) Polytechnique Montréal, Mila & Canada CIFAR AI Chair"
Pseudocode | Yes | "Algorithm 1: Optimize x for target T" (a hedged sketch of such an optimization loop follows the table)
Open Source Code | No | "In the interest of reproducibility we will make the visualization code available."
Open Datasets | Yes | "In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator."
Dataset Splits | No | The paper mentions evaluating on a 'validation set' in general terms and for comparison, but does not specify how the dataset splits (e.g., percentages, counts) were performed for their own experiments.
Hardware Specification | Yes | "Training takes approximately four hours on a Titan Xp."
Software Dependencies | No | The paper mentions using Adam for optimization and refers to a public repository for ACKTR code, but it does not specify software names with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "In all our experiments we use the same factors to balance the loss terms in Equation 6: λ = 10⁻⁴ for the KL divergence and η = 10⁻³ for the agent perception loss. The generator is trained on 10,000 frames (using the agent and an ϵ-greedy policy with ϵ = 0.1). Optimization is done with Adam (Kingma & Ba, 2015) with a learning rate of 10⁻³ and a batch size of 16 for 2000 epochs." (hedged sketches of the optimization loop and the frame-collection step follow below)
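
The Pseudocode and Experiment Setup rows above describe optimizing a state (via its latent code x) toward a target behaviour T, balanced by a KL term weighted by λ and an agent perception term weighted by η. The following PyTorch sketch illustrates how such a loop could be wired up under those quoted weights; it is an illustration under assumptions, not the authors' released code, and `generator`, `agent`, `target_objective`, and `perception_loss` are hypothetical placeholders.

```python
# Hypothetical sketch of "Algorithm 1: Optimize x for target T".
# All callables are placeholders; the exact loss forms in Equation 6 of the
# paper may differ from this simplified version.
import torch

def optimize_latent_for_target(generator, agent, target_objective, perception_loss,
                               latent_dim=128, steps=1000, lr=1e-3,
                               kl_weight=1e-4, perception_weight=1e-3):
    """Gradient search in the generator's latent space for a state eliciting target T."""
    z = torch.randn(1, latent_dim, requires_grad=True)   # latent code being optimized
    optimizer = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        state = generator(z)                   # synthesize an observation for the agent
        agent_output = agent(state)            # e.g. Q-values or policy logits
        loss_target = target_objective(agent_output)    # drives the agent toward T
        loss_kl = kl_weight * 0.5 * z.pow(2).sum()      # keep z close to the N(0, I) prior
        loss_perc = perception_weight * perception_loss(agent, state)

        loss = loss_target + loss_kl + loss_perc
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return generator(z).detach()
```

Note that the batch size of 16 and the 2000 epochs quoted in the table most likely refer to training the generator on the collected frames rather than to this per-target latent search; only the learning rate and loss weights are reused here.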
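The Open Datasets and Experiment Setup rows state that the generator is trained on 10,000 frames collected with the agent under an ϵ-greedy policy (ϵ = 0.1) in OpenAI Gym Atari environments. A minimal collection loop under those settings could look like the sketch below; the environment id, the `agent.act` interface, and the classic (pre-0.26) `gym` step signature are assumptions, not details taken from the paper.

```python
# Hypothetical frame-collection sketch: epsilon-greedy rollouts in an Atari
# environment via OpenAI Gym. Uses the old 4-tuple step API (gym < 0.26).
import random
import gym

def collect_frames(agent, env_id="SeaquestNoFrameskip-v4",
                   num_frames=10_000, epsilon=0.1):
    """Roll out an epsilon-greedy policy and keep the raw observations."""
    env = gym.make(env_id)
    frames = []
    obs = env.reset()
    while len(frames) < num_frames:
        if random.random() < epsilon:
            action = env.action_space.sample()   # exploratory random action
        else:
            action = agent.act(obs)              # greedy action (hypothetical agent interface)
        obs, reward, done, _info = env.step(action)
        frames.append(obs)
        if done:
            obs = env.reset()
    env.close()
    return frames
```

The resulting list of observations would then serve as the training set for the generator (Adam, learning rate 10⁻³, batch size 16, 2000 epochs, per the quoted setup).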