Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents

Authors: Christian Rupprecht, Cyril Ibrahim, Christopher J. Pal

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments we show that this method can generate insights for a variety of environments and reinforcement learning methods. We explore results in the standard Atari benchmark games as well as in an autonomous driving simulator. In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator. We present qualitative results for three different reinforcement learning algorithms, show examples on how the method helps finding flaws in an agent, analyze the loss contributions and compare to previous techniques."
Researcher Affiliation | Collaboration | "Christian Rupprecht (1), Cyril Ibrahim (2), Christopher J. Pal (2,3); (1) Visual Geometry Group, University of Oxford; (2) Element AI; (3) Polytechnique Montréal, Mila & Canada CIFAR AI Chair"
Pseudocode | Yes | "Algorithm 1: Optimize x for target T" (a hedged sketch of such an optimization loop follows the table)
Open Source Code | No | "In the interest of reproducibility we will make the visualization code available."
Open Datasets | Yes | "In this section we thoroughly evaluate and analyze our method on Atari games (Bellemare et al., 2013) using the Open AI Gym (Brockman et al., 2016) and a driving simulator."
Dataset Splits | No | The paper mentions evaluating on a 'validation set' in general terms and for comparison, but does not specify how the dataset splits (e.g., percentages, counts) were performed for their own experiments.
Hardware Specification | Yes | "Training takes approximately four hours on a Titan Xp."
Software Dependencies | No | The paper mentions using Adam for optimization and refers to a public repository for ACKTR code, but it does not specify software names with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "In all our experiments we use the same factors to balance the loss terms in Equation 6: λ = 10⁻⁴ for the KL divergence and η = 10⁻³ for the agent perception loss. The generator is trained on 10,000 frames (using the agent and an ϵ-greedy policy with ϵ = 0.1). Optimization is done with Adam (Kingma & Ba, 2015) with a learning rate of 10⁻³ and a batch size of 16 for 2000 epochs." (hedged sketches of the optimization loop and the frame-collection step follow below)
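
The Pseudocode and Experiment Setup rows above describe optimizing a state (via its latent code x) toward a target behaviour T, balanced by a KL term weighted by λ and an agent perception term weighted by η. The following PyTorch sketch illustrates how such a loop could be wired up under those quoted weights; it is an illustration under assumptions, not the authors' released code, and `generator`, `agent`, `target_objective`, and `perception_loss` are hypothetical placeholders.

```python
# Hypothetical sketch of "Algorithm 1: Optimize x for target T".
# All callables are placeholders; the exact loss forms in Equation 6 of the
# paper may differ from this simplified version.
import torch

def optimize_latent_for_target(generator, agent, target_objective, perception_loss,
                               latent_dim=128, steps=1000, lr=1e-3,
                               kl_weight=1e-4, perception_weight=1e-3):
    """Gradient search in the generator's latent space for a state eliciting target T."""
    z = torch.randn(1, latent_dim, requires_grad=True)   # latent code being optimized
    optimizer = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        state = generator(z)                   # synthesize an observation for the agent
        agent_output = agent(state)            # e.g. Q-values or policy logits
        loss_target = target_objective(agent_output)    # drives the agent toward T
        loss_kl = kl_weight * 0.5 * z.pow(2).sum()      # keep z close to the N(0, I) prior
        loss_perc = perception_weight * perception_loss(agent, state)

        loss = loss_target + loss_kl + loss_perc
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return generator(z).detach()
```

Note that the batch size of 16 and the 2000 epochs quoted in the table most likely refer to training the generator on the collected frames rather than to this per-target latent search; only the learning rate and loss weights are reused here.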
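The Open Datasets and Experiment Setup rows state that the generator is trained on 10,000 frames collected with the agent under an ϵ-greedy policy (ϵ = 0.1) in OpenAI Gym Atari environments. A minimal collection loop under those settings could look like the sketch below; the environment id, the `agent.act` interface, and the classic (pre-0.26) `gym` step signature are assumptions, not details taken from the paper.

```python
# Hypothetical frame-collection sketch: epsilon-greedy rollouts in an Atari
# environment via OpenAI Gym. Uses the old 4-tuple step API (gym < 0.26).
import random
import gym

def collect_frames(agent, env_id="SeaquestNoFrameskip-v4",
                   num_frames=10_000, epsilon=0.1):
    """Roll out an epsilon-greedy policy and keep the raw observations."""
    env = gym.make(env_id)
    frames = []
    obs = env.reset()
    while len(frames) < num_frames:
        if random.random() < epsilon:
            action = env.action_space.sample()   # exploratory random action
        else:
            action = agent.act(obs)              # greedy action (hypothetical agent interface)
        obs, reward, done, _info = env.step(action)
        frames.append(obs)
        if done:
            obs = env.reset()
    env.close()
    return frames
```

The resulting list of observations would then serve as the training set for the generator (Adam, learning rate 10⁻³, batch size 16, 2000 epochs, per the quoted setup).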