Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

Authors: Alexander Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, Danilo Jimenez Rezende

Venue: NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use the Arcade Learning Environment [33] to train and test our agent on 57 different Atari games. The model uses a 3-layer convolutional neural network followed by a convolutional LSTM as the vision core. We compare against two models without attentional bottlenecks to benchmark performance, both using the deeper residual network described in [34]. We find that our agent is competitive with these state-of-the-art baselines; see Table 1 for benchmark results and Appendix A.3 for learning curves and performance on individual levels.
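To make the quoted vision-core description concrete, the sketch below implements a 3-layer CNN followed by a convolutional LSTM cell in PyTorch. The channel counts, kernel sizes, strides, and the 84x84 input resolution are placeholder assumptions; the actual sizes are given in Appendix A.1 of the paper.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: all four gates come from a single
    convolution over the concatenated input and hidden state."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size,
                               padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class VisionCore(nn.Module):
    """3-layer CNN followed by a ConvLSTM, mirroring the paper's description.
    Channel counts, kernels, and strides below are placeholder guesses."""
    def __init__(self, hidden_channels=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.conv_lstm = ConvLSTMCell(64, hidden_channels)

    def forward(self, frame, state):
        z = self.convs(frame)                     # spatial feature map
        feats, state = self.conv_lstm(z, state)   # spatial features fed to the attention head
        return feats, state


# Example: with these (assumed) strides, an 84x84 RGB frame maps to a 7x7 grid.
core = VisionCore()
state = (torch.zeros(1, 128, 7, 7), torch.zeros(1, 128, 7, 7))
feats, state = core(torch.zeros(1, 3, 84, 84), state)
```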
Researcher Affiliation | Industry | Alex Mott*, Daniel Zoran*, Mike Chrzanowski, Daan Wierstra, Danilo J. Rezende; DeepMind, London, UK; {alexmott,danielzoran,chrzanowski,wierstra,danilor}@google.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to an open-source implementation of the agent.
Open Datasets | Yes | We use the Arcade Learning Environment [33] to train and test our agent on 57 different Atari games.
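As an illustration of how these environments can be obtained, the snippet below instantiates one Atari game through the gym bindings to the Arcade Learning Environment. The environment id, the lack of preprocessing wrappers, and the classic (pre-0.26) gym reset/step API used here are assumptions, not details taken from the paper.

```python
# Load one of the 57 Atari games via gym's ALE bindings (requires the
# gym[atari] extras; the id below is an illustrative choice).
import gym

env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```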
Dataset Splits | No | The paper mentions training and testing but does not give explicit training/validation/test splits (e.g., percentages or sample counts), nor does it reference precisely identified standard splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not list the software frameworks, libraries, or versions required to reproduce the experiments.
Experiment Setup | Yes | The model uses a 3-layer convolutional neural network followed by a convolutional LSTM as the vision core. Another (fully connected) LSTM generates a policy π and a baseline function V as output; it takes as input the query and answer vectors, the previous reward, and a one-hot encoding of the previous action. The query network is a three-layer MLP, which takes as input the hidden state h of the LSTM from the previous time step and produces 4 attention queries. See Appendix A.1 for a full specification of the network sizes. We use the Importance Weighted Actor-Learner Architecture [34] to train our agents, with an actor-critic setup and a VTRACE loss optimized with RMSProp (see learning parameters in Appendix A.1 for more details).
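The quoted setup can be sketched as a recurrent policy core: a three-layer MLP query network driven by the previous LSTM hidden state, and an LSTM that consumes the attention answer, the previous reward, and a one-hot previous action before emitting policy logits and a baseline value. All layer widths and the key dimension below are placeholders (the stated facts are only the three-layer MLP, the 4 queries, and the LSTM inputs); the real values are in Appendix A.1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyCore(nn.Module):
    """Sketch of the recurrent core: a three-layer MLP query network fed by
    the previous LSTM hidden state, plus an LSTM over [answer, prev_reward,
    one-hot prev_action] producing policy logits and a baseline V.
    Hidden sizes and key_dim are placeholder assumptions."""
    def __init__(self, answer_dim, num_actions, key_dim=16, num_queries=4,
                 hidden_dim=256):
        super().__init__()
        self.num_queries, self.key_dim = num_queries, key_dim
        self.num_actions = num_actions
        self.query_net = nn.Sequential(            # three-layer MLP
            nn.Linear(hidden_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_queries * key_dim),
        )
        self.lstm = nn.LSTMCell(answer_dim + 1 + num_actions, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.baseline_head = nn.Linear(hidden_dim, 1)

    def forward(self, answer, prev_reward, prev_action, state):
        h_prev, c_prev = state
        # 4 attention queries computed from the previous step's hidden state.
        queries = self.query_net(h_prev).view(-1, self.num_queries, self.key_dim)
        x = torch.cat([answer,
                       prev_reward.unsqueeze(-1),
                       F.one_hot(prev_action, self.num_actions).float()], dim=-1)
        h, c = self.lstm(x, (h_prev, c_prev))
        return (self.policy_head(h),                # policy logits
                self.baseline_head(h).squeeze(-1),  # baseline V
                queries, (h, c))
```

The IMPALA actor-learner loop, the VTRACE loss, and the RMSProp hyperparameters referenced in the quoted setup would wrap this forward pass and are omitted from the sketch.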