IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Authors: Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
Researcher Affiliation | Industry | DeepMind Technologies, London, United Kingdom.
Pseudocode | No | The paper describes the V-trace actor-critic algorithm in prose within Section 4.2 but does not provide a formal pseudocode block or algorithm figure. (A numpy sketch of the V-trace target computation follows the table.)
Open Source Code | No | The paper does not explicitly state that the source code for the IMPALA methodology is openly available or provide a link to it. It only links to the DMLab environment itself (github.com/deepmind/lab).
Open Datasets | Yes | We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). A detailed description of DMLab-30 and the tasks are available at github.com/deepmind/lab and deepmind.com/dm-lab-30.
Dataset Splits | No | We perform hyperparameter sweeps over the weighting of entropy regularisation, the learning rate and the RMSProp epsilon. For each experiment we use an identical set of 24 pre-sampled hyperparameter combinations from the ranges in Appendix D.1. While the paper describes hyperparameter tuning, it does not provide specific train/validation/test dataset splits as percentages or sample counts for the data within the environments.
Hardware Specification | Yes | 1 Nvidia P100 (footnote to Table 1).
Software Dependencies | No | Finally, we also make use of several off the shelf optimisations available in TensorFlow (Abadi et al., 2017) such as preparing the next batch of data for the learner while still performing computation, compiling parts of the computational graph with XLA (a TensorFlow Just-In-Time compiler) and optimising the data format to get the maximum performance from the cuDNN framework (Chetlur et al., 2014). The paper mentions these software components but does not specify version numbers. (A TensorFlow sketch of these optimisations follows the table.)
Experiment Setup | Yes | We perform hyperparameter sweeps over the weighting of entropy regularisation, the learning rate and the RMSProp epsilon. For each experiment we use an identical set of 24 pre-sampled hyperparameter combinations from the ranges in Appendix D.1. The other hyperparameters were fixed to values specified in Appendix D.3. (A sketch of such a pre-sampled sweep follows the table.)
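Since the paper presents V-trace in equation form rather than as pseudocode, the following minimal numpy sketch shows how the V-trace value targets of Section 4.1 could be computed for a single trajectory. The function name, single-trajectory shapes and constant discount are illustrative assumptions, not the authors' released implementation.

import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    # Truncated importance weights from the paper:
    #   rho_t = min(rho_bar, pi(a_t|x_t) / mu(a_t|x_t)),  c_t = min(c_bar, pi/mu)
    rhos = np.exp(target_logp - behaviour_logp)
    clipped_rhos = np.minimum(rho_bar, rhos)
    cs = np.minimum(c_bar, rhos)

    # Temporal-difference terms: delta_t V = rho_t * (r_t + gamma * V(x_{t+1}) - V(x_t)).
    next_values = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + gamma * next_values - values)

    # Backward recursion: v_s = V(x_s) + delta_s V + gamma * c_s * (v_{s+1} - V(x_{s+1})).
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v

The paper's actor-critic update then uses rho_s * (r_s + gamma * v_{s+1} - V(x_s)) as the advantage in the policy gradient, with the targets v computed as above.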
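As a rough modern analogue of the TensorFlow optimisations quoted in the Software Dependencies row (the original implementation used TF1-era input pipelines), the sketch below uses tf.data prefetching to prepare the next batch while the learner is still computing, and XLA JIT compilation of the training step. The data, model and hyperparameter values are placeholders, not the paper's agent.

import tensorflow as tf

# Placeholder data standing in for actor-generated trajectories (illustrative only).
data = tf.random.normal([1024, 84])
dataset = (
    tf.data.Dataset.from_tensor_slices(data)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)   # prepare the next batch while the learner is busy
)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 84))           # create variables before the XLA-compiled step
opt = tf.keras.optimizers.RMSprop(learning_rate=6e-4, epsilon=0.1)

@tf.function(jit_compile=True)    # compile this step with XLA
def loss_and_grads(batch):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(batch)))   # dummy loss for illustration
    return loss, tape.gradient(loss, model.trainable_variables)

for batch in dataset:
    loss, grads = loss_and_grads(batch)
    opt.apply_gradients(zip(grads, model.trainable_variables))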
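The sweep protocol described in the last two rows (one fixed set of 24 pre-sampled hyperparameter combinations reused across experiments) could be reproduced along the lines of the sketch below. The log-uniform ranges shown here are placeholders; the actual ranges are specified in Appendix D.1 of the paper.

import random

random.seed(0)  # fix the seed so every experiment sees the identical 24 combinations

def sample_combination():
    # Log-uniform sampling over illustrative ranges (Appendix D.1 gives the real ones).
    return {
        "entropy_cost":    10 ** random.uniform(-4, -2),
        "learning_rate":   10 ** random.uniform(-5, -3),
        "rmsprop_epsilon": 10 ** random.uniform(-7, -1),
    }

sweep = [sample_combination() for _ in range(24)]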