Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning

Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our approach on six Atari games [Machado et al., 2018], following Stooke and Abbeel [2018], with vanilla A2C, A3C, and the off-policy IMPALA method [Dhariwal et al., 2017; Mnih et al., 2016; Espeholt et al., 2018]."
Researcher Affiliation | Collaboration | Mahmoud Assran, Facebook AI Research & Department of Electrical and Computer Engineering, McGill University
Pseudocode | Yes | Pseudocode is provided in Algorithm 1.
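The paper's actual procedure is defined by its Algorithm 1; purely as an illustration of the gossip-averaging idea the method builds on, here is a minimal sketch in which each learner periodically mixes its parameters with one peer's copy (the function name and the uniform mixing weight are assumptions for illustration, not the paper's exact update):

```python
def gossip_average(local_params, peer_params, mixing_weight=0.5):
    """Illustrative gossip step: convexly mix a learner's parameter
    vector with a peer's copy. Repeated pairwise mixing drives all
    learners toward the network-wide average without a central server."""
    return [mixing_weight * a + (1.0 - mixing_weight) * b
            for a, b in zip(local_params, peer_params)]

# Example: two learners holding [1.0, 2.0] and [3.0, 4.0]
# move toward their mutual average after one gossip exchange.
mixed = gossip_average([1.0, 2.0], [3.0, 4.0])
```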
Open Source Code | Yes | "Our implementation of GALA-A2C is publicly available at https://github.com/facebookresearch/gala."
Open Datasets | Yes | "We evaluate GALA for training Deep RL agents on Atari-2600 games [Machado et al., 2018]."
Dataset Splits | No | The paper reports training across "10 random seeds" and using "10 evaluation episodes" for final policy evaluation, but the main text does not describe validation splits or procedures for hyperparameter tuning or model selection.
Hardware Specification | Yes | "Figure 3: Comparing GALA-A2C hardware utilization to that of A2C when using one NVIDIA V100 GPU and 48 Intel CPUs."
Software Dependencies | No | The paper states that "All methods are implemented in PyTorch [Paszke et al., 2017]" and uses TorchBeast [Küttler et al., 2019] for a baseline, but does not provide version numbers for PyTorch or other key software libraries used in the implementation.
Experiment Setup | Yes | "We use the Adam optimizer [Kingma and Ba, 2014] with learning rate 7 × 10⁻⁴. The discount factor is set to 0.99 and the entropy regularization to 0.01."
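The reported hyperparameters (discount 0.99, entropy weight 0.01, Adam learning rate 7 × 10⁻⁴) can be sketched in plain Python; the helper names below are illustrative assumptions, and the authoritative setup is the linked GALA repository:

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
LEARNING_RATE = 7e-4   # Adam learning rate
GAMMA = 0.99           # discount factor
ENTROPY_COEF = 0.01    # entropy regularization weight

def discounted_returns(rewards, gamma=GAMMA):
    """Backward recursion G_t = r_t + gamma * G_{t+1}, as used to
    compute actor-critic targets."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def policy_entropy(probs):
    """Shannon entropy of an action distribution; scaled by
    ENTROPY_COEF it forms the entropy regularization bonus."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Example: a reward of 1 arriving two steps in the future is
# discounted by GAMMA squared at the first timestep.
returns = discounted_returns([0.0, 0.0, 1.0])
bonus = ENTROPY_COEF * policy_entropy([0.5, 0.5])
```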