Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Authors: Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach on six Atari games [Machado et al., 2018] following Stooke and Abbeel [2018] with vanilla A2C, A3C and the IMPALA off-policy method [Dhariwal et al., 2017, Mnih et al., 2016, Espeholt et al., 2018]. |
| Researcher Affiliation | Collaboration | Mahmoud Assran, Facebook AI Research & Department of Electrical and Computer Engineering, McGill University |
| Pseudocode | Yes | Pseudocode is provided in Algorithm 1 |
| Open Source Code | Yes | Our implementation of GALA-A2C is publicly available at https://github.com/facebookresearch/gala. |
| Open Datasets | Yes | We evaluate GALA for training Deep RL agents on Atari-2600 games [Machado et al., 2018]. |
| Dataset Splits | No | The paper mentions training across '10 random seeds' and using '10 evaluation episodes' for final policy evaluation, but does not provide specific details on how validation splits or procedures were applied for hyperparameter tuning or model selection in the main text. |
| Hardware Specification | Yes | Figure 3: Comparing GALA-A2C hardware utilization to that of A2C when using one NVIDIA V100 GPU and 48 Intel CPUs. |
| Software Dependencies | No | The paper mentions that 'All methods are implemented in PyTorch [Paszke et al., 2017]' and uses 'TorchBeast [Küttler et al., 2019]' for a baseline, but does not provide specific version numbers for PyTorch or other key software libraries used in their implementation. |
| Experiment Setup | Yes | We use the Adam optimizer [Kingma and Ba, 2014] with learning rate 7 × 10⁻⁴. The discount factor is set to 0.99 and the entropy regularization to 0.01. |
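The reported hyperparameters can be made concrete with a small sketch. Below is a minimal, dependency-free illustration of the two quantities those settings control: discounted returns with γ = 0.99 and a policy-entropy bonus weighted by 0.01. The function names are illustrative, not taken from the paper's released code (which uses PyTorch and Adam with learning rate 7 × 10⁻⁴).

```python
import math

GAMMA = 0.99          # discount factor (from the paper)
ENTROPY_COEF = 0.01   # entropy regularization weight (from the paper)
LR = 7e-4             # Adam learning rate (from the paper)

def discounted_returns(rewards, gamma=GAMMA):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1},
    working backwards from the end of the episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def policy_entropy(probs):
    """Policy entropy H(pi) = -sum_a pi(a) log pi(a); the A2C objective
    adds ENTROPY_COEF * H(pi) to encourage exploration."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Example: a 3-step episode and a uniform 4-action policy.
G = discounted_returns([1.0, 0.0, 1.0])
H = policy_entropy([0.25, 0.25, 0.25, 0.25])
```

With these inputs, `G[0]` is 1 + 0.99·0 + 0.99²·1 = 1.9801, and `H` is ln 4 (the maximum for four actions), so the weighted entropy bonus is 0.01·ln 4 per step.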