Emergent Social Learning via Multi-Agent Reinforcement Learning
Authors: Kamal K Ndousse, Douglas Eck, Sergey Levine, Natasha Jaques
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of agents with auxiliary predictive losses learning in an environment shared with experts (social PPO + aux pred) to that of agents with the same architecture but trained alone (solo PPO + aux pred). |
| Researcher Affiliation | Collaboration | 1OpenAI, San Francisco, CA, USA; 2Google Research, Brain Team, Mountain View, CA, USA; 3UC Berkeley, Berkeley, CA, USA. |
| Pseudocode | No | The paper includes figures of network architectures and equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code for both the environment and social learning agents is available at: https://github.com/kandouss/marlgrid. |
| Open Datasets | Yes | The environments used in this paper were originally based on Minigrid (Chevalier-Boisvert et al., 2018). |
| Dataset Splits | No | The paper discusses training procedures and the use of experience replay buffers, but does not specify validation dataset splits (e.g., percentages or counts) or reference predefined validation splits. |
| Hardware Specification | Yes | The experiments in this paper were performed primarily on a desktop computer with an AMD Ryzen 3950X CPU and two Nvidia GTX 1080 Ti GPUs, as well as g4dn.8xlarge instances provisioned on Amazon AWS. |
| Software Dependencies | Yes | We used Ubuntu 18.04 with Python 3.8, and all neural networks are implemented in PyTorch v1.6 (Paszke et al., 2019). |
| Experiment Setup | Yes | Each novice agent was trained with a learning rate of 1e-4. For the social condition, the expert agents were trained with a learning rate of 1e-5. ... batch size 128 episodes; mini-batches per batch 20; mini-batch num trajectories 512; mini-batch trajectory length 16; hidden state/advantage update interval 2 minibatches; return discount γ 0.993; GAE-λ 0.97; PPO clip ratio 0.2; KL target 0.01; KL hard limit 0.03 ... The loss scaling coefficients used in our experiments are c_V = 0.1, c_ent = 1e-5, and c_aux = 3. |
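The quoted hyperparameters and loss coefficients can be collected into a single configuration. Below is a minimal plain-Python sketch (config key names are hypothetical, not from the paper) of how the reported coefficients c_V, c_ent, and c_aux would combine the PPO loss terms; the paper does not state the sign convention for the entropy term, so the standard PPO entropy-bonus subtraction is assumed.

```python
# Hyperparameters quoted from the paper's reproducibility details.
# Key names are hypothetical; values are as reported.
CONFIG = {
    "novice_lr": 1e-4,
    "expert_lr": 1e-5,            # social condition
    "batch_size_episodes": 128,
    "minibatches_per_batch": 20,
    "minibatch_num_trajectories": 512,
    "minibatch_trajectory_length": 16,
    "gamma": 0.993,               # return discount
    "gae_lambda": 0.97,
    "ppo_clip_ratio": 0.2,
    "kl_target": 0.01,
    "kl_hard_limit": 0.03,
    # Loss scaling coefficients
    "c_value": 0.1,
    "c_entropy": 1e-5,
    "c_aux": 3.0,
}


def total_loss(policy_loss, value_loss, entropy, aux_pred_loss, cfg=CONFIG):
    """Combine PPO loss terms with the reported scaling coefficients.

    Sketch only: assumes the conventional PPO objective of
    policy loss + scaled value loss - entropy bonus, plus the
    auxiliary predictive loss scaled by c_aux.
    """
    return (
        policy_loss
        + cfg["c_value"] * value_loss
        - cfg["c_entropy"] * entropy
        + cfg["c_aux"] * aux_pred_loss
    )
```

With these coefficients, the auxiliary predictive loss dominates (weight 3.0) while the entropy bonus is nearly negligible (1e-5), consistent with the paper's emphasis on the auxiliary prediction task.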