Emergent Social Learning via Multi-agent Reinforcement Learning

Authors: Kamal K Ndousse, Douglas Eck, Sergey Levine, Natasha Jaques

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare the performance of agents with auxiliary predictive losses learning in an environment shared with experts (social PPO + aux pred) to that of agents with the same architecture but trained alone (solo PPO + aux pred)."
Researcher Affiliation | Collaboration | "1 OpenAI, San Francisco, CA, USA; 2 Google Research, Brain Team, Mountain View, CA, USA; 3 UC Berkeley, Berkeley, CA, USA."
Pseudocode | No | "The paper includes figures of network architectures and equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks."
Open Source Code | Yes | "The code for both the environment and social learning agents is available at: https://github.com/kandouss/marlgrid." See the usage sketch after this table.
Open Datasets | Yes | "The environments used in this paper were originally based on MiniGrid (Chevalier-Boisvert et al., 2018)."
Dataset Splits | No | "The paper discusses training procedures and the use of experience replay buffers, but does not specify validation dataset splits (e.g., percentages or counts) or reference predefined validation splits."
Hardware Specification | Yes | "The experiments in this paper were performed primarily on a desktop computer with an AMD Ryzen 3950X CPU and two Nvidia GTX 1080 Ti GPUs, as well as g4dn.8xlarge instances provisioned on Amazon AWS."
Software Dependencies | Yes | "We used Ubuntu 18.04 with Python 3.8, and all neural networks are implemented in PyTorch v1.6 (Paszke et al., 2019)."
Experiment Setup | Yes | "Each novice agent was trained with a learning rate of 1e-4. For SociAPL, the expert agents were trained with a learning rate of 1e-5. ... batch size: 128 episodes; mini-batches per batch: 20; mini-batch num trajectories: 512; mini-batch trajectory length: 16; hidden state/advantage update interval: 2 minibatches; return discount γ: 0.993; GAE-λ: 0.97; PPO clip ratio: 0.2; KL target: 0.01; KL hard limit: 0.03 ... The loss scaling coefficients used in our experiments are c_V = 0.1, c_ent = 1e-5, and c_aux = 3." These values are collected into the sketches after this table.
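
For orientation, here is a minimal sketch of interacting with the released marlgrid environments. The environment ID, the `marlgrid.envs` import side effect, and the per-agent step signature are assumptions modeled on Gym-style MiniGrid derivatives, not confirmed details of the repository's API.

```python
# Hypothetical usage sketch for the released environment code
# (https://github.com/kandouss/marlgrid). The env ID and the exact
# multi-agent step signature are ASSUMPTIONS modeled on Gym-style
# MiniGrid derivatives; consult the repository for the real API.
import gym
import marlgrid.envs  # assumed to register the multi-agent environments

env = gym.make("MarlGrid-3AgentCluttered15x15-v0")  # hypothetical env ID
observations = env.reset()  # one observation per agent

done = False
while not done:
    # Sample an independent action for each agent.
    actions = [env.action_space.sample() for _ in observations]
    observations, rewards, done, info = env.step(actions)
env.close()
```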
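The loss coefficients quoted above suggest a combined objective of the usual PPO-plus-auxiliary-task form. The decomposition below is an assumption consistent with those coefficients, not an equation copied from the paper:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{PPO}}
\;+\; c_V\,\mathcal{L}_{\text{value}}
\;-\; c_{\text{ent}}\,\mathcal{H}\!\left[\pi_\theta\right]
\;+\; c_{\text{aux}}\,\mathcal{L}_{\text{aux}},
\qquad c_V = 0.1,\quad c_{\text{ent}} = 10^{-5},\quad c_{\text{aux}} = 3.
```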
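Finally, the reported hyperparameters gathered into a single Python dict for quick reference. The key names are illustrative; they are not the identifiers used in the released training code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Key names are illustrative, NOT taken from the released code.
ppo_config = {
    "lr_novice": 1e-4,                 # novice agents
    "lr_expert": 1e-5,                 # expert agents (SociAPL)
    "batch_size_episodes": 128,
    "minibatches_per_batch": 20,
    "minibatch_num_trajectories": 512,
    "minibatch_trajectory_length": 16,
    "hidden_state_update_interval": 2, # minibatches between hidden-state/advantage refreshes
    "gamma": 0.993,                    # return discount
    "gae_lambda": 0.97,
    "ppo_clip_ratio": 0.2,
    "kl_target": 0.01,
    "kl_hard_limit": 0.03,
    "c_value": 0.1,                    # c_V, value-loss coefficient
    "c_entropy": 1e-5,                 # c_ent, entropy-bonus coefficient
    "c_aux": 3.0,                      # auxiliary prediction-loss coefficient
}
```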