Emergent Social Learning via Multi-Agent Reinforcement Learning
Authors: Kamal K Ndousse, Douglas Eck, Sergey Levine, Natasha Jaques
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of agents with auxiliary predictive losses learning in an environment shared with experts (social PPO + aux pred) to that of agents with the same architecture but trained alone (solo PPO + aux pred). |
| Researcher Affiliation | Collaboration | 1OpenAI, San Francisco, CA, USA; 2Google Research, Brain Team, Mountain View, CA, USA; 3UC Berkeley, Berkeley, CA, USA. |
| Pseudocode | No | The paper includes figures of network architectures and equations, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code for both the environment and social learning agents is available at: https://github.com/kandouss/marlgrid. |
| Open Datasets | Yes | The environments used in this paper were originally based on Minigrid (Chevalier-Boisvert et al., 2018). |
| Dataset Splits | No | The paper discusses training procedures and the use of experience replay buffers, but does not specify validation dataset splits (e.g., percentages or counts) or reference predefined validation splits. |
| Hardware Specification | Yes | The experiments in this paper were performed primarily on a desktop computer with an AMD Ryzen 3950X CPU and two Nvidia GTX 1080 Ti GPUs, as well as g4dn.8xlarge instances provisioned on Amazon AWS. |
| Software Dependencies | Yes | We used Ubuntu 18.04 with Python 3.8, and all neural networks are implemented in PyTorch v1.6 (Paszke et al., 2019). |
| Experiment Setup | Yes | Each novice agent was trained with a learning rate of 1e-4. For the social condition, the expert agents were trained with a learning rate of 1e-5. ... batch size 128 episodes; mini-batches per batch 20; mini-batch num trajectories 512; mini-batch trajectory length 16; hidden state/advantage update interval 2 minibatches; return discount γ 0.993; GAE-λ 0.97; PPO clip ratio 0.2; KL target 0.01; KL hard limit 0.03 ... The loss scaling coefficients used in our experiments are c_V = 0.1, c_ent = 1e-5, and c_aux = 3. |
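The quoted hyperparameters and loss coefficients can be collected into a single configuration. Below is a minimal plain-Python sketch (config key names are hypothetical, not from the paper) of how the reported coefficients c_V, c_ent, and c_aux would combine the PPO loss terms; the paper does not state the sign convention for the entropy term, so the standard PPO entropy-bonus subtraction is assumed.

```python
# Hyperparameters quoted from the paper's reproducibility details.
# Key names are hypothetical; values are as reported.
CONFIG = {
    "novice_lr": 1e-4,
    "expert_lr": 1e-5,            # social condition
    "batch_size_episodes": 128,
    "minibatches_per_batch": 20,
    "minibatch_num_trajectories": 512,
    "minibatch_trajectory_length": 16,
    "gamma": 0.993,               # return discount
    "gae_lambda": 0.97,
    "ppo_clip_ratio": 0.2,
    "kl_target": 0.01,
    "kl_hard_limit": 0.03,
    # Loss scaling coefficients
    "c_value": 0.1,
    "c_entropy": 1e-5,
    "c_aux": 3.0,
}


def total_loss(policy_loss, value_loss, entropy, aux_pred_loss, cfg=CONFIG):
    """Combine PPO loss terms with the reported scaling coefficients.

    Sketch only: assumes the conventional PPO objective of
    policy loss + scaled value loss - entropy bonus, plus the
    auxiliary predictive loss scaled by c_aux.
    """
    return (
        policy_loss
        + cfg["c_value"] * value_loss
        - cfg["c_entropy"] * entropy
        + cfg["c_aux"] * aux_pred_loss
    )
```

With these coefficients, the auxiliary predictive loss dominates (weight 3.0) while the entropy bonus is nearly negligible (1e-5), consistent with the paper's emphasis on the auxiliary prediction task.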