Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. Through a series of three experiments, we show that the proposed social influence reward allows agents to learn to coordinate and communicate more effectively in these SSDs. |
| Researcher Affiliation | Collaboration | Media Lab, Massachusetts Institute of Technology, Cambridge, USA; DeepMind, London, UK; Institute for Advanced Study, Princeton, USA. |
| Pseudocode | No | The paper describes the methodology but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that "the code for these games is available in open-source" and links https://github.com/eugenevinitsky/sequential_social_dilemma_games, but this repository provides the sequential social dilemma game environments, not an implementation of the authors' proposed social influence method. |
| Open Datasets | Yes | We experiment with two SSDs, a public goods game Cleanup and a common-pool resource game Harvest. The code for these games is available in open-source: https://github.com/eugenevinitsky/sequential_social_dilemma_games |
| Dataset Splits | No | The paper does not explicitly specify train/validation/test dataset splits with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU models, CPU models, or memory configurations. |
| Software Dependencies | No | The paper mentions software components like A3C and LSTM, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use a curriculum learning approach which gradually increases the weight of the social influence reward over C steps (C ∈ [0.2, 3.5] × 10^8); this sometimes leads to a slight delay before the influence model's performance improves. We measure the total collective reward obtained using the best hyperparameter setting tested with 5 random seeds each. (A hedged sketch of such a curriculum schedule follows the table.) |
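
The Experiment Setup row describes the curriculum as a weight on the social influence bonus that is gradually increased over C training steps before the bonus is added to the environment reward. The sketch below is a minimal illustration under the assumption of a linear ramp; the paper only says the weight increases gradually, and the names `influence_weight`, `shaped_reward`, and `max_weight` are hypothetical, not taken from the authors' code.

```python
def influence_weight(step, c_steps, max_weight=1.0):
    """Linearly ramp the influence-reward weight from 0 to max_weight over c_steps.

    The paper only states that the weight is gradually increased over C steps;
    the linear shape here is an assumption for illustration.
    """
    return max_weight * min(step / c_steps, 1.0)


def shaped_reward(extrinsic_reward, influence_reward, step, c_steps, max_weight=1.0):
    """Environment reward plus the curriculum-weighted social influence bonus."""
    return extrinsic_reward + influence_weight(step, c_steps, max_weight) * influence_reward


if __name__ == "__main__":
    # C = 2e8 falls inside the paper's reported range of 0.2e8 to 3.5e8 steps.
    C = 2e8
    for step in (0, 5e7, 1e8, 2e8, 3e8):
        print(f"step={step:.1e}  influence weight={influence_weight(step, C):.2f}")
```

Under a schedule like this, agents initially learn from the extrinsic reward alone and the influence bonus only dominates later in training, which is consistent with the paper's note that the influence model's performance can improve after a slight delay.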