Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. Through a series of three experiments, we show that the proposed social influence reward allows agents to learn to coordinate and communicate more effectively in these SSDs. (A sketch of the influence reward appears after the table.)
Researcher Affiliation | Collaboration | Media Lab, Massachusetts Institute of Technology, Cambridge, USA; Google DeepMind, London, UK; Institute for Advanced Study, Princeton, USA.
Pseudocode | No | The paper describes the methodology but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The code for these games is available in open source: https://github.com/eugenevinitsky/sequential_social_dilemma_games. This link is for the sequential social dilemma game environments, not for the authors' proposed social influence method.
Open Datasets | Yes | We experiment with two SSDs, a public goods game Cleanup, and a common-pool resource game Harvest. The code for these games is available in open source: https://github.com/eugenevinitsky/sequential_social_dilemma_games
Dataset Splits | No | The paper does not explicitly specify train/validation/test dataset splits with percentages or sample counts for reproduction.
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU models, CPU models, or memory configurations.
Software Dependencies | No | The paper mentions software components like A3C and LSTM, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We use a curriculum learning approach which gradually increases the weight of the social influence reward over C steps (C ∈ [0.2, 3.5] × 10^8); this sometimes leads to a slight delay before the influence models' performance improves. We measure the total collective reward obtained using the best hyperparameter setting tested with 5 random seeds each. (A sketch of such a weight schedule appears after the table.)
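
The Research Type row quotes the paper's core contribution, a counterfactual social influence reward: an influencer k is rewarded with the KL divergence between another agent j's policy conditioned on k's actual action and j's marginal policy with k's counterfactual actions marginalised out. Below is a minimal NumPy sketch of that computation for a single influenced agent; the function name, argument shapes, and the use of exact conditional action probabilities (rather than the learned model of other agents the paper uses in the decentralised setting) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def social_influence_reward(actual_action_k, cond_probs, policy_k, eps=1e-12):
    """Counterfactual influence of agent k on one other agent j (sketch).

    actual_action_k: index of the action agent k actually took.
    cond_probs[a_k, :]: p(a_j | a_k, s) for each counterfactual action a_k of k.
    policy_k[a_k]: p(a_k | s), agent k's own policy.
    """
    # Marginal policy of j with k's action marginalised out:
    #   p(a_j | s) = sum_{a_k'} p(a_j | a_k', s) p(a_k' | s)
    marginal_j = (policy_k[:, None] * cond_probs).sum(axis=0)
    # j's policy conditioned on the action k actually took.
    conditional_j = cond_probs[actual_action_k]
    # KL( p(a_j | a_k, s) || p(a_j | s) ): how much k's action shifted j's policy.
    return float(np.sum(conditional_j *
                        (np.log(conditional_j + eps) - np.log(marginal_j + eps))))

# Toy example with two actions per agent: k's chosen action strongly shifts
# j's distribution away from the marginal, so the intrinsic reward is large.
policy_k = np.array([0.5, 0.5])
cond_probs = np.array([[0.9, 0.1],   # p(a_j | a_k = 0, s)
                       [0.2, 0.8]])  # p(a_j | a_k = 1, s)
print(social_influence_reward(actual_action_k=0, cond_probs=cond_probs, policy_k=policy_k))
```

In the paper this quantity is summed over all other agents and added to the environment reward with a trade-off weight.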
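
The Experiment Setup row mentions a curriculum that gradually increases the weight of the social influence reward over C steps, with C swept over [0.2, 3.5] × 10^8. The paper only states that the weight is increased gradually, so the linear ramp, the function name, and the way the two reward terms are combined below are assumptions made for illustration.

```python
def influence_weight(step, final_weight, curriculum_steps):
    # Anneal the influence-reward weight from 0 to final_weight over
    # curriculum_steps environment steps, then hold it constant.
    return final_weight * min(step / curriculum_steps, 1.0)

# Hypothetical combination of extrinsic and intrinsic rewards during training;
# curriculum_steps plays the role of C, swept in [0.2, 3.5] * 1e8 in the paper.
# r_total = r_env + influence_weight(step, beta, curriculum_steps) * r_influence
```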