Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. Through a series of three experiments, we show that the proposed social influence reward allows agents to learn to coordinate and communicate more effectively in these SSDs. |
| Researcher Affiliation | Collaboration | Media Lab, Massachusetts Institute of Technology, Cambridge, USA; DeepMind, London, UK; Institute for Advanced Study, Princeton, USA. |
| Pseudocode | No | The paper describes the methodology but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that "the code for these games is available in open-source" and links https://github.com/eugenevinitsky/sequential_social_dilemma_games, but this repository provides the sequential social dilemma game environments, not an implementation of the authors' proposed social influence method. |
| Open Datasets | Yes | We experiment with two SSDs, a public goods game Cleanup and a common-pool resource game Harvest. The code for these games is available in open-source: https://github.com/eugenevinitsky/sequential_social_dilemma_games |
| Dataset Splits | No | The paper does not explicitly specify train/validation/test dataset splits with percentages or sample counts for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU models, CPU models, or memory configurations. |
| Software Dependencies | No | The paper mentions software components like A3C and LSTM, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We use a curriculum learning approach which gradually increases the weight of the social influence reward over C steps (C ∈ [0.2, 3.5] × 10^8); this sometimes leads to a slight delay before the influence model's performance improves. We measure the total collective reward obtained using the best hyperparameter setting tested with 5 random seeds each. (A hedged sketch of such a curriculum schedule follows the table.) |
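
The Experiment Setup row describes the curriculum as a weight on the social influence bonus that is gradually increased over C training steps before the bonus is added to the environment reward. The sketch below is a minimal illustration under the assumption of a linear ramp; the paper only says the weight increases gradually, and the names `influence_weight`, `shaped_reward`, and `max_weight` are hypothetical, not taken from the authors' code.

```python
def influence_weight(step, c_steps, max_weight=1.0):
    """Linearly ramp the influence-reward weight from 0 to max_weight over c_steps.

    The paper only states that the weight is gradually increased over C steps;
    the linear shape here is an assumption for illustration.
    """
    return max_weight * min(step / c_steps, 1.0)


def shaped_reward(extrinsic_reward, influence_reward, step, c_steps, max_weight=1.0):
    """Environment reward plus the curriculum-weighted social influence bonus."""
    return extrinsic_reward + influence_weight(step, c_steps, max_weight) * influence_reward


if __name__ == "__main__":
    # C = 2e8 falls inside the paper's reported range of 0.2e8 to 3.5e8 steps.
    C = 2e8
    for step in (0, 5e7, 1e8, 2e8, 3e8):
        print(f"step={step:.1e}  influence weight={influence_weight(step, C):.2f}")
```

Under a schedule like this, agents initially learn from the extrinsic reward alone and the influence bonus only dominates later in training, which is consistent with the paper's note that the influence model's performance can improve after a slight delay.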