Emergent Prosociality in Multi-Agent Games Through Gifting

Authors: Woodrow Z. Wang, Mark Beliaev, Erdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "With gifting, we demonstrate increased convergence of high risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments."
Researcher Affiliation | Academia | ¹Stanford University, ²University of California, Santa Barbara; {wwang153, ebiyik, dorsa}@stanford.edu, {mbeliaev, dlazar, ramtin}@ucsb.edu
Pseudocode | No | The paper describes its methods in text and equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "We provide supplementary code for reproducibility of all the experiments."
Open Datasets | No | The paper uses game-theoretic environments (e.g., Stag Hunt, Bach or Stravinsky) that are defined by payoff matrices rather than by external, publicly available datasets, so no dataset access information is provided. (An illustrative payoff-matrix sketch follows the table.)
Dataset Splits | No | The paper describes training settings for reinforcement learning agents in simulated game environments but does not specify traditional train/validation/test dataset splits as would be typical for a supervised learning task.
Hardware Specification | Yes | "The basin of attraction code ran on an Elastic Compute Cloud (EC2) instance in Amazon Web Services (AWS) with 16 vCPUs and 30 GB RAM. [...] The DQN training code ran on a personal computer with an 8C/16T processor and 32 GB RAM."
Software Dependencies | No | The paper mentions a Deep Q-Network (DQN) and the Adam optimizer but does not specify version numbers for any software libraries or frameworks (e.g., TensorFlow, PyTorch).
Experiment Setup | Yes | "Unless otherwise stated, we set γ = 10. For all experiments, we train a Deep Q-Network (DQN) with independent ϵ-greedy exploration for each agent. We use Adam optimizer with a learning rate of 5 × 10^-4. The replay buffer size is 10^5. The ϵ for exploration begins at 0.3 and exponentially decays to 0.01 over 2 × 10^4 steps. Each target network updates every 250 episodes. For the one-shot games, all agents are given a constant observation of 0." (A hedged training-configuration sketch also follows the table.)
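
The "Open Datasets" row notes that the environments are defined entirely by payoff matrices. As a point of reference, here is a minimal Python sketch of how such a matrix game can be encoded; the payoff values are generic textbook Stag Hunt numbers, not the parameterization used in the paper, and the `step` helper is purely illustrative.

```python
import numpy as np

# Illustrative two-player Stag Hunt payoff matrices.
# These are generic textbook values, NOT the parameterization used in the paper
# (the paper tunes the risk of the game); actions: 0 = Stag, 1 = Hare.
R1 = np.array([[4.0, 0.0],
               [3.0, 3.0]])  # row player's payoffs
R2 = R1.T                    # symmetric game: column player's payoffs


def step(a1: int, a2: int):
    """One round of the one-shot matrix game: joint action -> per-player rewards."""
    return float(R1[a1, a2]), float(R2[a1, a2])


if __name__ == "__main__":
    print(step(0, 0))  # (4.0, 4.0): both hunt stag, the prosocial equilibrium
    print(step(0, 1))  # (0.0, 3.0): miscoordination penalizes the stag hunter
```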
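
The "Experiment Setup" row quotes the DQN hyperparameters from the paper. The sketch below simply collects them into a configuration and implements one plausible reading of the exponential ϵ decay; the dictionary keys and the `epsilon` helper are assumptions for illustration, the network architecture and framework are not specified in the quote, and the γ = 10 parameter is defined in the paper itself and therefore omitted here.

```python
# Hyperparameters quoted in the "Experiment Setup" row, gathered into one place.
# The keys and the epsilon() helper are hypothetical conveniences, not the authors'
# code; the quote does not fix the exact decay formula or the framework used.
DQN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,
    "replay_buffer_size": 10**5,
    "epsilon_start": 0.3,
    "epsilon_end": 0.01,
    "epsilon_decay_steps": 2 * 10**4,
    "target_update_every_episodes": 250,
    "one_shot_constant_observation": 0,
}


def epsilon(step: int, cfg: dict = DQN_CONFIG) -> float:
    """Exponential decay from epsilon_start to epsilon_end over epsilon_decay_steps.
    One plausible reading of 'exponentially decays ... over 2 x 10^4 steps'."""
    frac = min(step / cfg["epsilon_decay_steps"], 1.0)
    return cfg["epsilon_start"] * (cfg["epsilon_end"] / cfg["epsilon_start"]) ** frac


if __name__ == "__main__":
    for s in (0, 10_000, 20_000, 40_000):
        print(s, round(epsilon(s), 4))  # 0.3 at step 0, 0.01 from step 2e4 onward
```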