Emergent Prosociality in Multi-Agent Games Through Gifting
Authors: Woodrow Z. Wang, Mark Beliaev, Erdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With gifting, we demonstrate increased convergence of high-risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments. |
| Researcher Affiliation | Academia | ¹Stanford University, ²University of California, Santa Barbara; {wwang153, ebiyik, dorsa}@stanford.edu, {mbeliaev, dlazar, ramtin}@ucsb.edu |
| Pseudocode | No | The paper describes methods in text and equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide supplementary code for reproducibility of all the experiments. |
| Open Datasets | No | The paper utilizes game-theoretic environments (e.g., Stag Hunt, Bach or Stravinsky) which are defined by payoff matrices, not external publicly available datasets. Therefore, there is no dataset access information provided. |
| Dataset Splits | No | The paper describes training settings for reinforcement learning agents in simulated game environments but does not specify traditional train/validation/test dataset splits as would be typical for a supervised learning task. |
| Hardware Specification | Yes | The basin of attraction code ran on an Elastic Compute Cloud (EC2) instance in Amazon Web Services (AWS) with 16 vCPUs and 30 GB RAM. [...] The DQN training code ran on a personal computer with an 8C/16T processor and 32 GB RAM. |
| Software Dependencies | No | The paper mentions using a 'Deep Q-Network (DQN)' and 'Adam optimizer' but does not specify version numbers for any software libraries or frameworks (e.g., TensorFlow, PyTorch). |
| Experiment Setup | Yes | Unless otherwise stated, we set γ = 10. For all experiments, we train a Deep Q-Network (DQN) with independent ϵ-greedy exploration for each agent. We use the Adam optimizer with a learning rate of 5 × 10⁻⁴. The replay buffer size is 10⁵. The ϵ for exploration begins at 0.3 and exponentially decays to 0.01 over 2 × 10⁴ steps. Each target network updates every 250 episodes. For the one-shot games, all agents are given a constant observation of 0. |
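
For context on the reported experiment setup, below is a minimal sketch of those hyperparameters in PyTorch. This is not the authors' supplementary code: the class and function names, the Q-network architecture, and the Stag Hunt payoff entries are illustrative assumptions; only the numeric hyperparameters are taken from the row above.

```python
# Minimal sketch (not the authors' released code) of the reported DQN
# hyperparameters. All names are hypothetical, and the Stag Hunt payoffs
# are standard textbook values, not necessarily the matrices used in the paper.
from dataclasses import dataclass

import numpy as np
import torch
import torch.nn as nn

# Generic 2x2 Stag Hunt row-player payoff matrix; actions are (Stag, Hare).
# The paper studies high-risk variants, so these entries are illustrative only.
STAG_HUNT_ROW_PAYOFF = np.array([[4.0, 0.0],
                                 [3.0, 2.0]])

@dataclass
class DQNConfig:
    # Numeric values reported in the paper's experiment setup.
    learning_rate: float = 5e-4        # Adam optimizer
    replay_buffer_size: int = 100_000  # 10^5 transitions
    epsilon_start: float = 0.3
    epsilon_end: float = 0.01
    epsilon_decay_steps: int = 20_000  # exponential decay over 2 x 10^4 steps
    target_update_episodes: int = 250

def epsilon_at(step: int, cfg: DQNConfig) -> float:
    """Exponential (geometric) decay of epsilon from epsilon_start to epsilon_end."""
    frac = min(step / cfg.epsilon_decay_steps, 1.0)
    return cfg.epsilon_end * (cfg.epsilon_start / cfg.epsilon_end) ** (1.0 - frac)

class QNetwork(nn.Module):
    """Tiny Q-network; the paper does not report its architecture,
    so the layer sizes below are placeholders."""
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

cfg = DQNConfig()
# One-shot games: every agent receives a constant observation of 0.
constant_obs = torch.zeros(1, 1)
# 2 base actions for the 2x2 game; gifting actions would enlarge this.
q_net = QNetwork(obs_dim=1, num_actions=2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=cfg.learning_rate)
```

The epsilon schedule interpolates ϵ from 0.3 to 0.01 on a log scale over 2 × 10⁴ steps, matching the reported exponential decay; each agent would run its own copy of this configuration under independent ϵ-greedy exploration.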