Emergent Prosociality in Multi-Agent Games Through Gifting
Authors: Woodrow Z. Wang, Mark Beliaev, Erdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With gifting, we demonstrate increased convergence of high-risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments. |
| Researcher Affiliation | Academia | ¹Stanford University, ²University of California, Santa Barbara; {wwang153, ebiyik, dorsa}@stanford.edu, {mbeliaev, dlazar, ramtin}@ucsb.edu |
| Pseudocode | No | The paper describes methods in text and equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide supplementary code for reproducibility of all the experiments. |
| Open Datasets | No | The paper utilizes game-theoretic environments (e.g., Stag Hunt, Bach or Stravinsky) which are defined by payoff matrices, not external publicly available datasets. Therefore, there is no dataset access information provided. |
| Dataset Splits | No | The paper describes training settings for reinforcement learning agents in simulated game environments but does not specify traditional train/validation/test dataset splits as would be typical for a supervised learning task. |
| Hardware Specification | Yes | The basin of attraction code ran on an Elastic Compute Cloud (EC2) instance in Amazon Web Services (AWS) with 16 vCPUs and 30 GB RAM. [...] The DQN training code ran on a personal computer with an 8C/16T processor and 32 GB RAM. |
| Software Dependencies | No | The paper mentions using a 'Deep Q-Network (DQN)' and 'Adam optimizer' but does not specify version numbers for any software libraries or frameworks (e.g., TensorFlow, PyTorch). |
| Experiment Setup | Yes | Unless otherwise stated, we set γ = 10. For all experiments, we train a Deep Q-Network (DQN) with independent ϵ-greedy exploration for each agent. We use the Adam optimizer with a learning rate of 5 × 10⁻⁴. The replay buffer size is 10⁵. The ϵ for exploration begins at 0.3 and exponentially decays to 0.01 over 2 × 10⁴ steps. Each target network updates every 250 episodes. For the one-shot games, all agents are given a constant observation of 0. |
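
For context on the reported experiment setup, below is a minimal sketch of those hyperparameters in PyTorch. This is not the authors' supplementary code: the class and function names, the Q-network architecture, and the Stag Hunt payoff entries are illustrative assumptions; only the numeric hyperparameters are taken from the row above.

```python
# Minimal sketch (not the authors' released code) of the reported DQN
# hyperparameters. All names are hypothetical, and the Stag Hunt payoffs
# are standard textbook values, not necessarily the matrices used in the paper.
from dataclasses import dataclass

import numpy as np
import torch
import torch.nn as nn

# Generic 2x2 Stag Hunt row-player payoff matrix; actions are (Stag, Hare).
# The paper studies high-risk variants, so these entries are illustrative only.
STAG_HUNT_ROW_PAYOFF = np.array([[4.0, 0.0],
                                 [3.0, 2.0]])

@dataclass
class DQNConfig:
    # Numeric values reported in the paper's experiment setup.
    learning_rate: float = 5e-4        # Adam optimizer
    replay_buffer_size: int = 100_000  # 10^5 transitions
    epsilon_start: float = 0.3
    epsilon_end: float = 0.01
    epsilon_decay_steps: int = 20_000  # exponential decay over 2 x 10^4 steps
    target_update_episodes: int = 250

def epsilon_at(step: int, cfg: DQNConfig) -> float:
    """Exponential (geometric) decay of epsilon from epsilon_start to epsilon_end."""
    frac = min(step / cfg.epsilon_decay_steps, 1.0)
    return cfg.epsilon_end * (cfg.epsilon_start / cfg.epsilon_end) ** (1.0 - frac)

class QNetwork(nn.Module):
    """Tiny Q-network; the paper does not report its architecture,
    so the layer sizes below are placeholders."""
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

cfg = DQNConfig()
# One-shot games: every agent receives a constant observation of 0.
constant_obs = torch.zeros(1, 1)
# 2 base actions for the 2x2 game; gifting actions would enlarge this.
q_net = QNetwork(obs_dim=1, num_actions=2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=cfg.learning_rate)
```

The epsilon schedule interpolates ϵ from 0.3 to 0.01 on a log scale over 2 × 10⁴ steps, matching the reported exponential decay; each agent would run its own copy of this configuration under independent ϵ-greedy exploration.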