Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games

Authors: Fanqi Kong, Yizhe Huang, Song-Chun Zhu, Siyuan Qi, Xue Feng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments are performed in spatially and temporally extended mixed-motive games, demonstrating LASE's ability to promote group collaboration without compromising fairness and its capacity to adapt policies to various types of interactive co-players. To verify the effectiveness of LASE, we theoretically analyze its dynamics of decision-making in iterated mixed-motive games and conduct comprehensive experiments in spatially and temporally extended mixed-motive games.
Researcher Affiliation | Academia | (1) State Key Laboratory of General Artificial Intelligence, BIGAI; (2) Institute for Artificial Intelligence, Peking University; (3) Department of Automation, Tsinghua University
Pseudocode | Yes | LASE's pseudocode is given as Algorithm 1.
Open Source Code | No | Answer: [No] Justification: We are still sorting out the code for future open source.
Open Datasets | Yes | Iterated Prisoner's Dilemma (IPD). Here, we use iterated prisoner's dilemma (IPD) as an illustration to validate the theoretical analysis of LASE conducted in Section 4.3 and Appendix A. ... We employ the memory-1 IPD introduced in [6]... Here, we study four specific SSDs: Coingame, Cleanup, Sequential Stag-Hunt (SSH), and Sequential Snowdrift Game (SSG) (Fig. 3). Schelling diagrams (see Fig. 10) of the four environments validate that they are appropriate extensions of representative game paradigms (a detailed analysis is given in Appendix B).
Dataset Splits | No | The paper describes training and evaluation over episodes, such as "train for 30k episodes," but does not specify traditional train/test/validation dataset splits with percentages or fixed sample counts, as data is dynamically generated through environment interaction.
Hardware Specification | Yes | CPU: 128 Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz; Total memory: 263729336 kB; GPU: 8 NVIDIA GeForce RTX 3090; Memory per GPU: 24576 MiB
Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | Table 5: Hyperparameters

(a) Hyperparameters in SSDs
Parameter | Value | Parameter | Value
ϵ_start | 0.5 | α_θ | 1e-4
ϵ_div | 2e3 | α_μ | 3e-5
ϵ_end | 0.05 | α_ϕ | 3e-5
γ_sc | 0.98 | α_η | 5e-5
γ | 0.98 | update_freq | 20
δ | 0.1 | batch_size | 1000

(b) Hyperparameters in IPD
Parameter | Value | Parameter | Value
ϵ_start | 0.5 | α_θ | 5e-3
ϵ_div | 1e3 | α_μ | 1e-3
ϵ_end | 0.01 | α_ϕ | 1e-3
γ_sc | 0.98 | α_η | 1e-3
γ | 0.95 | update_freq | 20
δ | 0.1 | batch_size | 64
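
For readers scripting a reproduction, the Table 5 values can be collected into a small configuration object. The sketch below is only a transcription aid. The class name `Hyperparameters`, its field names, and the helper `differing_fields` are assumptions of this example, not identifiers from the paper, whose code is not public. The role of each parameter (for instance, how ϵ_div enters the exploration schedule) is not stated in this summary and is deliberately not modeled.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Hyperparameters:
    """Values transcribed from Table 5; field names are illustrative only."""
    eps_start: float    # ϵ_start
    eps_div: float      # ϵ_div (semantics not given in this summary)
    eps_end: float      # ϵ_end
    gamma_sc: float     # γ_sc
    gamma: float        # γ
    delta: float        # δ
    alpha_theta: float  # α_θ
    alpha_mu: float     # α_μ
    alpha_phi: float    # α_ϕ
    alpha_eta: float    # α_η
    update_freq: int
    batch_size: int


# Table 5(a): hyperparameters in the SSD environments.
SSD = Hyperparameters(
    eps_start=0.5, eps_div=2e3, eps_end=0.05,
    gamma_sc=0.98, gamma=0.98, delta=0.1,
    alpha_theta=1e-4, alpha_mu=3e-5, alpha_phi=3e-5, alpha_eta=5e-5,
    update_freq=20, batch_size=1000,
)

# Table 5(b): hyperparameters in the iterated prisoner's dilemma.
IPD = Hyperparameters(
    eps_start=0.5, eps_div=1e3, eps_end=0.01,
    gamma_sc=0.98, gamma=0.95, delta=0.1,
    alpha_theta=5e-3, alpha_mu=1e-3, alpha_phi=1e-3, alpha_eta=1e-3,
    update_freq=20, batch_size=64,
)


def differing_fields(a: Hyperparameters, b: Hyperparameters) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields whose values differ."""
    da, db = asdict(a), asdict(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


if __name__ == "__main__":
    for name, (ssd_value, ipd_value) in differing_fields(SSD, IPD).items():
        print(f"{name}: SSD={ssd_value} IPD={ipd_value}")
```

Calling `differing_fields(SSD, IPD)` lists the settings that change between the two experiment families, which is a quick way to check a transcription against the table.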