Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games
Authors: Fanqi Kong, Yizhe Huang, Song-Chun Zhu, Siyuan Qi, Xue Feng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments are performed in spatially and temporally extended mixed-motive games, demonstrating LASE's ability to promote group collaboration without compromising fairness and its capacity to adapt policies to various types of interactive co-players. To verify the effectiveness of LASE, we theoretically analyze its dynamics of decision-making in iterated mixed-motive games and conduct comprehensive experiments in spatially and temporally extended mixed-motive games. |
| Researcher Affiliation | Academia | (1) State Key Laboratory of General Artificial Intelligence, BIGAI; (2) Institute for Artificial Intelligence, Peking University; (3) Department of Automation, Tsinghua University |
| Pseudocode | Yes | LASE's pseudocode is given as Algorithm 1. |
| Open Source Code | No | Answer: [No] Justification: We are still sorting out the code for future open source. |
| Open Datasets | Yes | Iterated Prisoner's Dilemma (IPD). Here, we use the iterated prisoner's dilemma (IPD) as an illustration to validate the theoretical analysis of LASE conducted in Section 4.3 and Appendix A. ... We employ the memory-1 IPD introduced in [6]... Here, we study four specific SSDs: Coingame, Cleanup, Sequential Stag-Hunt (SSH), and Sequential Snowdrift Game (SSG) (Fig. 3). Schelling diagrams (see Fig. 10) of the four environments validate that they are appropriate extensions of representative game paradigms (a detailed analysis is given in Appendix B). |
| Dataset Splits | No | The paper describes training and evaluation over episodes, such as "train for 30k episodes," but does not specify traditional train/test/validation dataset splits with percentages or fixed sample counts, as data is dynamically generated through environment interaction. |
| Hardware Specification | Yes | CPU: 128 Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz; Total memory: 263729336 kB; GPU: 8 NVIDIA GeForce RTX 3090; Memory per GPU: 24576 MiB |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 5: Hyperparameters. (a) Hyperparameters in SSDs: ϵ_start = 0.5; ϵ_div = 2e3; ϵ_end = 0.05; γ_sc = 0.98; γ = 0.98; δ = 0.1; α_θ = 1e-4; α_µ = 3e-5; α_ϕ = 3e-5; α_η = 5e-5; update_freq = 20; batch_size = 1000. (b) Hyperparameters in IPD: ϵ_start = 0.5; ϵ_div = 1e3; ϵ_end = 0.01; γ_sc = 0.98; γ = 0.95; δ = 0.1; α_θ = 5e-3; α_µ = 1e-3; α_ϕ = 1e-3; α_η = 1e-3; update_freq = 20; batch_size = 64. |
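The Open Datasets row cites a memory-1 IPD, and Table 5(b) lists γ = 0.95 for the IPD setting. To make "memory-1" concrete, here is a minimal sketch that assumes the common formulation: each policy is five cooperation probabilities (first round, then after CC, CD, DC, DD from the agent's own perspective), and the discounted return is computed exactly by treating the game as a Markov chain over joint outcomes. The payoff values are an assumption (the classic LOLA-style IPD payoffs), not taken from this paper. The function names `joint_probs` and `memory1_values` are hypothetical. This is not the authors' LASE code, which the table notes has not been released.

```python
import numpy as np

# Joint outcomes in agent 1's ordering: (agent 1 action, agent 2 action).
# C = cooperate, D = defect.
STATES = ("CC", "CD", "DC", "DD")

# ASSUMED payoffs (LOLA-style IPD values); the paper's payoffs may differ.
R1 = np.array([-1.0, -3.0, 0.0, -2.0])  # agent 1 reward for CC, CD, DC, DD
R2 = np.array([-1.0, 0.0, -3.0, -2.0])  # agent 2 reward for CC, CD, DC, DD


def joint_probs(p1: float, p2: float) -> np.ndarray:
    """Probability of each joint outcome given both cooperation probabilities."""
    return np.array([p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)])


def memory1_values(theta1, theta2, gamma: float = 0.95):
    """Exact discounted returns (V1, V2) of two memory-1 IPD policies.

    Each theta has 5 entries: theta[0] is the probability of cooperating in
    the first round; theta[1:5] is the probability of cooperating after the
    previous outcome CC, CD, DC, DD as seen from that agent's own
    perspective (own action first).
    """
    theta1 = np.asarray(theta1, dtype=float)
    theta2 = np.asarray(theta2, dtype=float)
    # Agent 2 sees CD and DC swapped, so map its conditionals to agent 1's order.
    theta2_aligned = theta2[[1, 3, 2, 4]]

    p0 = joint_probs(theta1[0], theta2[0])  # first-round outcome distribution
    # Row s of P is the distribution over the next outcome given outcome s.
    P = np.stack([joint_probs(theta1[1 + s], theta2_aligned[s]) for s in range(4)])

    # sum_t gamma^t p0^T P^t r = p0^T (I - gamma P)^{-1} r, solved for both rewards.
    rewards = np.stack([R1, R2], axis=1)  # shape (4, 2)
    values = p0 @ np.linalg.solve(np.eye(4) - gamma * P, rewards)
    return float(values[0]), float(values[1])


if __name__ == "__main__":
    tit_for_tat = [1, 1, 0, 1, 0]
    always_defect = [0, 0, 0, 0, 0]
    print(memory1_values(tit_for_tat, tit_for_tat))
    print(memory1_values(tit_for_tat, always_defect))
```

With the assumed payoffs and γ = 0.95, tit-for-tat self-play stays in CC forever, giving approximately -1/(1 - 0.95) = -20 per agent. Tit-for-tat against always-defect is exploited once and then locked into DD, giving approximately (-41, -38). Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, and because the value is a differentiable function of both policy vectors, it is the kind of closed form that makes exact gradient analysis in the IPD tractable.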