Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games

Authors: Fanqi Kong, Yizhe Huang, Song-Chun Zhu, Siyuan Qi, Xue Feng

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments are performed in spatially and temporally extended mixed-motive games, demonstrating LASE's ability to promote group collaboration without compromising fairness and its capacity to adapt policies to various types of interactive co-players. To verify the effectiveness of LASE, we theoretically analyze its dynamics of decision-making in iterated mixed-motive games and conduct comprehensive experiments in spatially and temporally extended mixed-motive games.
Researcher Affiliation	Academia	(1) State Key Laboratory of General Artificial Intelligence, BIGAI; (2) Institute for Artificial Intelligence, Peking University; (3) Department of Automation, Tsinghua University
Pseudocode	Yes	LASE's pseudocode is given as Algorithm 1.
Open Source Code	No	Answer: [No] Justification: We are still sorting out the code for future open source.
Open Datasets	Yes	Iterated Prisoner's Dilemma (IPD). Here, we use iterated prisoner's dilemma (IPD) as an illustration to validate the theoretical analysis of LASE conducted in Section 4.3 and Appendix A. ... We employ the memory-1 IPD introduced in [6]... Here, we study four specific SSDs: Coingame, Cleanup, Sequential Stag-Hunt (SSH), and Sequential Snowdrift Game (SSG) (Fig. 3). Schelling diagrams (see Fig. 10) of the four environments validate that they are appropriate extensions of representative game paradigms (a detailed analysis is given in Appendix B).
Dataset Splits	No	The paper describes training and evaluation over episodes, such as "train for 30k episodes," but does not specify traditional train/test/validation dataset splits with percentages or fixed sample counts, as data is dynamically generated through environment interaction.
Hardware Specification	Yes	CPU: 128 Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz; Total memory: 263729336 kB; GPU: 8 NVIDIA GeForce RTX 3090; Memory per GPU: 24576 MiB
Software Dependencies	No	The paper mentions using 'Adam optimizer' but does not specify version numbers for general software dependencies or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	Table 5: Hyperparameters. (a) Hyperparameters in SSDs: ϵ_start = 0.5, ϵ_div = 2e3, ϵ_end = 0.05, γ_sc = 0.98, γ = 0.98, δ = 0.1, α_θ = 1e-4, α_µ = 3e-5, α_ϕ = 3e-5, α_η = 5e-5, update_freq = 20, batch_size = 1000. (b) Hyperparameters in IPD: ϵ_start = 0.5, ϵ_div = 1e3, ϵ_end = 0.01, γ_sc = 0.98, γ = 0.95, δ = 0.1, α_θ = 5e-3, α_µ = 1e-3, α_ϕ = 1e-3, α_η = 1e-3, update_freq = 20, batch_size = 64.
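
For anyone attempting a reimplementation, the Table 5 values quoted in the Experiment Setup row can be collected into plain configuration dictionaries. The following is a minimal sketch only: since the authors' code is not released, the key names (`eps_start`, `alpha_theta`, and so on) and the helper `diff_hparams` are illustrative assumptions, not identifiers from the paper. The numeric values are transcribed from the row above.

```python
# Hyperparameters transcribed from Table 5 as quoted in the Experiment Setup row.
# Key names are hypothetical ASCII spellings of the paper's symbols.
SSD_HPARAMS = {
    "eps_start": 0.5,
    "eps_div": 2e3,
    "eps_end": 0.05,
    "gamma_sc": 0.98,
    "gamma": 0.98,
    "delta": 0.1,
    "alpha_theta": 1e-4,
    "alpha_mu": 3e-5,
    "alpha_phi": 3e-5,
    "alpha_eta": 5e-5,
    "update_freq": 20,
    "batch_size": 1000,
}

IPD_HPARAMS = {
    "eps_start": 0.5,
    "eps_div": 1e3,
    "eps_end": 0.01,
    "gamma_sc": 0.98,
    "gamma": 0.95,
    "delta": 0.1,
    "alpha_theta": 5e-3,
    "alpha_mu": 1e-3,
    "alpha_phi": 1e-3,
    "alpha_eta": 1e-3,
    "update_freq": 20,
    "batch_size": 64,
}


def diff_hparams(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ (hypothetical helper)."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


if __name__ == "__main__":
    # Show which settings change between the SSD and IPD experiments.
    for name, (ssd, ipd) in diff_hparams(SSD_HPARAMS, IPD_HPARAMS).items():
        print(f"{name}: SSD={ssd}, IPD={ipd}")
```

Keeping the two settings side by side makes it easy to see which values are shared across environments (`eps_start`, `gamma_sc`, `delta`, `update_freq`) and which are tuned per setting.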