Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Aligning Individual and Collective Objectives in Multi-Agent Cooperation
Authors: Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of our algorithm AgA through benchmark environments for testing mixed-motive collaboration with small-scale agents such as the two-player public good game and the sequential social dilemma games, Cleanup and Harvest, as well as our self-developed large-scale environment in the game StarCraft II. |
| Researcher Affiliation | Academia | Yang Li The University of Manchester EMAIL Wenhao Zhang Shanghai Jiao Tong University EMAIL Jianhong Wang INFORMED-AI Hub University of Bristol EMAIL Shao Zhang Shanghai Jiao Tong University EMAIL Yali Du King's College London EMAIL Ying Wen Shanghai Jiao Tong University EMAIL Wei Pan The University of Manchester EMAIL |
| Pseudocode | Yes | Algorithm 1 Altruistic Gradient Adjustment (AgA) |
| Open Source Code | No | We provide the details to reproduce the main experimental results in Appendix E and the main code in supplemental material (we will release them when the paper is accepted). |
| Open Datasets | Yes | In addition to commonly used testbeds like the public goods matrix game and sequential social dilemma games (Cleanup and Harvest) [Leibo et al., 2017]... we introduce a more complex mixed-motive environment called Selfish-MMM2, an adaptation of the MMM2 map from the StarCraft II game [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper describes the use of simulation environments (Cleanup, Harvest, Selfish-MMM2) for training and evaluation. However, it does not specify explicit training, validation, and test *dataset splits* with percentages or sample counts for these environments, as they are typically run for a number of steps/episodes rather than being static datasets that are split. |
| Hardware Specification | Yes | Most experiments were conducted on a node with a Tesla V100 GPU (32GB memory) and 40 CPU cores. The hyper-parameters for PPO training are as follows. Most experiments were conducted on a node with two NVIDIA GeForce RTX 3090 GPUs and 32 CPU cores. |
| Software Dependencies | No | The paper mentions using 'PPO algorithm in stable-baselines3' and 'Adam optimizer', along with 'IPPO' and 'MAPPO' algorithms, but does not specify exact version numbers for these software packages or libraries. |
| Experiment Setup | Yes | The hyper-parameters for PPO training are as follows. The learning rate is 1e-4. The PPO clipping factor is 0.2. The value loss coefficient is 1. The entropy coefficient is 0.001. The γ is 0.99. The total environment step is 1e7 for Harvest and 2e7 for Cleanup. The environment episode length is 1000. The grad clip is 40. The hyper-parameters for PPO-based training are as follows. The learning rate is 5e-4. The PPO clipping factor is 0.2. The value loss coefficient is 1. The entropy coefficient is 0.01. The γ is 0.99. The total environment step is 1e7. The factor β in reward function is 1. The environment episode length is 400. |
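
The two hyper-parameter sets quoted in the Experiment Setup row can be written down as plain configuration objects, which makes them easier to compare. The sketch below is illustrative only and is not the authors' code. The class `PPOSetup`, the helper functions, and all field names are hypothetical. The field names are chosen to resemble the keyword arguments of the PPO class in stable-baselines3, on the assumption that the values would be passed that way. The excerpt does not name the environment for the second set, so it is labeled only as the second setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PPOSetup:
    """One reported PPO configuration (hypothetical container, illustrative names)."""
    learning_rate: float
    clip_range: float            # PPO clipping factor
    vf_coef: float               # value loss coefficient
    ent_coef: float              # entropy coefficient
    gamma: float                 # discount factor
    total_env_steps: int
    episode_length: int
    max_grad_norm: Optional[float] = None  # "grad clip"; reported for the first set only
    reward_beta: Optional[float] = None    # factor beta in the reward; second set only


# In the first set, the total number of environment steps depends on the game.
FIRST_SETUP_STEPS = {"harvest": 10_000_000, "cleanup": 20_000_000}


def first_setup(env_name: str) -> PPOSetup:
    """First reported set (Harvest and Cleanup)."""
    return PPOSetup(
        learning_rate=1e-4, clip_range=0.2, vf_coef=1.0, ent_coef=0.001,
        gamma=0.99, total_env_steps=FIRST_SETUP_STEPS[env_name.lower()],
        episode_length=1000, max_grad_norm=40.0,
    )


# Second reported set; the excerpt does not say which environment it belongs to.
SECOND_SETUP = PPOSetup(
    learning_rate=5e-4, clip_range=0.2, vf_coef=1.0, ent_coef=0.01,
    gamma=0.99, total_env_steps=10_000_000, episode_length=400,
    reward_beta=1.0,
)


def optimizer_kwargs(setup: PPOSetup) -> dict:
    """Collect the optimization-related fields, skipping values that were not reported."""
    keys = ("learning_rate", "clip_range", "vf_coef", "ent_coef", "gamma", "max_grad_norm")
    return {k: getattr(setup, k) for k in keys if getattr(setup, k) is not None}


def full_length_episodes(setup: PPOSetup) -> int:
    """Episode count, assuming every episode runs for the full episode length."""
    return setup.total_env_steps // setup.episode_length
```

For example, `optimizer_kwargs(first_setup("cleanup"))` includes `max_grad_norm`, while `optimizer_kwargs(SECOND_SETUP)` omits it because no gradient clipping value is quoted for the second set. Leaving unreported values as `None` rather than filling in library defaults keeps the record faithful to what the row actually states.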