Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Fair Cooperation in Mixed-Motive Games via Conflict-Aware Gradient Adjustment

Authors: Woojun Kim, Katia Sycara

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results in sequential social dilemma environments demonstrate that our approach outperforms baselines in terms of social welfare, while maintaining fairness. We conduct our experiments using the JAX-based codebase and environments provided by the Social JAX suite [7]. We modify the existing environments Coins, Cleanup, and Harvest to incorporate a fairness perspective.
Researcher Affiliation | Academia | Woojun Kim Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 EMAIL Katia Sycara Robotics Institute Carnegie Mellon University Pittsburgh, PA 15213 EMAIL
Pseudocode | Yes | Algorithm 1: FCGrad Input: Policy parameters θ, learning rate η, weighting factor β
Open Source Code | Yes | The official implementation of FCGrad is available at: https://github.com/wjkim1202/fcgrad.
Open Datasets | Yes | We conduct our experiments using the JAX-based codebase and environments provided by the Social JAX suite [7]. We modify the existing environments Coins, Cleanup, and Harvest to incorporate a fairness perspective. Specifically, since Cleanup and Harvest already involve inherent fairness dilemmas, we introduce only minor changes by assigning distinct respawn positions to the agents. For the Coin Game, which originally focuses on the conflict between individual and collective objectives, we introduce asymmetry in the potential rewards that agents can obtain, creating a disparity in individual incentives. Fig. 2 illustrates the considered environments. We provide detailed descriptions in the following sections. [7] Zihao Guo, Richard Willis, Shuqing Shi, Tristan Tomilin, Joel Z Leibo, and Yali Du. Socialjax: An evaluation suite for multi-agent reinforcement learning in sequential social dilemmas. arXiv preprint arXiv:2503.14576, 2025.
Dataset Splits | No | The paper describes modifications to the environments (e.g., unfair coin probabilities, fixed spawn positions for agents) and the number of agents, but does not specify explicit training/validation/test splits of a dataset in the traditional sense. The environments are simulation-based, not static datasets with splits.
Hardware Specification | Yes | All experiments were run on a local server equipped with an AMD EPYC 7713 64-Core CPU and five NVIDIA RTX 6000 Ada Generation GPUs.
Software Dependencies | No | The paper mentions using a 'JAX-based codebase' and that all methods are 'implemented on top of the IPPO [3]', and uses 'Adam optimizer' and 'PPO policy gradient'. However, it does not provide specific version numbers for JAX or other libraries used in the implementation.
Experiment Setup | Yes | We train the networks using the Adam optimizer with a learning rate of 1e-4, linearly annealed over time. PPO is used with a clipping threshold of 0.2 and two update epochs per iteration, using 500 minibatches. We collect trajectories from 256 parallel environments, each running for 1000 steps per rollout. The discount factor is set to γ = 0.99 and the GAE parameter to λ = 0.95. The entropy and value loss coefficients are set to 0.1, respectively. Gradients are clipped to a maximum global norm of 0.5. (Section C.1 Unfair Coin)
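For readers who want the quoted Experiment Setup values in one place, the following is a minimal sketch in plain Python rather than the authors' JAX code. The numeric defaults are the ones quoted above; everything else is an assumption made for illustration: the names `PPOConfig`, `linear_lr`, and `gae`, the choice to anneal the learning rate to zero, the even split of one rollout into minibatches, and the `total_updates` argument, which the quoted text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    # Defaults are the values quoted in the Experiment Setup row.
    # Field names are hypothetical, not taken from the FCGrad repository.
    learning_rate: float = 1e-4
    clip_eps: float = 0.2
    update_epochs: int = 2
    num_minibatches: int = 500
    num_envs: int = 256
    steps_per_rollout: int = 1000
    gamma: float = 0.99
    gae_lambda: float = 0.95
    entropy_coef: float = 0.1
    value_coef: float = 0.1
    max_grad_norm: float = 0.5


def linear_lr(cfg, update_idx, total_updates):
    """Assumed schedule: decay linearly from cfg.learning_rate to 0."""
    frac = 1.0 - min(update_idx, total_updates) / total_updates
    return cfg.learning_rate * frac


def gae(rewards, values, last_value, dones, gamma, lam):
    """Standard generalized advantage estimation over one trajectory."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


cfg = PPOConfig()
# Per-agent transitions gathered in one rollout across all environments.
samples_per_rollout = cfg.num_envs * cfg.steps_per_rollout
# Assumes the rollout is split evenly across the minibatches.
minibatch_size = samples_per_rollout // cfg.num_minibatches
```

Under these assumptions, one rollout yields 256,000 transitions per agent, or 512 per minibatch. The actual minibatch composition in the released code may differ, for example if transitions from several agents are batched together.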