Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents
Authors: Yun Hua, Haosheng Chen, Shiqin Wang, Wenhao Li, Xiangfeng Wang, Jun Luo
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Shapley-Coop across two multi-agent games and a software engineering simulation, demonstrating that it consistently enhances LLM agent collaboration and facilitates equitable credit assignment. These results highlight the effectiveness of Shapley-Coop's pricing mechanisms in accurately reflecting individual contributions during task execution. |
| Researcher Affiliation | Academia | 1 Antai College of Economics and Management, Shanghai Jiao Tong University; 2 School of Computer Science and Technology, East China Normal University; 3 School of Computer Science and Technology, Tongji University; 4 Key Laboratory of Mathematics and Engineering Applications (MoE); 5 Shanghai Institute of AI for Education, East China Normal University; 6 Shenzhen Loop Area Institute (SLAI) |
| Pseudocode | Yes | Short-Term Shapley Chain-of-Thought (CoT). During real-time task execution, precisely quantifying marginal contributions and achieving spontaneous collaboration and fair credit assignment is challenging due to uncertain future payoffs. To address this, the pricing mechanism in Shapley-Coop is divided into two components. The Short-Term Shapley Chain-of-Thought (CoT) employs a qualitative, heuristic reasoning process to align the heterogeneous goals of self-interested LLM agents, enabling them to coordinate effectively within rational task timelines. The core objective of Short-Term Shapley CoT is to help agents reason about whether their plans require assistance from others or provide benefits to them, framed through the economic concept of externalities. A positive externality increases others' utility, while a negative externality reduces it. Based on task rules and environmental conditions, agents assess the nature of these externalities and determine whether to offer or request compensation (price), thereby promoting efficient collaboration. Formally, consider a set of agents $N = \{1, \ldots, n\}$. At time $t$, each agent $i \in N$ is about to perform an action $a^t_i$. The Short-Term Shapley CoT heuristic reasoning consists of the following three formally articulated steps: 1). Qualitative Assessment of Long-Term Rewards: Each agent $i$ first qualitatively approximates the potential collective reward $R(N)$ achievable by full cooperation of all agents, thus orienting itself toward future cooperative gains. Formally, the agent uses an LLM heuristic estimation: $R(N) \approx \mathrm{LLM}(s^t, \{a^t_j\}_{j \in N})$, (4) where $R(N)$ represents a qualitative, heuristic approximation of the total reward achievable by cooperative actions among all agents. Example Prompt: > "Given the current game state and planned actions of all agents, qualitatively estimate the overall cooperative payoff achievable by collective behaviors." 2). 
Evaluation of Critical Contributions: Next, each agent $i$ qualitatively assesses whether its intended action $a^t_i$ creates a positive or negative externality for the remaining agents $N \setminus \{i\}$. Formally, the agent approximates the sign of its marginal contribution, without explicit numerical calculation. Define the qualitative externality indicator $E^t_i$ as follows: $E^t_i = \begin{cases} + & \text{if } a^t_i \text{ creates positive externalities for others (beneficial)}, \\ - & \text{if } a^t_i \text{ creates negative externalities for others (harmful)}. \end{cases}$ (5) The LLM agent uses a heuristic inference to estimate $E^t_i$: $E^t_i \approx \mathrm{LLM}(s^t, a^t_i, \{a^t_j\}_{j \neq i})$. (6) Example Prompt: > "Given my planned action and the current state, qualitatively assess whether my action creates a positive (beneficial) or negative (harmful) externality for other agents. Explain your reasoning." 3). Construction of Negotiation Strategy: Based on the externality type, agents proactively propose qualitative price adjustments to align heterogeneous incentives and achieve spontaneous collaboration: Negative externality ($E^t_i = -$): Propose price compensation to affected agents. Positive externality ($E^t_i = +$): Suggest receiving a price from benefiting agents. Example Prompt: > "Given my action creates a positive/negative externality, propose an appropriate redistribution of price to align heterogeneous incentives and achieve spontaneous collaboration." The Short-Term Shapley CoT explicitly addresses the problem of whether pricing is necessary in the pricing mechanism, enabling agents to align their heterogeneous goals and achieve spontaneous collaboration. Long-Term Shapley Chain-of-Thought (CoT). Upon task completion, accurately quantifying each agent's actual contribution is crucial for maintaining long-term trust and incentive alignment. The Long-Term Shapley CoT explicitly addresses the credit assignment challenge within the pricing mechanism by retrospectively approximating Shapley values based on the observed task trajectory. 
Given a completed trajectory $\tau_N = \{s^0, \{a^0_j\}, \{r^0_j\}, \ldots, s^T\}$, where $T$ denotes the length of the trajectory, the Long-Term Shapley CoT involves the following explicit heuristic steps: 1). Collective Outcome Calculation: First, each agent $i$ calculates the global utility $R(N, \tau_N)$ from the given trajectory $\tau_N$ through a simple calculation; this is the first step in calculating the Shapley value shown in Equation 1. Each agent computes the total collective reward (global utility) $R(N, \tau_N)$ achieved by the coalition $N$ over the entire trajectory. This calculation is explicitly defined as: $R(N, \tau_N) = \sum_{t} \sum_{j \in N} r^t_j$. (7) Given the explicit trajectory information, each agent calculates this quantity directly. Example Prompt: > "Given the observed trajectory, the overall cooperative payoff is {R(N, τN)}(call external calculation function)". 2). Marginal Contribution Estimation: Then, each agent $i$ estimates its own marginal contribution, representing the incremental reward that agent $i$ contributes to the group's total outcome. Formally, the marginal contribution is defined as: $\Delta_i(C, \tau_C) = R(C \cup \{i\}, \tau_{C \cup \{i\}}) - R(C, \tau_C)$. (8) Example Prompt: > "Given the observed trajectory and my actions, as I have known the collective outcome, my marginal contribution is {Δi(C, τC)(call function)}". 3). Apply Shapley Reasoning: Next, each agent $i$ formally approximates its Shapley value based on the trajectory, by averaging its marginal contributions across all possible coalitions: $\phi_i(\tau_N) = \sum_{C \subseteq \{1,\ldots,N\} \setminus \{i\}} \frac{\lvert C \rvert!\,(N - \lvert C \rvert - 1)!}{N!}\, \Delta_i(C, \tau_C)$. (9) Example Prompt: > "Given the observed trajectory and my actions, as I have known the collective outcome and my marginal contribution, my Shapley Value is {ϕi(τN)(call function)}, and I need to {ask\|pay} reward based on it". 4). Analyze and Negotiate Offers: Finally, agents negotiate among themselves based on their estimated Shapley values, ensuring fair credit assignment. 
Each agent proposes, accepts, rejects, or modifies redistribution offers, guided explicitly by their approximated Shapley values. An agent $i$ proposes a pricing redistribution from the total utility. Example Prompt: > "Given the completed trajectory and my estimated Shapley value, I need to access a pricing {r} from the total utility." Agents explicitly justify their negotiation stance using their own approximated Shapley values. Example Prompt: > "I {agree\|disagree\|counter-propose} to your redistribution proposal because {reasoning}." The integration of Short-Term and Long-Term Shapley Chain-of-Thought establishes a comprehensive pricing mechanism that fosters spontaneous collaboration and ensures fair credit assignment among self-interested LLM agents in open-ended environments. This is achieved by aligning their heterogeneous goals and utilizing heuristic, LLM-guided Shapley methods to approximate each agent's actual contributions. |
| Open Source Code | Yes | The code is publicly available at https://github.com/hyyh28/Shapley-Coop. |
| Open Datasets | No | The paper uses the ChatDEV environment [29] for software development simulation and describes custom-designed scenarios like |
| Dataset Splits | No | The paper describes task complexity levels and configurations for the experimental environments (e.g., Boss HP for Raid Battle, BMI Calculator and Art Canvas for ChatDEV tasks), but does not provide specific training/test/validation dataset splits. The environments are simulations rather than static datasets with predefined splits. |
| Hardware Specification | Yes | The Shapley-Coop module is executed on a virtual machine hosted on a small server with a 24-core CPU and 32 GB of DRAM. Since the implementation only involves API calls, GPU resources are not utilized. |
| Software Dependencies | No | The paper mentions using "DeepSeek-v3 as the underlying language model" and "API calls", but does not specify any other software dependencies with version numbers (e.g., Python libraries, frameworks). |
| Experiment Setup | Yes | To evaluate the Shapley-Coop workflow, we design three experimental scenarios: 1) the Escape Room task, which demonstrates how existing negotiation workflows fail to resolve reward-allocation conflicts in social dilemmas; 2) the Raid Battle, a multi-step game where four heroes cooperate to defeat a boss, used to assess our workflow's performance in complex coordination settings; and 3) the ChatDEV task, a well-known environment where LLM agents act as project managers, software engineers, and testers to collaboratively develop software, showcasing Shapley-Coop's ability to effectively allocate value in real-world, multi-role contributions. Four configurations are compared to isolate the contribution of each component: i) LLM-only: No negotiation or cooperation; ii) LLM+NEG: Standard negotiation without Shapley reasoning; iii) LLM+STS: Short-term Shapley reasoning (Chain-of-Thought only); iv) LLM+SC: Full Shapley-Coop workflow. We provide a discussion comparing our choice of the Shapley value to alternative methods in Appendix F with an analysis of multi-agent reinforcement learning methods in Appendix E. For simplicity, we use only DeepSeek-v3 as the underlying language model in this setting. The Raid Battle scenario ... We design three levels of increasing difficulty: i) Level 1: Boss HP = 2000; ii) Level 2: Boss HP = 2500; iii) Level 3: Boss HP = 3000. In all levels, the heroes must defeat the Boss within 10 turns. We selected two representative tasks with varying complexity: (1) BMI Calculator: Develop an application calculating Body Mass Index from user inputs. (2) Art Canvas: Create a virtual painting studio app providing canvas, brushes, and color palettes. 
We measured contributions using weighted earned value (WEV), a widely adopted project management metric [25], using four key artefacts already routinely tracked in software engineering tools: effective lines of code (Code), approved design/product decisions (Dec.), validated documents (Docs), and verified bug fixes (Fixes). The WEV of each role $r$ in a task is computed as: $\mathrm{WEV}_r = \sum_{i \in \{\text{code},\text{dec},\text{doc},\text{fix}\}} w_i\, \theta_{r,i}$, where $\theta_{r,i}$ denotes agent $r$'s contribution to artifact type $i$, and $w_i$ indicates standardized weights derived from a combination of benchmarks including COCOMO II [25], COCOMO [3], and CSBSG [44]. These weights fall in the following ranges: $w_{\text{code}}$ = 0.27-0.40, $w_{\text{dec}}$ = 0.15-0.35, $w_{\text{doc}}$ = 0.05-0.15, $w_{\text{fix}}$ = 0.15-0.25. |
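
The Long-Term Shapley reasoning quoted in the Pseudocode row (marginal contributions in Eq. 8, weighted averaging over coalitions in Eq. 9) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `shapley_values`, the `coalition_value` callback (a stand-in for the trajectory-based reward of a coalition), and the toy reward numbers are all assumptions made for illustration. The sketch enumerates every coalition, so it is only practical for a handful of agents.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, coalition_value):
    """Exact Shapley values by enumerating all coalitions (small n only).

    coalition_value is a hypothetical callback mapping a frozenset of agents
    to the reward that coalition would obtain; it stands in for the
    trajectory-based coalition reward described in the table.
    """
    n = len(agents)
    values = {}
    for i in agents:
        others = [a for a in agents if a != i]
        phi = 0.0
        for size in range(n):
            # Weight |C|! (n - |C| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                c = frozenset(subset)
                # Marginal contribution of agent i when joining coalition c.
                marginal = coalition_value(c | {i}) - coalition_value(c)
                phi += weight * marginal
        values[i] = phi
    return values


# Toy example with made-up numbers: damage dealt by coalitions of two heroes.
TOY_REWARDS = {
    frozenset(): 0.0,
    frozenset({"tank"}): 100.0,
    frozenset({"mage"}): 300.0,
    frozenset({"tank", "mage"}): 600.0,
}

phi = shapley_values(["tank", "mage"], TOY_REWARDS.__getitem__)
print(phi)
```

In this toy setting the two values sum to the grand-coalition reward of 600, which is the efficiency property that makes Shapley values usable as a basis for the redistribution offers described in the negotiation step.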