Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints
Authors: Yan Dai, Negin Golrezaei, Patrick Jaillet
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings agents can easily manipulate their reports to distort future dual updates for future gain. [...] We simulate a T-round game for 1000 trials, during which agents use Q-learning to optimize their reporting strategy. Under the vanilla primal-dual algorithm of Balseiro et al. (2023), agents learn to frequently misreport their values, resulting in reduced social welfare and low budget utilization (blue). In contrast, under our incentive-aware mechanism, agents gradually learn to report truthfully, leading to significantly improved social welfare while adhering to cost constraints (green). |
| Researcher Affiliation | Academia | Yan Dai Operations Research Center MIT EMAIL Negin Golrezaei Sloan School of Management MIT EMAIL Patrick Jaillet Department of EECS MIT EMAIL |
| Pseudocode | Yes | Protocol 1 Interaction Protocol for Repeated Resource Allocation [...] Algorithm 2 Primal-Dual Mechanism Robust to Strategic Manipulations [...] Algorithm 3 Dual Update Sub-Routine using Follow-the-Regularized-Leader (FTRL) [...] Algorithm 4 Dual Update Sub-Routine using Optimistic FTRL with Fixed Points (O-FTRL-FP) |
| Open Source Code | Yes | The numerical illustration setup is clearly explained in Section 5, with codes attached as supplementary materials. |
| Open Datasets | No | We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3. |
| Dataset Splits | No | We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3. [...] We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992). |
| Hardware Specification | Yes | Every illustration executes on a M1 Mac Book Air 2020 in 10 minutes. |
| Software Dependencies | No | We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992). In the n-th trial, every agent uses ϵn-greedy with a geometrically decaying schedule of ϵ = 0.995n. We update Q-tables using Q(s, a) Q(s, a) + α(r + γ(maxa Q(s , a )) Q(s, a)) where α = 0.1, r is the instantaneous reward, and s is the new state (t + 1, λt+1, vt+1,i). |
| Experiment Setup | Yes | We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3. [...] We discretize all the values, duals, and reports to the nearest multiple of 0.1. We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992). In the n-th trial, every agent uses ϵn-greedy with a geometrically decaying schedule of ϵ = 0.995n. We update Q-tables using Q(s, a) Q(s, a) + α(r + γ(maxa Q(s , a )) Q(s, a)) where α = 0.1, r is the instantaneous reward, and s is the new state (t + 1, λt+1, vt+1,i). |