Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints

Authors: Yan Dai, Negin Golrezaei, Patrick Jaillet

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings agents can easily manipulate their reports to distort future dual updates for future gain. [...] We simulate a T-round game for 1000 trials, during which agents use Q-learning to optimize their reporting strategy. Under the vanilla primal-dual algorithm of Balseiro et al. (2023), agents learn to frequently misreport their values, resulting in reduced social welfare and low budget utilization (blue). In contrast, under our incentive-aware mechanism, agents gradually learn to report truthfully, leading to significantly improved social welfare while adhering to cost constraints (green).
Researcher Affiliation Academia Yan Dai Operations Research Center MIT EMAIL Negin Golrezaei Sloan School of Management MIT EMAIL Patrick Jaillet Department of EECS MIT EMAIL
Pseudocode Yes Protocol 1 Interaction Protocol for Repeated Resource Allocation [...] Algorithm 2 Primal-Dual Mechanism Robust to Strategic Manipulations [...] Algorithm 3 Dual Update Sub-Routine using Follow-the-Regularized-Leader (FTRL) [...] Algorithm 4 Dual Update Sub-Routine using Optimistic FTRL with Fixed Points (O-FTRL-FP)
Open Source Code Yes The numerical illustration setup is clearly explained in Section 5, with codes attached as supplementary materials.
Open Datasets No We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3.
Dataset Splits No We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3. [...] We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992).
Hardware Specification Yes Every illustration executes on a M1 Mac Book Air 2020 in 10 minutes.
Software Dependencies No We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992). In the n-th trial, every agent uses ϵn-greedy with a geometrically decaying schedule of ϵ = 0.995n. We update Q-tables using Q(s, a) Q(s, a) + α(r + γ(maxa Q(s , a )) Q(s, a)) where α = 0.1, r is the instantaneous reward, and s is the new state (t + 1, λt+1, vt+1,i).
Experiment Setup Yes We simulate a game with T = 1000 rounds, K = 3 agents, a single resource dimension d = 1, and a discount factor γ = 0.9. Each agent s valuation is drawn from Vi = Unif[0, 1], and their cost is drawn from Ci = Unif[0.7ρ, 1.3ρ], for all i = 1, 2, 3. [...] We discretize all the values, duals, and reports to the nearest multiple of 0.1. We repeat the same game for N = 1000 independent trails, where every agent keeps refining their strategy via Q-learning (Watkins and Dayan, 1992). In the n-th trial, every agent uses ϵn-greedy with a geometrically decaying schedule of ϵ = 0.995n. We update Q-tables using Q(s, a) Q(s, a) + α(r + γ(maxa Q(s , a )) Q(s, a)) where α = 0.1, r is the instantaneous reward, and s is the new state (t + 1, λt+1, vt+1,i).