Learning to Play General-Sum Games against Multiple Boundedly Rational Agents
Authors: Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate our framework learns robust mechanisms in both matrix games and complex spatiotemporal games. In particular, we learn a dynamic tax policy that improves the welfare of a simulated trade-and-barter economy by 15%, even when facing previously unseen boundedly rational RL taxpayers. |
| Researcher Affiliation | Collaboration | (1) Salesforce Research, Palo Alto, California, USA; (2) University of California, Berkeley, Berkeley, California, USA; (3) Mosaic ML, San Francisco, California, USA |
| Pseudocode | Yes | Algorithm 1: Decoupled sampling of pessimistic equilibria. |
| Open Source Code | Yes | Source code for these experiments is released at https://github.com/salesforce/strategically-robust-ai. |
| Open Datasets | No | The paper describes using simulated game environments ('Sequential Bimatrix Game', 'AI Economist') rather than publicly available datasets with concrete access information (URL, DOI, or repository). |
| Dataset Splits | No | The paper mentions selecting top 10 seeds 'in a validation environment' but does not provide specific details on how this validation set is created or split from the overall data/simulation for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or any other computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using a 'common multi-agent implementation of the PPO algorithm' but does not specify any software libraries or dependencies with version numbers. |
| Experiment Setup | Yes | Output: approximate lower bound on L(ε) (Eq. 5). Input: number of training steps M_tr and self-play steps M_s, reward slack ε, multiplier learning rate α_λ, an uncoupled self-play algorithm B, and regret estimators R_i : P(A) → ℝ for each agent i. Initialize mixed strategy x_1. For j = 1, …, M_tr: for each agent i = 1, …, N, estimate the regret r_i := max_{x'_i ∈ P(A_i)} u_i(x'_i, x_{−i}) − u_i(x) as r̂_i ← R_i(x_j), and update the multiplier λ_i ← λ_i + α_λ (r̂_i − ε); then, using B, run M_s rounds of self-play with utilities û_i(a) := (λ_i u_i(a) − u_0(a)) / (1 + λ_i), and set x_{j+1} to the resulting empirical play distribution. Return (1/M_tr) Σ_{t=1}^{M_tr} u_0(x_t). A code sketch of this loop appears below the table. |
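
The Experiment Setup cell above compresses the paper's Algorithm 1 into prose. As a reading aid, the following is a minimal Python sketch of that outer Lagrangian loop, assuming callable utility functions, regret estimators, and a generic self-play routine. The function name `pessimistic_lower_bound`, the argument conventions, and the non-negativity clipping of the multipliers are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Hedged sketch of Algorithm 1 (decoupled sampling of pessimistic equilibria).
# All callables below are assumptions standing in for the paper's components.
import numpy as np

def pessimistic_lower_bound(u0, u, regret_estimators, self_play, x0,
                            M_tr=100, M_s=50, eps=0.05, alpha_lam=0.1):
    """Estimate a lower bound on the principal's pessimistic value L(eps).

    u0(x)                    -> principal utility under mixed-strategy profile x
    u[i](x)                  -> agent i's utility under x
    regret_estimators[i](x)  -> estimated regret of agent i at x
    self_play(utils, M_s, x) -> empirical play distribution after M_s rounds
                                of an uncoupled self-play algorithm B
    x0                       -> initial mixed-strategy profile x_1
    """
    N = len(u)
    lam = np.zeros(N)          # one Lagrange multiplier per agent
    x = x0
    principal_values = []

    for _ in range(M_tr):
        # Dual step: raise lambda_i when agent i's regret exceeds the slack eps,
        # pushing that agent back toward an approximate best response.
        # (The clipping at zero is an assumption for keeping multipliers valid.)
        for i in range(N):
            r_hat = regret_estimators[i](x)
            lam[i] = max(0.0, lam[i] + alpha_lam * (r_hat - eps))

        # Primal step: agents play a modified game trading off their own
        # utility against harming the principal, weighted by lambda.
        def modified_utility(i, a):
            return (lam[i] * u[i](a) - u0(a)) / (1.0 + lam[i])

        x = self_play(modified_utility, M_s, x)
        principal_values.append(u0(x))

    # Average principal utility over the iterates approximates the lower bound.
    return float(np.mean(principal_values))
```

Per the Software Dependencies row, the paper instantiates self-play with a common multi-agent PPO implementation; in this sketch the self-play routine and regret estimators are left as generic callables.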