Reinforcement Nash Equilibrium Solver

Authors: Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An

IJCAI 2024

Each entry below gives a reproducibility variable, its result, and the LLM response.
Research Type: Experimental
"Extensive experiments on large-scale normal-form games show that our method can further improve the approximation of NE of different solvers, i.e., α-rank, CE, FP and PRD, and can be generalized to unseen games."
Researcher Affiliation: Collaboration
Xinrun Wang¹, Chang Yang², Shuxin Li¹, Pengdeng Li¹, Xiao Huang², Hau Chan³ and Bo An¹,⁴ (¹Nanyang Technological University, Singapore; ²The Hong Kong Polytechnic University, Hong Kong SAR, China; ³University of Nebraska-Lincoln, Lincoln, Nebraska, United States; ⁴Skywork AI, Singapore).
Pseudocode: No
The paper describes the RENES procedure and its training with PPO in textual form but does not include any explicitly labeled pseudocode or algorithm blocks.
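Since the paper gives no algorithm block, the following is only a minimal sketch of the modify-then-solve loop as described in the text. All names here (`renes_episode`, `base_solver`, the random policy) are hypothetical, plain fictitious play stands in for the pluggable base solver, and the actual Renes model (GNN/MLP policy trained with PPO) is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def nash_conv(payoffs, sigma_row, sigma_col):
    # Exploitability of a profile on a two-player game (A, B):
    # sum of each player's best-response gain over its current payoff.
    A, B = payoffs
    row_gain = (A @ sigma_col).max() - sigma_row @ A @ sigma_col
    col_gain = (sigma_row @ B).max() - sigma_row @ B @ sigma_col
    return row_gain + col_gain

def base_solver(payoffs, iters=200):
    # Stand-in base solver: plain fictitious play (one of the four
    # solvers the paper plugs in; alpha-rank/CE/PRD would slot in here).
    A, B = payoffs
    n, m = A.shape
    count_r, count_c = np.ones(n), np.ones(m)
    for _ in range(iters):
        s_r, s_c = count_r / count_r.sum(), count_c / count_c.sum()
        count_r[np.argmax(A @ s_c)] += 1.0
        count_c[np.argmax(s_r @ B)] += 1.0
    return count_r / count_r.sum(), count_c / count_c.sum()

def renes_episode(A, B, basis, policy, T=50):
    # One episode: the policy modifies the game for up to T steps; after
    # each step the base solver is re-run on the MODIFIED game and the
    # resulting profile is scored on the ORIGINAL game.
    orig = (A, B)
    mod_A, mod_B = A.copy(), B.copy()
    prev = nash_conv(orig, *base_solver(orig))
    for _ in range(T):
        w = policy((mod_A, mod_B))               # r-dimensional action
        delta = np.tensordot(w, basis, axes=1)   # weighted sum of basis tensors
        mod_A, mod_B = mod_A + delta, mod_B + delta
        cur = nash_conv(orig, *base_solver((mod_A, mod_B)))
        reward = prev - cur   # per-step reward PPO would receive (update omitted)
        prev = cur
    return prev

# Hypothetical usage with a random (untrained) policy over r = 10 basis tensors.
n, m, r = 5, 5, 10
A, B = rng.uniform(-1, 1, (n, m)), rng.uniform(-1, 1, (n, m))
basis = rng.normal(size=(r, n, m)) * 0.05
final_gap = renes_episode(A, B, basis, lambda s: rng.normal(size=r), T=50)
```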
Open Source Code: No
The paper mentions using the OpenSpiel [Lanctot et al., 2019] implementations of α-rank and PRD, and OpenSpiel's regret matching for CE, but it provides no explicit statement or link for the source code of RENES or its experimental setup.
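Although no RENES code is released, the baseline solvers named above do ship with OpenSpiel. A minimal sketch of invoking its PRD implementation on a random two-player normal-form game (module path and keyword defaults as in current OpenSpiel; versions unverified) might look like:

```python
import numpy as np
from open_spiel.python.algorithms.projected_replicator_dynamics import (
    projected_replicator_dynamics,
)

# Random 5x5 two-player normal-form game: one payoff tensor per player.
rng = np.random.default_rng(0)
payoff_tensors = [rng.uniform(-1, 1, size=(5, 5)) for _ in range(2)]

# PRD returns one mixed strategy per player; iteration count, step size,
# and exploration floor (prd_iterations, prd_dt, prd_gamma) keep their defaults.
strategies = projected_replicator_dynamics(payoff_tensors)
print(strategies)
```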
Open Datasets: No
"For the games, we randomly sample 3000 games for training, and 500 games for testing to verify the ability of Renes to generalize to unseen games." (Section 5.1). The paper describes randomly sampling games, implying a custom dataset, but does not provide access information for this dataset.
Dataset Splits: No
"For the games, we randomly sample 3000 games for training, and 500 games for testing to verify the ability of Renes to generalize to unseen games." The paper specifies training and testing sets but mentions no distinct validation set, gives no precise split percentages, and does not specify cross-validation.
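Neither the sampled games nor the sampling script are provided. A minimal sketch of generating such a train/test pool, where only the 3000/500 counts come from Section 5.1 and the game size and payoff distribution are assumptions, could be:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_game(shape=(5, 5), num_players=2):
    # One random general-sum normal-form game: a payoff tensor per player.
    # The uniform [-1, 1] payoff distribution is an assumption.
    return [rng.uniform(-1.0, 1.0, size=shape) for _ in range(num_players)]

# 3000 training games and 500 held-out test games, as stated in Section 5.1;
# no validation split is described in the paper.
train_games = [sample_game() for _ in range(3000)]
test_games = [sample_game() for _ in range(500)]
```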
Hardware Specification: No
The paper does not provide any specific hardware details (e.g., GPU or CPU models, memory specifications) used for running its experiments.
Software Dependencies: No
The paper mentions software components and libraries such as OpenSpiel, PPO, GNN/GCN, MLP, and Canonical Polyadic (CP) decomposition (with a reference to TensorLy's implementation), but it does not specify version numbers for any of them.
Experiment Setup: Yes
"For the training, we set the decomposition rank r = 10, i.e., the number of the action dimensions is 10, and T = 50, i.e., the number of the maximum steps of the modification is 50."
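For context, the CP decomposition that defines this 10-dimensional action space is attributed to TensorLy. A minimal sketch using tensorly.decomposition.parafac, where the tensor shape and the stacking of the two players' payoffs into one tensor are assumptions, is:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Decompose a payoff tensor into r = 10 rank-one components; under the
# paper's setup these components act as the basis that the policy's
# 10-dimensional action reweights at each of the T = 50 modification steps.
payoff = tl.tensor(np.random.default_rng(0).uniform(-1, 1, size=(10, 10, 2)))
weights, factors = parafac(payoff, rank=10)

# Reconstruct the k-th rank-one basis tensor via an outer product of
# the k-th column of each factor matrix.
k = 0
basis_k = weights[k] * np.einsum(
    "i,j,k->ijk", factors[0][:, k], factors[1][:, k], factors[2][:, k]
)
```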