Safe and Robust Subgame Exploitation in Imperfect Information Games

Authors: Zhenxing Ge, Zheng Xu, Tianyu Ding, Linjian Meng, Bo An, Wenbin Li, Yang Gao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations in popular poker games demonstrate OX-Search's superiority in both exploitability and exploitation compared to previous methods. Section 5 (Experiment): For a thorough assessment of OX-Search, our evaluation employs three key metrics: (I) safety in the face of the worst-case opponent; (II) effectiveness against evolving opponent strategies; and (III) robust exploitation in scenarios involving modeling errors.
Researcher Affiliation | Collaboration | (1) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China. (2) School of Computer Science and Engineering, Nanyang Technological University, Singapore. (3) Microsoft Corporation, Redmond, Washington, USA. (4) Skywork AI, Singapore. (5) School of Intelligence Science and Technology, Nanjing University, Suzhou Campus, Suzhou, Jiangsu, China.
Pseudocode | No | The paper describes the construction of a 'gadget game' in a step-by-step manner but does not provide formal pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Extensive experiments are conducted on Leduc Hold'em (Southey et al., 2005; Wu et al., 2021) and Flop Hold'em Poker (FHP) (Brown et al., 2018; Liu et al., 2022) to evaluate the performance of OX-Search.
Dataset Splits | No | The paper does not provide explicit details about training, validation, or test dataset splits (e.g., percentages or sample counts). It describes the generation of blueprint strategies and evaluation against different opponent types.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for conducting the experiments.
Software Dependencies | No | The paper mentions techniques and algorithms like 'Monte Carlo CFR' and 'abstraction technique' but does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | For SES and Real-time RNR, we set the exploitation level hyperparameter to 0.3... For Leduc Hold'em, the blueprint is solved by Monte Carlo CFR (Lanctot et al., 2009) for 1,000,000 iterations... we set 1/(kβ+1) to 1/16 in Leduc Hold'em. In the case of Flop Hold'em Poker, the blueprint is solved by Monte Carlo CFR for 100,000 iterations... infosets are clustered into 200 buckets... a finer-grained abstraction that has 400 buckets for each public betting history... we increase the value of 1/(kβ+1) to 1/51...