Safe Opponent-Exploitation Subgame Refinement

Authors: Mingyang Liu, Chengjie Wu, Qihan Liu, Yansen Jing, Jun Yang, Pingzhong Tang, Chongjie Zhang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that SES significantly outperforms NE baselines and previous algorithms while keeping exploitability low at the same time.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Automation, Tsinghua University
Pseudocode | Yes | The pseudocode of SES is shown in Appendix A.
Open Source Code | No | The code and licence of the code would be released upon the paper acceptance.
Open Datasets | Yes | Our experiment is done in Leduc Hold'em [Southey et al., 2005] and Flop Hold'em Poker (FHP) [Brown et al., 2019].
Dataset Splits | No | No explicit train/validation/test dataset splits are provided. The paper describes using Leduc Hold'em and Flop Hold'em Poker as experimental environments and how different types of opponents are generated.
Hardware Specification | Yes | We test it on Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | In our experiments, we set the maximum number of CFR iterations to 10 million. The batch size for player 1's strategy estimation is 50. The parameters in the learning rate schedule for the CFR algorithm are set to decay from 0.01 to 0.0001 over 10 million iterations... The estimation error is generated by adding Gaussian noise with zero mean and standard deviation of 0.1, 0.3, 0.6, 0.9, 1.2... We average results over 3 random seeds for opponent generation and 3 random seeds for blueprint generation.
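
The Experiment Setup entry mentions estimation error produced by zero-mean Gaussian noise at five standard deviations. The sketch below illustrates one way such a perturbation could be applied to a strategy estimate. It is not the authors' code: the function names (`noisy_estimate`, `make_estimates`), the per-action application of noise, and the clip-and-renormalize step are all assumptions made for illustration, since the summary does not say how the perturbed values are mapped back to a valid probability distribution.

```python
import random

# Standard deviations quoted in the Experiment Setup entry.
NOISE_STDS = [0.1, 0.3, 0.6, 0.9, 1.2]


def noisy_estimate(strategy, std, rng):
    """Perturb a probability vector with zero-mean Gaussian noise.

    Assumption (not from the source): noise is added per action,
    negative values are clipped to zero, and the result is renormalized.
    """
    perturbed = [max(p + rng.gauss(0.0, std), 0.0) for p in strategy]
    total = sum(perturbed)
    if total == 0.0:
        # Every entry was clipped; fall back to a uniform distribution.
        return [1.0 / len(strategy)] * len(strategy)
    return [p / total for p in perturbed]


def make_estimates(strategy, seed=0):
    """Return one noisy estimate per noise level, keyed by its std."""
    rng = random.Random(seed)
    return {std: noisy_estimate(strategy, std, rng) for std in NOISE_STDS}
```

Calling `make_estimates([0.5, 0.3, 0.2], seed=1)` would return five perturbed distributions, one per noise level. Fixing the seed keeps the perturbation reproducible, in the same spirit as the seeded averaging described in the setup.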