Safe Opponent-Exploitation Subgame Refinement
Authors: Mingyang Liu, Chengjie Wu, Qihan Liu, Yansen Jing, Jun Yang, Pingzhong Tang, Chongjie Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that SES significantly outperforms NE baselines and previous algorithms while keeping exploitability low at the same time. |
| Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Automation, Tsinghua University |
| Pseudocode | Yes | The pseudocode of SES is shown in Appendix A. |
| Open Source Code | No | The code and licence of the code would be released upon the paper acceptance. |
| Open Datasets | Yes | Our experiment is done in Leduc Hold'em [Southey et al., 2005] and Flop Hold'em Poker (FHP) [Brown et al., 2019]. |
| Dataset Splits | No | No explicit train/validation/test dataset splits are provided. The paper describes using Leduc Hold'em and Flop Hold'em Poker as experimental environments and how different types of opponents are generated. |
| Hardware Specification | Yes | We test it on Intel(R) Xeon(R) Platinum 8276L CPU @ 2.20GHz |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | In our experiments, we set the maximum number of CFR iterations to 10 million. The batch size for player 1's strategy estimation is 50. The parameters in the learning rate schedule for the CFR algorithm are set to decay from 0.01 to 0.0001 over 10 million iterations... The estimation error is generated by adding Gaussian noise with zero mean and standard deviation of 0.1, 0.3, 0.6, 0.9, 1.2... We average results over 3 random seeds for opponent generation and 3 random seeds for blueprint generation. |
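
The noise settings quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: the table does not say what quantity the noise is applied to or how the result is kept valid, so the sketch assumes the noise is added element-wise to a probability vector over actions, negative values are clipped to zero, and the vector is renormalized. The function name `perturb_strategy`, the uniform fallback, and the example strategy are hypothetical.

```python
import random

# Standard deviations listed in the Experiment Setup row.
NOISE_STDS = [0.1, 0.3, 0.6, 0.9, 1.2]


def perturb_strategy(probs, std, rng):
    """Return a noisy copy of an action-probability vector.

    Assumed scheme (not confirmed by the source): add zero-mean Gaussian
    noise to each entry, clip at zero, then renormalize to sum to 1.
    """
    noisy = [max(p + rng.gauss(0.0, std), 0.0) for p in probs]
    total = sum(noisy)
    if total == 0.0:
        # Every entry was clipped away; fall back to uniform (assumption).
        return [1.0 / len(probs)] * len(probs)
    return [x / total for x in noisy]


rng = random.Random(0)             # fixed seed for repeatability
true_strategy = [0.7, 0.2, 0.1]    # hypothetical opponent strategy at one decision point
estimates = {std: perturb_strategy(true_strategy, std, rng) for std in NOISE_STDS}

for std, est in estimates.items():
    print(std, [round(x, 3) for x in est])
```

Each entry of `estimates` is a valid probability vector. Under this assumed scheme, larger standard deviations generally push the estimate further from `true_strategy`.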