reproducibilityindex.ai

Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game

Authors: Haobo Fu, Weiming Liu, Shuang Wu, Yijia Wang, Tao Yang, Kai Li, Junliang Xing, Bin Li, Bo Ma, Qiang Fu, Yang Wei

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on the proposed 1-on-1 Mahjong benchmark and benchmarks from the literature demonstrate that ACH outperforms related state-of-the-art methods.
Researcher Affiliation	Collaboration	1 Tencent AI Lab, Shenzhen, China; 2 University of Science and Technology of China, Hefei, China; 3 Peking University, Beijing, China; 4 Institute of Automation, Chinese Academy of Sciences, Beijing, China; 5 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Pseudocode	Yes	The pseudocode of NW-CFR is given in Algorithm 1. ... The pseudocode of ACH is given in Algorithm 2.
Open Source Code	Yes	The code of the 1-on-1 Mahjong benchmark is available at https://github.com/yata0/Mahjong. The code of ACH is available at https://github.com/Liuweiming/ACH_poker.
Open Datasets	Yes	To facilitate research on large-scale 2-player zero-sum IIGs, we propose a 1-on-1 Mahjong benchmark. ... The code of the 1-on-1 Mahjong benchmark is available at https://github.com/yata0/Mahjong. ... FHP is a simplified Heads-up Limit Texas Hold'em (HULH)... Additional results on smaller benchmarks from OpenSpiel (Lanctot et al., 2019) are given in the Appendix G.
Dataset Splits	No	The paper describes training and evaluation but does not provide specific percentages or counts for training, validation, and test splits.
Hardware Specification	Yes	All methods run in an asynchronous training platform with overall 800 CPUs, 3200 GB memory, and 8 M40 GPUs in the Ubuntu 16.04 operating system.
Software Dependencies	No	The paper mentions 'Ubuntu 16.04 operating system' but does not provide specific version numbers for other key software components or libraries.
Experiment Setup	Yes	We performed a mild hyper-parameter search on PPO and shared the best setting for all methods. The advantage value is estimated by the Generalized Advantage Estimator (GAE(λ)) (Schulman et al., 2016) for all methods. An overview of the hyper-parameters is listed in the Appendix H.1. ... Table 5 gives an overview of hyper-parameters for each method.
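
The Experiment Setup entry states that advantages are estimated with GAE(λ) for all methods. As a minimal sketch of what that estimator computes, the Python function below implements the standard backward recursion from Schulman et al. (2016). It is not taken from the ACH code base: the function name `gae_advantages` and the default values of `gamma` and `lam` are illustrative assumptions, and the settings actually used are those listed in the paper's Table 5.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not the paper's setting
    lam: float = 0.95,    # assumed GAE lambda, not the paper's setting
) -> List[float]:
    """Compute GAE(lambda) advantages for one trajectory.

    `values` must hold one more entry than `rewards`: the final entry is the
    bootstrap value of the state after the last step (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0.0` reduces the estimate to the one-step TD residual, while `lam=1.0` with a zero bootstrap value gives the discounted return minus the value baseline.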