PerfectDou: Dominating DouDizhu with Perfect Information Distillation

Authors: Guan Yang, Minghuan Liu, Weijun Hong, Weinan Zhang, Fei Fang, Guangjun Zeng, Yue Lin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments we show how and why PerfectDou beats all existing AI programs, and achieves state-of-the-art performance.
Researcher Affiliation | Collaboration | 1 NetEase Games AI Lab, 2 Shanghai Jiao Tong University, 3 Carnegie Mellon University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Project page at https://github.com/Netease-Games-AI-Lab-Guangzhou/PerfectDou/.
Open Datasets | Yes | We evaluate PerfectDou against the following algorithms under the open-source RLCard Environment [30]. Reference [30]: Daochen Zha, Kwei-Herng Lai, Yuanpu Cao, Songyi Huang, Ruzhe Wei, Junyu Guo, and Xia Hu. RLCard: A toolkit for reinforcement learning in card games. arXiv preprint arXiv:1910.04376, 2019. (A minimal RLCard setup sketch follows the table.)
Dataset Splits | No | The paper mentions training data collected via self-play and distributed training, and evaluates performance on 10,000 randomly generated decks. However, it does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts for each split, or an explicit validation set definition).
Hardware Specification | Yes | All evaluations are conducted on a single core of an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz.
Software Dependencies | No | The paper mentions using specific algorithms and the RLCard environment but does not provide specific version numbers for any software dependencies (e.g., "Python 3.8, PyTorch 1.9").
Experiment Setup | Yes | To train PerfectDou, we utilize Proximal Policy Optimization (PPO) [19] with Generalized Advantage Estimation (GAE) [18] by self-play in a distributed training system, and each worker loads the latest model after sampling 24 steps (8 for each player).
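
For reference, the sketch below shows one minimal way to instantiate the DouDizhu environment in the open-source RLCard toolkit cited in the Open Datasets entry. It is an illustration under assumptions, not the paper's evaluation code: the RandomAgent opponents and the seed are placeholders, and attribute names such as num_actions may vary across RLCard versions.

    import rlcard
    from rlcard.agents import RandomAgent

    # Three-player DouDizhu environment shipped with RLCard.
    env = rlcard.make('doudizhu', config={'seed': 42})

    # Placeholder opponents; PerfectDou's own agents would replace these
    # when reproducing the paper's head-to-head evaluation.
    env.set_agents([RandomAgent(num_actions=env.num_actions)
                    for _ in range(env.num_players)])

    # Play one full game; payoffs is the terminal reward vector
    # (landlord vs. the two peasants).
    trajectories, payoffs = env.run(is_training=False)
    print(payoffs)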
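
The Experiment Setup entry quotes the paper's use of PPO with GAE for self-play training. The snippet below is a generic GAE computation for a single episode, included only to make the cited estimator concrete; the discount and lambda values are illustrative defaults, not hyperparameters reported by the paper.

    import numpy as np

    def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
        """Generalized Advantage Estimation for one self-play episode.

        rewards: per-step rewards, shape (T,)
        values:  value estimates with a bootstrap value appended, shape (T + 1,)
        """
        T = len(rewards)
        advantages = np.zeros(T)
        gae = 0.0
        # Accumulate discounted TD residuals backwards through the episode.
        for t in reversed(range(T)):
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            gae = delta + gamma * lam * gae
            advantages[t] = gae
        returns = advantages + values[:-1]  # targets for the value function
        return advantages, returns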