PerfectDou: Dominating DouDizhu with Perfect Information Distillation
Authors: Guan Yang, Minghuan Liu, Weijun Hong, Weinan Zhang, Fei Fang, Guangjun Zeng, Yue Lin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments we show how and why PerfectDou beats all existing AI programs, and achieves state-of-the-art performance. |
| Researcher Affiliation | Collaboration | NetEase Games AI Lab, Shanghai Jiao Tong University, Carnegie Mellon University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page at https://github.com/Netease-Games-AI-Lab-Guangzhou/PerfectDou/. |
| Open Datasets | Yes | We evaluate PerfectDou against the following algorithms under the open-source RLCard environment [30]: Daochen Zha, Kwei-Herng Lai, Yuanpu Cao, Songyi Huang, Ruzhe Wei, Junyu Guo, and Xia Hu. RLCard: A toolkit for reinforcement learning in card games. arXiv preprint arXiv:1910.04376, 2019. |
| Dataset Splits | No | The paper mentions training data collected via self-play and distributed training, and evaluates performance on 10,000 randomly generated decks. However, it does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts for each split, or explicit validation set definition). |
| Hardware Specification | Yes | All evaluations are conducted on a single core of Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper mentions using specific algorithms and the RLCard environment but does not provide specific version numbers for any software dependencies (e.g., "Python 3.8, PyTorch 1.9"). |
| Experiment Setup | Yes | To train PerfectDou, we utilize Proximal Policy Optimization (PPO) [19] with Generalized Advantage Estimation (GAE) [18] by self-play in a distributed training system. Each worker loads the latest model after sampling 24 steps (8 for each player). |
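
The Experiment Setup row above quotes PPO with GAE trained by self-play. Below is a minimal, illustrative sketch of the GAE advantage computation only; it is not the authors' implementation, and the function name, default gamma/lambda values, and the toy trajectory are assumptions made purely for illustration.

```python
# Hedged sketch: Generalized Advantage Estimation (GAE), the advantage estimator
# named in the paper's training setup. NOT the PerfectDou code; gamma/lambda
# defaults and the toy trajectory are illustrative assumptions.
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Compute GAE(gamma, lambda) advantages and value targets for one trajectory.

    `values` holds one extra bootstrap entry: len(values) == len(rewards) + 1.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    dones = np.asarray(dones, dtype=np.float32)

    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    last_gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t), masked at episode ends.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values[:-1]  # targets for the value-function loss
    return advantages, returns

# Toy usage: a 3-step self-play fragment with a terminal reward of +1.
adv, ret = gae_advantages(
    rewards=[0.0, 0.0, 1.0],
    values=[0.1, 0.2, 0.5, 0.0],  # last entry is the bootstrap value
    dones=[False, False, True],
)
print(adv, ret)
```

The PPO clipped objective and the distributed worker logic (reloading the newest model every 24 sampled steps) are omitted here, since the paper excerpt quoted above does not specify those details beyond the high-level description.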