DeltaDou: Expert-level Doudizhu AI through Self-play
Authors: Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen, Hai Fang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that self-play can significantly improve the performance of our agent in this multiagent imperfect information game. Even starting with a weak AI, our agent can achieve human expert level after days of self-play and training. |
| Researcher Affiliation | Industry | Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen and Hai Fang; Sweet Code Inc, Beijing; {jiangqiqi, likuangzheng, duboyao, chenhao, fanghai}@itgwn.com |
| Pseudocode | Yes | Algorithm 1 FPMCTS in Doudizhu |
| Open Source Code | No | The paper mentions an 'open source heuristics-based algorithm' (RHCP) from another source, but does not state that its own methodology's code is available or provide a link. |
| Open Datasets | No | In the first phase, 200,000 games were self-played by the heuristic algorithm, then the game results were used to generate the initial policy-value network under supervised learning. |
| Dataset Splits | No | The paper mentions a 'testing data set' of 100 games but does not specify a distinct validation set or provide explicit train/validation/test dataset splits for reproduction. |
| Hardware Specification | Yes | It took 2 months to train the network on 68 CPUs... It was run on a single 8-core computer with the average time for a move of roughly 5 to 8 seconds. |
| Software Dependencies | No | The paper mentions various algorithms and frameworks (e.g., MCTS, neural networks, AlphaZero-like framework) but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | In the first phase, 200,000 games were self-played by the heuristic algorithm... each episode contains 8000 games, and FPMCTS contains 400 playouts. Inference is used when any player has fewer than 15 cards in hand... The number of simulations in MCTS is set to 600 and c_puct is set to 2. |
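
To make the search settings in the Experiment Setup row concrete, the sketch below shows how an exploration constant of 2 would enter a PUCT-style selection rule of the kind used in AlphaZero-like search. This is an illustration under assumptions, not the paper's FPMCTS: the `Edge` class, `puct_score`, and `select_action` are hypothetical names, and the formula is the standard AlphaZero form, which may differ from what the authors implemented.

```python
import math
from dataclasses import dataclass

# Values quoted in the table; how the paper applies them inside FPMCTS is not shown here.
C_PUCT = 2.0           # exploration constant (assumed to act as c_puct)
NUM_SIMULATIONS = 600  # MCTS simulations per move


@dataclass
class Edge:
    """Search statistics for one candidate action (hypothetical structure)."""
    prior: float              # P(s, a), e.g. from a policy network
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # W(s, a), sum of backed-up values

    @property
    def mean_value(self) -> float:
        # Q(s, a); defined as 0 for an unvisited edge
        return self.total_value / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = C_PUCT) -> float:
    """Standard AlphaZero-style score: Q + c_puct * P * sqrt(N_parent) / (1 + N)."""
    exploration = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.mean_value + exploration


def select_action(edges: dict, c_puct: float = C_PUCT):
    """Return the action whose edge has the highest PUCT score."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda a: puct_score(edges[a], parent_visits, c_puct))
```

With a larger constant such as 2, the prior-weighted exploration term dominates for rarely visited actions, so the search spreads its 600 simulations more widely before concentrating on the best-valued move.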