DeltaDou: Expert-level Doudizhu AI through Self-play

Authors: Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen, Hai Fang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our results show that self-play can significantly improve the performance of our agent in this multiagent imperfect information game. Even starting with a weak AI, our agent can achieve human expert level after days of self-play and training." |
| Researcher Affiliation | Industry | "Qiqi Jiang, Kuangzheng Li, Boyao Du, Hao Chen and Hai Fang, Sweet Code Inc, Beijing. {jiangqiqi, likuangzheng, duboyao, chenhao, fanghai}@itgwn.com" |
| Pseudocode | Yes | "Algorithm 1: FPMCTS in Doudizhu" |
| Open Source Code | No | The paper mentions an "open source heuristics-based algorithm" (RHCP) from another source, but does not state that its own code is available or provide a link. |
| Open Datasets | No | "In the first phase, 200,000 games were self-played by the heuristic algorithm, then the game results were used to generate the initial policy-value network under supervised learning." A sketch of this supervised bootstrap step follows the table. |
| Dataset Splits | No | The paper mentions a testing set of 100 games but does not specify a distinct validation set or provide explicit train/validation/test splits for reproduction. |
| Hardware Specification | Yes | "It took 2 months to train the network on 68 CPUs... It was run on a single 8-core computer with the average time for a move of roughly 5 to 8 seconds." |
| Software Dependencies | No | The paper names the algorithms and frameworks it builds on (e.g., MCTS, neural networks, an AlphaZero-like training loop) but does not give version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | "In the first phase, 200,000 games were self-played by the heuristic algorithm... each episode contains 8000 games, and FPMCTS contains 400 playouts. Inference is used when any player has fewer than 15 cards in hand... The number of simulations in MCTS is set to 600 and c_puct is set to 2." A sketch of the corresponding PUCT selection rule follows the table. |
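To make the supervised bootstrap quoted under Open Datasets concrete, here is a minimal sketch of fitting an initial policy-value network to (state, heuristic move, outcome) tuples extracted from the 200,000 heuristic self-play games. This is an assumed, AlphaZero-style setup: the network architecture, the combined cross-entropy/MSE loss, and all names (`PolicyValueNet`, `pretrain_step`) are illustrative, since the paper does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyValueNet(nn.Module):
    """Tiny stand-in for the paper's policy-value network.
    The real architecture is unspecified; this MLP is purely illustrative."""
    def __init__(self, state_dim, n_moves, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_moves)  # logits over moves
        self.value_head = nn.Linear(hidden, 1)         # expected game outcome

    def forward(self, x):
        h = self.body(x)
        return self.policy_head(h), torch.tanh(self.value_head(h))

def pretrain_step(net, optimizer, states, target_moves, target_values):
    """One supervised update on a batch of (state, heuristic move, outcome)
    tuples from the heuristic self-play games. The loss combines
    cross-entropy on the policy head with MSE on the value head, an
    assumed AlphaZero-style objective; the paper does not state its loss."""
    policy_logits, values = net(states)
    policy_loss = F.cross_entropy(policy_logits, target_moves)
    value_loss = F.mse_loss(values.squeeze(-1), target_values)
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, each self-played game would contribute one training tuple per move, with the value target being the final game outcome from the moving player's perspective.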
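The Experiment Setup row quotes 600 MCTS simulations and c_puct = 2. The following sketch shows the standard PUCT child-selection rule those parameters plug into; the `Node` class and function names are illustrative assumptions, and only the c_puct value itself comes from the paper. The paper's FPMCTS additionally handles imperfect information via fictitious play and hidden-hand inference, which this sketch omits.

```python
import math

class Node:
    """One state in the search tree (illustrative; not the authors' code)."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # cumulative backed-up value
        self.children = {}        # action -> Node

    def q_value(self):
        # Mean action value Q(s, a); defined as zero before any visit.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=2.0):
    """Pick the child maximizing Q + U, where
    U = c_puct * P * sqrt(N_parent) / (1 + N_child).
    c_puct = 2.0 matches the value reported in the paper."""
    sqrt_parent = math.sqrt(sum(c.visit_count for c in node.children.values()))
    best_action, best_child, best_score = None, None, -float("inf")
    for action, child in node.children.items():
        u = c_puct * child.prior * sqrt_parent / (1 + child.visit_count)
        score = child.q_value() + u
        if score > best_score:
            best_action, best_child, best_score = action, child, score
    return best_action, best_child
```

Each of the 600 simulations would walk the tree with `select_child` until reaching a leaf, expand it with network priors, and back the network's value estimate up the visited path.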