Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | 7 Experimental Setup; 8 Experimental Results. Figure 2 shows ReBeL reaches a level of exploitability in TEH equivalent to running about 125 iterations of full-game tabular CFR. Table 1 shows results for ReBeL in HUNL. (Exploitability is defined below the table.)
Researcher Affiliation | Industry | Facebook AI Research. {noambrown,yolo,alerer,qucheng}@fb.com
Pseudocode | Yes | Algorithm 1 ReBeL: RL and Search for Imperfect-Information Games. (A sketch of the self-play loop appears below the table.)
Open Source Code | Yes | We also show ReBeL approximates a Nash equilibrium in Liar's Dice, another benchmark imperfect-information game, and open source our implementation of it. https://github.com/facebookresearch/rebel
Open Datasets | Yes | We evaluate on the benchmark imperfect-information games of heads-up no-limit Texas hold'em poker (HUNL) and Liar's Dice. The rules for both games are provided in Appendix C.
Dataset Splits | No | The paper describes a self-play reinforcement learning approach within game environments but does not provide specific training, validation, or test dataset splits with percentages or sample counts.
Hardware Specification | No | For this reason we use a single machine for training and up to 128 machines with 8 GPUs each for data generation.
Software Dependencies | No | We use PyTorch [46] to train the networks.
Experiment Setup | Yes | We use pointwise Huber loss as the criterion for the value function and mean squared error (MSE) over probabilities for the policy. In preliminary experiments we found MSE for the value network and cross entropy for the policy network did worse. See Appendix E for the hyperparameters. (A loss-setup sketch appears below the table.)
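For reference on the exploitability metric cited in the results row: for a two-player zero-sum game, a standard definition (not quoted from this paper) measures how much best-responding opponents could gain against a policy profile π = (π₁, π₂):

```latex
e(\pi) = \max_{\pi_1'} u_1(\pi_1', \pi_2) + \max_{\pi_2'} u_2(\pi_1, \pi_2')
```

Here e(π) = 0 exactly at a Nash equilibrium; conventions differ on whether the sum is halved or reported per player.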
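The Pseudocode row cites Algorithm 1 (ReBeL). Below is a minimal Python sketch of that self-play loop, paraphrased rather than taken from the released code; every helper (construct_subgame, update_policy, set_leaf_values, compute_ev, sample_leaf) and the PBS object interface (is_terminal, uniform_policy, mix) are hypothetical stand-ins.

```python
import random

def rebel_selfplay_episode(root_pbs, construct_subgame, update_policy,
                           set_leaf_values, compute_ev, sample_leaf,
                           value_buffer, policy_buffer, num_cfr_iters=1000):
    """One self-play episode in the spirit of Algorithm 1 (ReBeL).

    Every argument except num_cfr_iters is a hypothetical stand-in: the PBS
    objects, the subgame-solver helpers, and the replay buffers are
    placeholders, not the interfaces of the released implementation.
    """
    beta = root_pbs                              # current public belief state (PBS)
    while not beta.is_terminal():
        subgame = construct_subgame(beta)        # depth-limited subgame rooted at beta
        policy = subgame.uniform_policy()        # (the paper also warm-starts the policy)
        avg_policy = policy
        set_leaf_values(subgame, policy)         # value net supplies leaf-PBS values
        ev_samples = [compute_ev(subgame, policy)]

        # One CFR iteration, chosen uniformly at random, decides the next root PBS.
        t_sample = random.randint(1, num_cfr_iters)
        next_beta = None
        for t in range(1, num_cfr_iters + 1):
            if t == t_sample:
                next_beta = sample_leaf(subgame, policy)
            policy = update_policy(subgame, policy)          # one CFR-style policy update
            avg_policy = avg_policy.mix(policy, 1.0 / (t + 1))
            set_leaf_values(subgame, policy)
            ev_samples.append(compute_ev(subgame, policy))

        # The averaged root value becomes a training target for the value network;
        # the averaged policy can optionally train the policy network.
        value_buffer.append((beta, sum(ev_samples) / len(ev_samples)))
        policy_buffer.append((beta, avg_policy))
        beta = next_beta
```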
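The Experiment Setup row pins down the losses but not the exact code. A minimal PyTorch sketch of that loss setup is below; the tensor shapes, the softmax over logits, and the Huber delta are assumptions, not details taken from the paper or its repository.

```python
import torch
import torch.nn.functional as F

def rebel_losses(value_pred, value_target, policy_logits, policy_target):
    """Pointwise Huber loss for the value net, MSE over probabilities for the policy net.

    Shapes are assumptions: value tensors are (batch, num_infostates),
    policy tensors are (batch, num_actions).
    """
    # Pointwise Huber loss on predicted vs. target PBS values (delta=1.0 is an assumption).
    value_loss = F.huber_loss(value_pred, value_target, delta=1.0)

    # MSE between predicted and target action probabilities.
    policy_probs = torch.softmax(policy_logits, dim=-1)
    policy_loss = F.mse_loss(policy_probs, policy_target)

    return value_loss, policy_loss
```

The ablation quoted in the table (MSE for values and cross entropy for the policy did worse) would correspond to swapping the Huber term for F.mse_loss and the MSE term for a cross-entropy term.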