Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7 Experimental Setup, Section 8 Experimental Results: Figure 2 shows ReBeL reaches a level of exploitability in TEH equivalent to running about 125 iterations of full-game tabular CFR. Table 1 shows results for ReBeL in HUNL. |
| Researcher Affiliation | Industry | Facebook AI Research {noambrown,yolo,alerer,qucheng}@fb.com |
| Pseudocode | Yes | Algorithm 1 ReBeL: RL and Search for Imperfect-Information Games |
| Open Source Code | Yes | We also show ReBeL approximates a Nash equilibrium in Liar's Dice, another benchmark imperfect-information game, and open source our implementation of it. https://github.com/facebookresearch/rebel |
| Open Datasets | Yes | We evaluate on the benchmark imperfect-information games of heads-up no-limit Texas hold'em poker (HUNL) and Liar's Dice. The rules for both games are provided in Appendix C. |
| Dataset Splits | No | The paper describes a self-play reinforcement learning approach within game environments but does not provide specific training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | No | For this reason we use a single machine for training and up to 128 machines with 8 GPUs each for data generation. |
| Software Dependencies | No | We use PyTorch [46] to train the networks. |
| Experiment Setup | Yes | We use pointwise Huber loss as the criterion for the value function and mean squared error (MSE) over probabilities for the policy. In preliminary experiments we found MSE for the value network and cross entropy for the policy network did worse. See Appendix E for the hyperparameters. |
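
As a rough illustration of the loss choices quoted in the Experiment Setup row, the sketch below pairs PyTorch's `SmoothL1Loss` (a pointwise Huber loss) for the value head with mean squared error over action probabilities for the policy head. The function, tensor names, and the softmax-over-logits step are assumptions made for illustration, not details taken from the released ReBeL code.

```python
import torch
import torch.nn as nn

# Assumed criteria matching the paper's description:
# pointwise Huber loss for values, MSE over probabilities for the policy.
value_criterion = nn.SmoothL1Loss()
policy_criterion = nn.MSELoss()

def training_losses(value_pred, value_target, policy_logits, policy_target):
    """Illustrative per-batch losses; shapes and names are hypothetical."""
    value_loss = value_criterion(value_pred, value_target)
    # Convert logits to probabilities before comparing with the target policy.
    policy_probs = torch.softmax(policy_logits, dim=-1)
    policy_loss = policy_criterion(policy_probs, policy_target)
    return value_loss, policy_loss
```

This mirrors the quoted setup only at the level of which criteria are applied to which head; hyperparameters and the actual network architecture are in Appendix E of the paper and the open-source repository.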