Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Authors: Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Experimental Setup; 8 Experimental Results. Figure 2 shows ReBeL reaches a level of exploitability in TEH equivalent to running about 125 iterations of full-game tabular CFR. Table 1 shows results for ReBeL in HUNL. |
| Researcher Affiliation | Industry | Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 ReBeL: RL and Search for Imperfect-Information Games |
| Open Source Code | Yes | We also show ReBeL approximates a Nash equilibrium in Liar's Dice, another benchmark imperfect-information game, and open source our implementation of it. https://github.com/facebookresearch/rebel |
| Open Datasets | Yes | We evaluate on the benchmark imperfect-information games of heads-up no-limit Texas hold'em poker (HUNL) and Liar's Dice. The rules for both games are provided in Appendix C. |
| Dataset Splits | No | The paper describes a self-play reinforcement learning approach within game environments but does not provide specific training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | No | For this reason we use a single machine for training and up to 128 machines with 8 GPUs each for data generation. |
| Software Dependencies | No | We use PyTorch [46] to train the networks. |
| Experiment Setup | Yes | We use pointwise Huber loss as the criterion for the value function and mean squared error (MSE) over probabilities for the policy. In preliminary experiments we found MSE for the value network and cross entropy for the policy network did worse. See Appendix E for the hyperparameters. |
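The loss choices quoted in the Experiment Setup row (pointwise Huber loss for the value function, MSE over probabilities for the policy) can be illustrated with a minimal plain-Python sketch. This is a hedged illustration only: the function names `huber` and `mse` and the scalar/list signatures are ours, not from the paper's codebase, which trains tensor-valued networks with PyTorch equivalents (e.g. smooth L1 loss).

```python
def huber(pred, target, delta=1.0):
    # Pointwise Huber loss: quadratic near zero, linear in the tails.
    # `delta` controls where the loss switches from quadratic to linear.
    d = abs(pred - target)
    if d <= delta:
        return 0.5 * d * d
    return delta * (d - 0.5 * delta)

def mse(pred_probs, target_probs):
    # Mean squared error over a vector of action probabilities,
    # as used for the policy network.
    return sum((p - q) ** 2 for p, q in zip(pred_probs, target_probs)) / len(pred_probs)

# Small residual -> quadratic regime; large residual -> linear regime.
print(huber(0.5, 0.0))              # 0.125
print(huber(2.0, 0.0))              # 1.5
print(mse([0.6, 0.4], [0.5, 0.5])) # 0.01
```

The Huber loss is less sensitive to outlier value targets than plain MSE, which is consistent with the paper's report that MSE for the value network did worse in preliminary experiments.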