Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess
Authors: Gregory Clark
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces deep synoptic Monte Carlo planning (DSMCP) for large imperfect information games. The algorithm constructs a belief state with an unweighted particle filter and plans via playouts that start at samples drawn from the belief state. The algorithm accounts for uncertainty by performing inference on synopses, a novel stochastic abstraction of information states. DSMCP is the basis of the program Penumbra, which won the official 2020 reconnaissance blind chess competition versus 33 other programs. This paper also evaluates algorithm variants that incorporate caution, paranoia, and a novel bandit algorithm. Furthermore, it audits the synopsis features used in Penumbra with per-bit saliency statistics. |
| Researcher Affiliation | Industry | Gregory Clark ML Collective, Google gregoryclark@google.com |
| Pseudocode | Yes | Algorithm 1 Bandit Action selection with a stochastic multi-armed bandit, Algorithm 2 Draw Sample Sample selection with rejection, Algorithm 3 Choose Action UCT playouts, Algorithm 4 Play Game DSMCP |
| Open Source Code | No | The paper does not provide concrete access to the source code for the methodology described. It mentions game logs are available online and acknowledges another researcher for open-sourcing their baseline bots, but no link or statement for Penumbra's code is provided. |
| Open Datasets | Yes | The games were downloaded from rbmc.jhuapl.edu in June 2019 and rbc.jhuapl.edu in August 2020. Additionally, 5,000 games were played locally by Stocky Inference. |
| Dataset Splits | Yes | 10% of the games were used as validation data based on game filename hashes. |
| Hardware Specification | Yes | Training and evaluation were run on four RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions using a residual neural network and Bayes Elo, but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The constant b is the batch size for inference, d is the search depth, ℓ is the size of approximate infostates, nvl is the virtual loss weight, and z is a threshold for increasing search depth. Inference is done in batches of 256 during both training and online planning. See the appendix for hyperparameter settings and accuracy cross tables. |
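The "Bandit" pseudocode noted above (Algorithm 1) selects actions with a stochastic multi-armed bandit. The paper's exact bandit is novel and not reproduced here; as a hedged illustration, the classical UCB1 rule shows the general shape of such a selection step. The function name and the exploration constant `c` are assumptions for this sketch, not the paper's parameters.

```python
import math

def ucb1_select(counts, values, c=1.4):
    """Pick an arm by the UCB1 rule: mean reward plus an exploration bonus.

    counts[i] -- number of times arm i was played.
    values[i] -- running mean reward of arm i.
    Any unplayed arm is tried first.
    """
    total = sum(counts)
    # Play each arm at least once before applying the UCB formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Otherwise maximize mean value + exploration bonus.
    return max(
        range(len(counts)),
        key=lambda i: values[i] + c * math.sqrt(math.log(total) / counts[i]),
    )
```

For example, an unplayed arm is always chosen before a well-explored one, and among equally explored arms the higher-value arm wins.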
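Algorithm 2 ("Draw Sample") performs sample selection with rejection from the unweighted particle-filter belief state. A minimal sketch of that idea, assuming a uniform draw over particles and a caller-supplied consistency predicate (both are assumptions, not details from the paper):

```python
import random

def draw_sample(particles, consistent, max_tries=100):
    """Rejection sampling from an unweighted particle filter.

    Draw particles uniformly at random until one is consistent with the
    current observations; give up after max_tries draws.
    """
    for _ in range(max_tries):
        state = random.choice(particles)
        if consistent(state):
            return state
    return None  # belief state may be degenerate; caller must handle this
```

Playouts in DSMCP start from states drawn this way, so the planner searches only over states compatible with what the agent has observed.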
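The validation split described above (10% of games selected by filename hash) can be sketched as follows; the specific hash function and modulus here are assumptions chosen to make the split deterministic and roughly 10%, not the paper's exact scheme.

```python
import hashlib

def is_validation(filename, fraction=0.1):
    """Deterministically assign ~`fraction` of files to validation.

    Hashing the filename (rather than sampling randomly) makes the split
    reproducible across runs and machines.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < int(fraction * 100)
```

Because the assignment depends only on the name, a given game always lands in the same split, which prevents train/validation leakage when the dataset is re-downloaded or reshuffled.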