Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess

Authors: Gregory Clark

NeurIPS 2021

Reproducibility variables, results, and LLM responses:
Research Type: Experimental. This paper introduces deep synoptic Monte Carlo planning (DSMCP) for large imperfect-information games. The algorithm constructs a belief state with an unweighted particle filter and plans via playouts that start at samples drawn from the belief state. It accounts for uncertainty by performing inference on synopses, a novel stochastic abstraction of information states. DSMCP is the basis of the program Penumbra, which won the official 2020 reconnaissance blind chess competition against 33 other programs. The paper also evaluates algorithm variants that incorporate caution, paranoia, and a novel bandit algorithm, and it audits the synopsis features used in Penumbra with per-bit saliency statistics.
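The belief-tracking step described above can be sketched generically. This is a minimal illustration of an unweighted particle filter, assuming hypothetical `consistent` and `transition` callables; it is not Penumbra's actual implementation.

```python
import random

def update_belief(particles, observation, consistent, transition, max_particles=128):
    """Unweighted particle-filter belief update (generic sketch).

    particles: list of hypothetical true game states
    observation: the player's latest observation
    consistent(state, observation) -> bool, whether a state matches the observation
    transition(state) -> a successor state sampled from a model of opponent play
    """
    # Advance every particle, then reject those inconsistent with the observation.
    survivors = [transition(s) for s in particles]
    survivors = [s for s in survivors if consistent(s, observation)]
    # Replenish by resampling survivors uniformly (unweighted); if every
    # particle is rejected, the belief must be rebuilt (not shown here).
    while survivors and len(survivors) < max_particles:
        survivors.append(random.choice(survivors))
    return survivors

def sample_playout_root(particles):
    """Draw a playout start state uniformly from the belief state."""
    return random.choice(particles)
```

Playouts then begin at states returned by `sample_playout_root`, so planning cost is independent of the (huge) number of possible true states.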
Researcher Affiliation: Industry. Gregory Clark, ML Collective, Google (gregoryclark@google.com).
Pseudocode: Yes. Algorithm 1 (Bandit): action selection with a stochastic multi-armed bandit; Algorithm 2 (DrawSample): sample selection with rejection; Algorithm 3 (ChooseAction): UCT playouts; Algorithm 4 (PlayGame): DSMCP.
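For orientation, bandit-based action selection of the kind Algorithm 1 formalizes can be illustrated with standard UCB1. This is a stand-in sketch only; the paper's bandit is a novel stochastic variant whose details are not reproduced here.

```python
import math

def ucb1_select(counts, values, c=1.414):
    """UCB1 arm selection (standard stand-in for the paper's novel bandit).

    counts[a]: number of times arm a has been pulled
    values[a]: mean payoff observed for arm a
    c: exploration constant
    """
    # Pull each untried arm once before applying the UCB rule.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    # Choose the arm maximizing mean payoff plus an exploration bonus.
    return max(range(len(counts)),
               key=lambda a: values[a] + c * math.sqrt(math.log(total) / counts[a]))
```

In a UCT-style search (Algorithm 3), a rule like this selects the child action at each node during playouts.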
Open Source Code: No. The paper does not provide access to the source code for the described methodology. It notes that game logs are available online and acknowledges another researcher for open-sourcing their baseline bots, but it gives no link or availability statement for Penumbra's code.
Open Datasets: Yes. The games were downloaded from rbmc.jhuapl.edu in June 2019 and rbc.jhuapl.edu in August 2020. Additionally, 5,000 games were played locally by Stocky Inference.
Dataset Splits: Yes. 10% of the games were used as validation data, selected by game filename hashes.
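A filename-hash split of this kind can be sketched as follows. The exact hash function the paper used is not specified, so MD5 is an assumption here; the point is that the split is deterministic and independent of file ordering.

```python
import hashlib

def is_validation_game(filename, fraction=0.10):
    """Deterministic train/validation split by filename hash (sketch).

    A game lands in the validation set iff its filename hashes into the
    bottom `fraction` of the hash range, so the split is stable across runs.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < int(fraction * 100)
```

Because the decision depends only on the filename, re-downloading or reshuffling the corpus never moves a game between splits.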
Hardware Specification: Yes. Training and evaluation were run on four RTX 2080 Ti GPUs.
Software Dependencies: No. The paper mentions using a residual neural network and Bayes Elo, but it does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup: Yes. The constant b is the batch size for inference, d is the search depth, ℓ is the size of approximate infostates, nvl is the virtual-loss weight, and z is a threshold for increasing search depth. Inference is done in batches of 256 during both training and online planning. See the appendix for hyperparameter settings and accuracy cross tables.
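The batched-inference pattern (b = 256 in the paper) can be sketched generically. Here `infer_batch` is a hypothetical stand-in for the network's forward pass; the sketch only shows how playout states are grouped into fixed-size batches.

```python
def batched_inference(states, infer_batch, batch_size=256):
    """Evaluate states through the network in fixed-size batches (sketch).

    states: list of encoded positions awaiting evaluation
    infer_batch: callable mapping a list of states to a list of outputs
    """
    outputs = []
    # Process states in chunks so the accelerator sees full batches,
    # amortizing per-call overhead across many playout leaves.
    for i in range(0, len(states), batch_size):
        outputs.extend(infer_batch(states[i:i + batch_size]))
    return outputs
```

During online planning this pattern lets many concurrent playouts share one GPU forward pass instead of issuing one call per leaf.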