Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess
Authors: Gregory Clark
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces deep synoptic Monte Carlo planning (DSMCP) for large imperfect information games. The algorithm constructs a belief state with an unweighted particle filter and plans via playouts that start at samples drawn from the belief state. The algorithm accounts for uncertainty by performing inference on synopses, a novel stochastic abstraction of information states. DSMCP is the basis of the program Penumbra, which won the official 2020 reconnaissance blind chess competition versus 33 other programs. This paper also evaluates algorithm variants that incorporate caution, paranoia, and a novel bandit algorithm. Furthermore, it audits the synopsis features used in Penumbra with per-bit saliency statistics. |
| Researcher Affiliation | Industry | Gregory Clark ML Collective, Google gregoryclark@google.com |
| Pseudocode | Yes | Algorithm 1 Bandit Action selection with a stochastic multi-armed bandit, Algorithm 2 Draw Sample Sample selection with rejection, Algorithm 3 Choose Action UCT playouts, Algorithm 4 Play Game DSMCP |
| Open Source Code | No | The paper does not provide concrete access to the source code for the methodology described. It mentions game logs are available online and acknowledges another researcher for open-sourcing their baseline bots, but no link or statement for Penumbra's code is provided. |
| Open Datasets | Yes | The games were downloaded from rbmc.jhuapl.edu in June 2019 and rbc.jhuapl.edu in August 2020. Additionally, 5,000 games were played locally by Stocky Inference. |
| Dataset Splits | Yes | 10% of the games were used as validation data based on game filename hashes. |
| Hardware Specification | Yes | Training and evaluation were run on four RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions using a residual neural network and Bayes Elo, but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The constant b is the batch size for inference, d is the search depth, ℓ is the size of approximate infostates, nvl is the virtual loss weight, and z is a threshold for increasing search depth. Inference is done in batches of 256 during both training and online planning. See the appendix for hyperparameter settings and accuracy cross tables. |
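The "Bandit" pseudocode noted above (Algorithm 1) selects actions with a stochastic multi-armed bandit. The paper's exact bandit is novel and not reproduced here; as a hedged illustration, the classical UCB1 rule shows the general shape of such a selection step. The function name and the exploration constant `c` are assumptions for this sketch, not the paper's parameters.

```python
import math

def ucb1_select(counts, values, c=1.4):
    """Pick an arm by the UCB1 rule: mean reward plus an exploration bonus.

    counts[i] -- number of times arm i was played.
    values[i] -- running mean reward of arm i.
    Any unplayed arm is tried first.
    """
    total = sum(counts)
    # Play each arm at least once before applying the UCB formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Otherwise maximize mean value + exploration bonus.
    return max(
        range(len(counts)),
        key=lambda i: values[i] + c * math.sqrt(math.log(total) / counts[i]),
    )
```

For example, an unplayed arm is always chosen before a well-explored one, and among equally explored arms the higher-value arm wins.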
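Algorithm 2 ("Draw Sample") performs sample selection with rejection from the unweighted particle-filter belief state. A minimal sketch of that idea, assuming a uniform draw over particles and a caller-supplied consistency predicate (both are assumptions, not details from the paper):

```python
import random

def draw_sample(particles, consistent, max_tries=100):
    """Rejection sampling from an unweighted particle filter.

    Draw particles uniformly at random until one is consistent with the
    current observations; give up after max_tries draws.
    """
    for _ in range(max_tries):
        state = random.choice(particles)
        if consistent(state):
            return state
    return None  # belief state may be degenerate; caller must handle this
```

Playouts in DSMCP start from states drawn this way, so the planner searches only over states compatible with what the agent has observed.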
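The validation split described above (10% of games selected by filename hash) can be sketched as follows; the specific hash function and modulus here are assumptions chosen to make the split deterministic and roughly 10%, not the paper's exact scheme.

```python
import hashlib

def is_validation(filename, fraction=0.1):
    """Deterministically assign ~`fraction` of files to validation.

    Hashing the filename (rather than sampling randomly) makes the split
    reproducible across runs and machines.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < int(fraction * 100)
```

Because the assignment depends only on the name, a given game always lands in the same split, which prevents train/validation leakage when the dataset is re-downloaded or reshuffled.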