Monte Carlo Tree Search with Boltzmann Exploration

Authors: Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.
Researcher Affiliation | Academia | Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda; Oxford Robotics Institute, University of Oxford; {mpainter, mohamed, nickh, bruno}@robots.ox.ac.uk
Pseudocode | No | The paper describes algorithms using text and equations but does not provide structured pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described in this paper.
Open Datasets | Yes | To validate our approach, we use the Frozen Lake environment [5], and the Sailing Problem [28], commonly used to evaluate tree search algorithms [28, 22, 33, 13]. ... We used an openly available value network V and policy network π from KataGo [36].
Dataset Splits | No | The paper describes evaluation procedures (e.g., 'evaluated every 250 trials using 250 trajectories') but does not specify traditional training/validation/test dataset splits as it primarily uses generative simulation environments rather than static datasets.
Hardware Specification | Yes | Each algorithm was limited to 5 seconds of compute time per move, allowed to use 32 search threads per move, and had access to 80 Intel Xeon E5-2698V4 CPUs clocked at 2.2GHz, and a single Nvidia V100 GPU on a shared compute cluster.
Software Dependencies | No | The paper mentions software like 'OpenAI Gym' [5] and 'KataGo' [36], but it does not specify version numbers for these or any other software dependencies crucial for replication.
Experiment Setup | Yes | A horizon of 100 was used for Frozen Lake and 50 for the Sailing Problem. Parameters were selected using a hyper-parameter search (Appendix D.3). ... Each algorithm was limited to 5 seconds of compute time per move, allowed to use 32 search threads per move