Monte Carlo Tree Search with Boltzmann Exploration
Authors: Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go. |
| Researcher Affiliation | Academia | Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda; Oxford Robotics Institute, University of Oxford; {mpainter, mohamed, nickh, bruno}@robots.ox.ac.uk |
| Pseudocode | No | The paper describes algorithms using text and equations but does not provide structured pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the methodology described in this paper. |
| Open Datasets | Yes | To validate our approach, we use the Frozen Lake environment [5], and the Sailing Problem [28], commonly used to evaluate tree search algorithms [28, 22, 33, 13]. ... We used an openly available value network V and policy network π from KataGo [36]. |
| Dataset Splits | No | The paper describes evaluation procedures (e.g., 'evaluated every 250 trials using 250 trajectories') but does not specify traditional training/validation/test dataset splits as it primarily uses generative simulation environments rather than static datasets. |
| Hardware Specification | Yes | Each algorithm was limited to 5 seconds of compute time per move, allowed to use 32 search threads per move, and had access to 80 Intel Xeon E5-2698V4 CPUs clocked at 2.2GHz, and a single Nvidia V100 GPU on a shared compute cluster. |
| Software Dependencies | No | The paper mentions software like 'OpenAI Gym' [5] and 'KataGo' [36], but it does not specify version numbers for these or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | A horizon of 100 was used for Frozen Lake and 50 for the Sailing Problem. Parameters were selected using a hyper-parameter search (Appendix D.3). ... Each algorithm was limited to 5 seconds of compute time per move, allowed to use 32 search threads per move |
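
The evaluation protocol quoted in the table (rollouts capped at a fixed horizon, scored over 250 trajectories) can be illustrated with a minimal sketch. This is not the paper's search algorithm: it shows only plain Boltzmann (softmax) action selection and a horizon-capped Monte Carlo evaluation. The toy chain environment, the hand-set Q-values, the temperature of 0.5, and all function names are hypothetical stand-ins chosen so the example runs without Gym or any other unversioned dependency.

```python
import math
import random

GOAL = 4  # hypothetical chain of states 0..4; reaching state 4 ends the episode
# Hand-set Q-values (assumption): moving right (action 1) looks better everywhere.
Q = {s: [0.0, 1.0] for s in range(GOAL)}


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; a higher temperature gives a flatter distribution."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_policy(state, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    probs = boltzmann_probs(Q[state], temperature=0.5)
    return rng.choices([0, 1], weights=probs)[0]


def toy_step(state, action, rng):
    """Toy stochastic transition: the chosen move is flipped with probability 0.2."""
    if rng.random() < 0.2:
        action = 1 - action
    nxt = max(0, state + (1 if action == 1 else -1))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def evaluate(policy, step, start_state, horizon=100, trajectories=250, seed=0):
    """Mean undiscounted return over rollouts truncated at `horizon` steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trajectories):
        state, done, ret = start_state, False, 0.0
        for _ in range(horizon):
            if done:
                break
            state, reward, done = step(state, policy(state, rng), rng)
            ret += reward
        total += ret
    return total / trajectories


score = evaluate(boltzmann_policy, toy_step, start_state=0)
print(f"mean return over 250 trajectories: {score:.3f}")
```

Fixing the seed makes the estimate reproducible across runs, which matters here because the table notes that the paper does not pin software versions.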