Learning to search with MCTSnets
Authors: Arthur Guez, Theophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, David Silver
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines. In the Sokoban domain, a classic planning task (Botea et al., 2003), we justify our network design choices and show that our learned search algorithm is able to outperform various model-free and model-based baselines. We investigate our architecture in the game of Sokoban, a classic, challenging puzzle game (Botea et al., 2003). As described above, our results are obtained in a supervised training regime. However, we continuously evaluate our network during training by running it as an agent in random Sokoban levels and report its success ratio in solving the levels. |
| Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: A. Guez and T. Weber <{aguez, theophane}@google.com>. |
| Pseudocode | Yes | Algorithm 1: Value-Network Monte-Carlo Tree Search. Algorithm 2: MCTSnet. For m = 1, ..., M, do simulation: |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described method is open-sourced or publicly available. It only provides a link to a video: 'A video of MCTSnet solving Sokoban levels is available here: https://goo.gl/2Bu8HD.' |
| Open Datasets | No | The paper mentions using the 'Sokoban' domain, a 'classic planning task (Botea et al., 2003)', and states 'the dataset is detailed in the appendix'. However, it does not provide concrete access information (e.g., direct URL, DOI, or specific repository) for the exact dataset used in their experiments. The appendix containing details is not provided in the paper snippet. |
| Dataset Splits | No | The paper states 'the dataset is detailed in the appendix' but does not provide specific percentages, sample counts, or methodology for training, validation, or test splits in the main text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., library names with their corresponding versions). |
| Experiment Setup | No | The paper states 'Using the architecture detailed in Sec. 3.4 and 25 simulations, MCTSnets reach 84 ± 1% of levels solved' and 'Throughout this experimental section, we keep the architecture and size of both embedding and readout network fixed, as detailed in the appendix.' However, it defers the detailed architecture and other specific experimental setup details (like hyperparameters, learning rates, or optimizer settings) to an appendix which is not provided in the main text. |