reproducibilityindex.ai

Learning to search with MCTSnets

Authors: Arthur Guez, Theophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, David Silver

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines. In the Sokoban domain, a classic planning task (Botea et al., 2003), we justify our network design choices and show that our learned search algorithm is able to outperform various model-free and model-based baselines. We investigate our architecture in the game of Sokoban, a classic, challenging puzzle game (Botea et al., 2003). As described above, our results are obtained in a supervised training regime. However, we continuously evaluate our network during training by running it as an agent in random Sokoban levels and report its success ratio in solving the levels.
Researcher Affiliation	Industry	DeepMind, London, UK. Correspondence to: A. Guez and T. Weber <{aguez, theophane}@google.com>.
Pseudocode	Yes	Algorithm 1: Value-Network Monte-Carlo Tree Search. Algorithm 2: MCTSnet. For m = 1 ... M, do simulation:
Open Source Code	No	The paper does not contain any explicit statement or link indicating that the source code for the described method is open-sourced or publicly available. It only provides a link to a video: 'A video of MCTSnet solving Sokoban levels is available here: https://goo.gl/2Bu8HD.'
Open Datasets	No	The paper mentions using the 'Sokoban' domain, a 'classic planning task (Botea et al., 2003)', and states 'the dataset is detailed in the appendix'. However, it does not provide concrete access information (e.g., direct URL, DOI, or specific repository) for the exact dataset used in their experiments. The appendix containing details is not provided in the paper snippet.
Dataset Splits	No	The paper states 'the dataset is detailed in the appendix' but does not provide specific percentages, sample counts, or methodology for training, validation, or test splits in the main text.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper does not list any specific software dependencies with version numbers (e.g., library names with their corresponding versions).
Experiment Setup	No	The paper states 'Using the architecture detailed in Sec. 3.4 and 25 simulations, MCTSnets reach 84 ± 1% of levels solved' and 'Throughout this experimental section, we keep the architecture and size of both embedding and readout network fixed, as detailed in the appendix.' However, it defers the detailed architecture and other specific experimental setup details (like hyperparameters, learning rates, or optimizer settings) to an appendix which is not provided in the main text.