Single-Agent Policy Tree Search With Guarantees
Authors: Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide the search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search. |
| Researcher Affiliation | Collaboration | Laurent Orseau, DeepMind, London, UK, lorseau@google.com; Levi H. S. Lelis, Universidade Federal de Viçosa, Brazil, levi.lelis@ufv.br; Tor Lattimore, DeepMind, London, UK, lattimore@google.com; Théophane Weber, DeepMind, London, UK, theophane@google.com |
| Pseudocode | Yes | Algorithm 1: Levin tree search. ... Algorithm 2: Sampling and execution of a single trajectory. ... Algorithm 3: Sampling of nsims trajectories of fixed depth dmax ∈ ℕ+. ... Algorithm 4: Sampling of nsims trajectories of depths that follow A6519, with an optional coefficient dmin ∈ ℕ+. (Minimal sketches of the Levin TS expansion order and of the A6519 depth schedule appear after this table.) |
| Open Source Code | No | The paper provides a link to the computer-generated levels (data) used in the experiments: "The levels are available at https://github.com/deepmind/boxoban-levels/unfiltered/test." However, it does not provide the source code for the proposed Levin TS or Luby TS algorithms. |
| Open Datasets | Yes | We test our algorithms on 1,000 computer-generated levels of Sokoban [Racanière et al., 2017] of 10x10 grid cells and 4 boxes. The levels are available at https://github.com/deepmind/boxoban-levels/unfiltered/test. |
| Dataset Splits | No | The paper evaluates on "1,000 computer-generated levels of Sokoban" and states that the A3C policy was "learned only from the 65% easiest levels". However, it does not give explicit split information (percentages, sample counts, or citations to predefined splits) for training, validation, and testing on these 1,000 levels, nor does it explain how the 65% easiest levels were identified, which would be needed to reproduce the policy training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions the use of "A3C" for training the policy and comparison with "LAMA planner" and "Fast Downward", but it does not specify software names with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow, or exact versions of the planners). |
| Experiment Setup | Yes | We test the following algorithms and parameters: Luby TS(256,1), Luby TS(256,32), Luby TS(512,32), multi TS(1,200), multi TS(100,200), multi TS(200,200), Levin TS. ... In addition to the policy trained with A3C, we tested Levin TS, Luby TS, and multi TS with a variant of the policy in which we add 1% of noise to the probabilities output by the neural network. That is, these variants use the policy π̃(a|n) = (1 − ε)π(a|n) + ε/4, where π is the network's policy and ε = 0.01, to guide their search. (A sketch of this noise mixing appears after the table.) |
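
For reference, here is a minimal best-first sketch of the Levin TS idea summarized in the Pseudocode row: nodes are expanded in increasing order of d(n)/π(n), the node depth divided by the product of the policy probabilities along the path from the root. This is not the authors' implementation; the helpers `actions_fn`, `step_fn`, `is_goal_fn`, `policy_fn`, and the `max_expansions` budget are placeholders for illustration, and duplicate-state detection is omitted.

```python
import heapq
import itertools

def levin_tree_search(root, actions_fn, step_fn, is_goal_fn, policy_fn,
                      max_expansions=100_000):
    """Best-first tree search ordered by the Levin cost d(n) / pi(n),
    where d(n) is the depth of node n and pi(n) is the product of the
    policy probabilities of the actions leading to n (sketch only)."""
    tie = itertools.count()  # tie-breaker so heapq never compares states
    # Frontier entries: (cost, tie, depth, path_probability, state, plan)
    frontier = [(0.0, next(tie), 0, 1.0, root, [])]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, depth, prob, state, plan = heapq.heappop(frontier)
        if is_goal_fn(state):
            return plan  # sequence of actions reaching a goal state
        expansions += 1
        action_probs = policy_fn(state)  # maps each action to a probability
        for action in actions_fn(state):
            child_prob = prob * action_probs[action]
            if child_prob <= 0.0:
                continue  # the policy gives this branch zero probability
            cost = (depth + 1) / child_prob  # Levin cost d(n) / pi(n)
            heapq.heappush(frontier,
                           (cost, next(tie), depth + 1, child_prob,
                            step_fn(state, action), plan + [action]))
    return None  # search budget exhausted
```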
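
Algorithm 4 samples trajectory depths that follow OEIS A6519 (the largest power of two dividing k), optionally scaled by a coefficient dmin. Assuming the k-th sampled trajectory uses depth dmin · A6519(k), which is one plausible reading of the description, the depth schedule looks like this:

```python
def a6519(k):
    """k-th term of OEIS A6519: the largest power of two dividing k (k >= 1)."""
    return k & -k

def luby_ts_depths(nsims, dmin=1):
    """Depth schedule for nsims sampled trajectories, scaled by dmin (sketch)."""
    return [dmin * a6519(k) for k in range(1, nsims + 1)]

# Example: luby_ts_depths(8) -> [1, 2, 1, 4, 1, 2, 1, 8]
```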
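
The noisy-policy variant quoted in the Experiment Setup row mixes the network's output with a uniform distribution over the four Sokoban actions. A minimal sketch, assuming the policy is given as a probability vector over actions (the function name is illustrative, not from the paper):

```python
import numpy as np

def add_policy_noise(probs, epsilon=0.01):
    """pi'(a|n) = (1 - eps) * pi(a|n) + eps / 4 for the 4 Sokoban actions."""
    probs = np.asarray(probs, dtype=float)
    num_actions = probs.shape[-1]  # 4 in the Sokoban setting described above
    return (1.0 - epsilon) * probs + epsilon / num_actions

# Example: add_policy_noise([0.7, 0.1, 0.1, 0.1]) -> [0.6955, 0.1015, 0.1015, 0.1015]
```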