A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
Authors: Dieqiao Feng, Carla P. Gomes, Bart Selman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach using a large Sokoban repository [23]. This repository contains 3362 Sokoban problems, including 225 instances that have not been solved with any state-of-the-art search based methods. [...] Table 3 provides a summary of our overall results. Our baseline strategy (BL) uses a convolutional network to capture the policy and samples uniformly from the sub-tasks. Our baseline can solve 30 of the 225 unsolved instances (13%), using 12 hours per instance, including training time. |
| Researcher Affiliation | Academia | Dieqiao Feng, Department of Computer Science, Cornell University, Ithaca, NY 14850, dqfeng@cs.cornell.edu; Carla P. Gomes, Department of Computer Science, Cornell University, Ithaca, NY 14850, gomes@cs.cornell.edu; Bart Selman, Department of Computer Science, Cornell University, Ithaca, NY 14850, selman@cs.cornell.edu |
| Pseudocode | Yes | A formal algorithm description can be found in Algorithm 1. (A hedged sketch of the curriculum sampling step appears after this table.) |
| Open Source Code | No | The paper does not provide any specific links to source code for the methodology described, nor does it explicitly state that the code is being released. |
| Open Datasets | Yes | For our experiments, we collect all instances from the XSokoban test suite as well as large test suites on [23] to form a dataset containing a total of 3,362 different instances, among which 225 instances are labeled with "Solved by none" [...]. [23] Sokoban. Sokoban repository. http://sokobano.de/wiki/index.php?title=Solver_Statistics, 2020. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning into train/validation/test sets. It describes dynamic generation of subcases for training rather than a fixed split. |
| Hardware Specification | No | The time limit for each instance is set to 12 hours on a 5-GPU machine and the whole learning procedure terminates once the original instance has been solved. The mention of a '5-GPU machine' is too general: no GPU model numbers or other hardware specifications are given. |
| Software Dependencies | No | The paper mentions various software components and concepts (e.g., 'deep neural network', 'Monte Carlo tree search', 'Alpha Zero setup'), but does not provide specific version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We set the length limit to 2000 to prevent infinite loops. At each board state s0, we perform 1600 simulations to find the child node with maximum visit count. [...] The trainer keeps a pool of the latest 100000 improved policy/value data and trains 1000 minibatches of size 64 in each iteration. (These constants are gathered into the configuration sketch after the table.) |
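
The curriculum step referenced in Algorithm 1 can be illustrated compactly. The Python sketch below is our illustration, not the authors' code (none is released): `solve_rate` is a hypothetical bookkeeping structure, and the Gaussian-shaped weighting is an assumption standing in for the paper's sampling rule. It contrasts the uniform baseline (BL) sampler with a curriculum sampler that concentrates probability on subtasks near the solver's current frontier.

```python
import math
import random

def uniform_sample(subtasks):
    """Baseline (BL): draw the next training subtask uniformly."""
    return random.choice(subtasks)

def curriculum_sample(subtasks, solve_rate, temperature=0.1):
    """Curriculum sketch: favor subtasks at the solver's frontier.

    solve_rate maps each subtask to its recent empirical solve rate;
    both this bookkeeping and the Gaussian weighting below are our
    assumptions, not the paper's exact rule.  Subtasks solved about
    half the time are neither trivial nor hopeless, so they receive
    the largest sampling weight.
    """
    weights = [
        math.exp(-((solve_rate.get(t, 0.0) - 0.5) ** 2) / temperature)
        for t in subtasks
    ]
    return random.choices(subtasks, weights=weights, k=1)[0]
```

For example, with `solve_rate = {"easy": 0.95, "frontier": 0.5, "hard": 0.02}`, `curriculum_sample` picks `"frontier"` roughly 80% of the time, whereas `uniform_sample` treats all three subtasks alike.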
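The quoted experiment-setup numbers translate directly into a training-loop skeleton. The sketch below only fixes those constants (episode length limit 2000, 1600 MCTS simulations per move, a pool of the latest 100,000 policy/value examples, 1,000 minibatches of size 64 per iteration); the `play_episode` and `train_minibatch` callables are hypothetical placeholders standing in for the paper's AlphaZero-style components.

```python
import random
from collections import deque

# Constants quoted from the paper's experiment setup.
MAX_EPISODE_LENGTH = 2000     # length limit preventing infinite loops
MCTS_SIMULATIONS = 1600       # simulations per board state
POOL_SIZE = 100_000           # latest improved policy/value examples kept
MINIBATCHES_PER_ITER = 1000   # minibatches trained each iteration
BATCH_SIZE = 64

# The deque discards examples beyond the most recent POOL_SIZE,
# matching the quoted replay-pool behavior.
replay_pool = deque(maxlen=POOL_SIZE)

def training_iteration(net, subtask, play_episode, train_minibatch):
    """One AlphaZero-style iteration; the wiring here is hypothetical.

    play_episode(net, subtask, n_simulations, max_length) is assumed
    to return a list of (state, policy, value) training examples, and
    train_minibatch(net, batch) to perform one gradient step; neither
    signature comes from the paper.
    """
    replay_pool.extend(
        play_episode(net, subtask, MCTS_SIMULATIONS, MAX_EPISODE_LENGTH)
    )
    for _ in range(MINIBATCHES_PER_ITER):
        if len(replay_pool) >= BATCH_SIZE:
            train_minibatch(net, random.sample(list(replay_pool), BATCH_SIZE))
```

Under these assumptions, the loop runs until the original instance is solved or the quoted 12-hour per-instance limit expires.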