A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances

Authors: Dieqiao Feng, Carla P. Gomes, Bart Selman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach using a large Sokoban repository [23]. This repository contains 3362 Sokoban problems, including 225 instances that have not been solved with any state-of-the-art search-based methods. [...] Table 3 provides a summary of our overall results. Our baseline strategy (BL) uses a convolutional network to capture the policy and samples uniformly from the sub-tasks. Our baseline can solve 30 of the 225 unsolved instances (13%), using 12 hours per instance, including training time."
Researcher Affiliation | Academia | Dieqiao Feng, Carla P. Gomes, and Bart Selman; Department of Computer Science, Cornell University, Ithaca, NY 14850 (dqfeng@cs.cornell.edu, gomes@cs.cornell.edu, selman@cs.cornell.edu)
Pseudocode | Yes | A formal algorithm description can be found in Algorithm 1.
Open Source Code | No | The paper does not provide any links to source code for the described methodology, nor does it state that the code is being released.
Open Datasets | Yes | "For our experiments, we collect all instances from the XSokoban test suite as well as large test suites on [23] to form a dataset containing a total of 3,362 different instances, among which 225 instances are labeled with 'Solved by none' [...]." Reference: [23] Sokoban. Sokoban repository. http://sokobano.de/wiki/index.php?title=Solver_Statistics, 2020.
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce a train/validation/test partition. It describes dynamic generation of sub-tasks for training rather than a fixed split.
Hardware Specification | No | "The time limit for each instance is set to 12 hours on a 5-GPU machine and the whole learning procedure terminates once the original instance has been solved." The mention of a "5-GPU machine" is too general; no GPU model numbers or detailed specifications are given.
Software Dependencies | No | The paper mentions various software components and concepts (e.g., "deep neural network", "Monte Carlo tree search", "AlphaZero setup") but does not give version numbers for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | "We set the length limit to 2000 to prevent infinite loops. At each board state s0, we perform 1600 simulations to find the child node with maximum visit count. [...] The trainer keeps a pool of the latest 100000 improved policy/value data and trains 1000 minibatches of size 64 in each iteration."
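The experiment-setup numbers quoted above (episode length limit 2000, 1600 MCTS simulations per state, a pool of the latest 100,000 policy/value examples, 1000 minibatches of size 64 per iteration) can be sketched as a minimal replay-pool training loop. This is an illustrative sketch only: the `train_step` callback, the `add_examples` helper, and the example format are hypothetical placeholders, not from the paper.

```python
from collections import deque
import random

# Hyperparameters quoted from the paper's experiment-setup description.
MAX_EPISODE_LEN = 2000       # step limit per episode, to prevent infinite loops
NUM_SIMULATIONS = 1600       # MCTS simulations performed at each board state
POOL_SIZE = 100_000          # latest improved policy/value examples retained
MINIBATCHES_PER_ITER = 1000  # minibatches trained per iteration
BATCH_SIZE = 64              # examples per minibatch

# Sliding pool: appending beyond maxlen silently evicts the oldest examples,
# so the pool always holds the most recent POOL_SIZE examples.
pool = deque(maxlen=POOL_SIZE)

def add_examples(examples):
    """Add freshly generated (state, policy, value) examples to the pool."""
    pool.extend(examples)

def train_iteration(train_step):
    """One iteration: train 1000 uniformly sampled minibatches of size 64."""
    for _ in range(MINIBATCHES_PER_ITER):
        batch = random.sample(list(pool), min(BATCH_SIZE, len(pool)))
        train_step(batch)
```

The `deque(maxlen=...)` choice makes "keep the latest 100,000 examples" automatic; no manual eviction logic is needed.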
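The Research Type row above notes that the baseline strategy (BL) samples uniformly from the sub-tasks. A minimal hypothetical sketch of such a uniform sampler follows; the function name and sub-task representation are assumptions, not from the paper.

```python
import random

def sample_subtask(subtasks, rng=random):
    """Baseline (BL) sampler: draw one training sub-task uniformly at random."""
    if not subtasks:
        raise ValueError("no sub-tasks to sample from")
    return rng.choice(subtasks)
```

The paper's curriculum strategy improves on this baseline by choosing sub-tasks adaptively rather than uniformly; this sketch illustrates only the uniform baseline it is compared against.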