A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
Authors: Dieqiao Feng, Carla P. Gomes, Bart Selman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach using a large Sokoban repository [23]. This repository contains 3362 Sokoban problems, including 225 instances that have not been solved with any state-of-the-art search based methods. [...] Table 3 provides a summary of our overall results. Our baseline strategy (BL) uses a convolutional network to capture the policy and samples uniformly from the sub-tasks. Our baseline can solve 30 of the 225 unsolved instances (13%), using 12 hours per instance, including training time. |
| Researcher Affiliation | Academia | Dieqiao Feng, Department of Computer Science, Cornell University, Ithaca, NY 14850, dqfeng@cs.cornell.edu; Carla P. Gomes, Department of Computer Science, Cornell University, Ithaca, NY 14850, gomes@cs.cornell.edu; Bart Selman, Department of Computer Science, Cornell University, Ithaca, NY 14850, selman@cs.cornell.edu |
| Pseudocode | Yes | A formal algorithm description can be found in Algorithm 1. (A hedged sketch of the curriculum sampling step appears after this table.) |
| Open Source Code | No | The paper does not provide any specific links to source code for the methodology described, nor does it explicitly state that the code is being released. |
| Open Datasets | Yes | For our experiments, we collect all instances from the XSokoban test suite as well as large test suites on [23] to form a dataset containing a total of 3,362 different instances, among which 225 instances are labeled with "Solved by none" [...]. [23] Sokoban. Sokoban repository. http://sokobano.de/wiki/index.php?title=Solver_Statistics, 2020. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning into train/validation/test sets. It describes dynamic generation of subcases for training rather than a fixed split. |
| Hardware Specification | No | The time limit for each instance is set to 12 hours on a 5-GPU machine and the whole learning procedure terminates once the original instance has been solved. The mention of a '5-GPU machine' is too general: no GPU model numbers or other hardware specifications are given. |
| Software Dependencies | No | The paper mentions various software components and concepts (e.g., 'deep neural network', 'Monte Carlo tree search', 'Alpha Zero setup'), but does not provide specific version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We set the length limit to 2000 to prevent infinite loops. At each board state s0, we perform 1600 simulations to find the child node with maximum visit count. [...] The trainer keeps a pool of the latest 100000 improved policy/value data and trains 1000 minibatches of size 64 in each iteration. (These constants are gathered into the configuration sketch after the table.) |
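
The curriculum step referenced in Algorithm 1 can be illustrated compactly. The Python sketch below is our illustration, not the authors' code (none is released): `solve_rate` is a hypothetical bookkeeping structure, and the Gaussian-shaped weighting is an assumption standing in for the paper's sampling rule. It contrasts the uniform baseline (BL) sampler with a curriculum sampler that concentrates probability on subtasks near the solver's current frontier.

```python
import math
import random

def uniform_sample(subtasks):
    """Baseline (BL): draw the next training subtask uniformly."""
    return random.choice(subtasks)

def curriculum_sample(subtasks, solve_rate, temperature=0.1):
    """Curriculum sketch: favor subtasks at the solver's frontier.

    solve_rate maps each subtask to its recent empirical solve rate;
    both this bookkeeping and the Gaussian weighting below are our
    assumptions, not the paper's exact rule.  Subtasks solved about
    half the time are neither trivial nor hopeless, so they receive
    the largest sampling weight.
    """
    weights = [
        math.exp(-((solve_rate.get(t, 0.0) - 0.5) ** 2) / temperature)
        for t in subtasks
    ]
    return random.choices(subtasks, weights=weights, k=1)[0]
```

For example, with `solve_rate = {"easy": 0.95, "frontier": 0.5, "hard": 0.02}`, `curriculum_sample` picks `"frontier"` roughly 80% of the time, whereas `uniform_sample` treats all three subtasks alike.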
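The quoted experiment-setup numbers translate directly into a training-loop skeleton. The sketch below only fixes those constants (episode length limit 2000, 1600 MCTS simulations per move, a pool of the latest 100,000 policy/value examples, 1,000 minibatches of size 64 per iteration); the `play_episode` and `train_minibatch` callables are hypothetical placeholders standing in for the paper's AlphaZero-style components.

```python
import random
from collections import deque

# Constants quoted from the paper's experiment setup.
MAX_EPISODE_LENGTH = 2000     # length limit preventing infinite loops
MCTS_SIMULATIONS = 1600       # simulations per board state
POOL_SIZE = 100_000           # latest improved policy/value examples kept
MINIBATCHES_PER_ITER = 1000   # minibatches trained each iteration
BATCH_SIZE = 64

# The deque discards examples beyond the most recent POOL_SIZE,
# matching the quoted replay-pool behavior.
replay_pool = deque(maxlen=POOL_SIZE)

def training_iteration(net, subtask, play_episode, train_minibatch):
    """One AlphaZero-style iteration; the wiring here is hypothetical.

    play_episode(net, subtask, n_simulations, max_length) is assumed
    to return a list of (state, policy, value) training examples, and
    train_minibatch(net, batch) to perform one gradient step; neither
    signature comes from the paper.
    """
    replay_pool.extend(
        play_episode(net, subtask, MCTS_SIMULATIONS, MAX_EPISODE_LENGTH)
    )
    for _ in range(MINIBATCHES_PER_ITER):
        if len(replay_pool) >= BATCH_SIZE:
            train_minibatch(net, random.sample(list(replay_pool), BATCH_SIZE))
```

Under these assumptions, the loop runs until the original instance is solved or the quoted 12-hour per-instance limit expires.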