Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning
Authors: Dieqiao Feng, Carla Gomes, Bart Selman
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. Our experiments reveal that our curriculum-driven deep reinforcement learning framework can surpass traditional specialized solvers for a large set of instances from benchmark datasets such as XSokoban and Sasquatch. |
| Researcher Affiliation | Academia | Dieqiao Feng, Carla P. Gomes and Bart Selman, Department of Computer Science, Cornell University, {dqfeng, gomes, selman}@cs.cornell.edu |
| Pseudocode | No | The paper describes the Monte Carlo Tree Search process (Selection, Expansion, Backpropagation) but does not present it in a formally labeled pseudocode or algorithm block; a generic sketch of that loop is provided after the table. |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We report our experiments on XSokoban, the de facto standard test suite in the academic literature on Sokoban solver programming, as well as other large test suites. [Footnote 1: Sokoban datasets available at http://sokobano.de/wiki/index.php?title=Solver_Statistics] |
| Dataset Splits | No | The paper describes a curriculum-driven strategy where it gradually increases problem complexity, but it does not specify explicit training/validation/test dataset splits with percentages, counts, or references to predefined splits. |
| Hardware Specification | No | The paper states: 'All solvers are running on the same CPU cores while our method utilizes additional 5 GPUs.' and mentions a 'compute cluster'. However, it does not provide specific models or detailed specifications for the CPU or GPU hardware used. |
| Software Dependencies | No | The paper mentions using 'vanilla ResNet [He et al., 2016]' as the network setting, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | In our experiments, we start with I_max = 500 and double it whenever learning fails after a long run. After 1600 rounds have been performed, we choose a move either greedily or proportionally with respect to the visit count at the root state s. We use 5 GPUs to train the network and each iteration contains 1000 epochs with mini-batch 160 in total. We used vanilla ResNet [He et al., 2016] with 8 residual blocks as the network setting for all experiments. (Sketches of the MCTS loop and this curriculum schedule follow the table.) |
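
Since the Pseudocode row notes that the paper names the three MCTS phases (Selection, Expansion, Backpropagation) without giving a formal algorithm block, here is a minimal, generic sketch of that loop for orientation. The `Node` class, the UCT selection rule, and all function names are standard textbook choices assumed for illustration, not reconstructed from the paper; only the default of 1600 simulations per move echoes the Experiment Setup quote.

```python
import math
import random

# Illustrative only: a generic MCTS loop with the three phases the paper
# names (Selection, Expansion, Backpropagation). The UCT rule and all
# names here are textbook defaults, not taken from the paper.

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # opaque planning/game state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def uct_score(self, c=1.4):
        if self.visits == 0:
            return float("inf")     # force unvisited children to be tried
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root, successors, evaluate, n_simulations=1600):
    """successors(state) -> list of next states; evaluate(state) -> value in [0, 1]."""
    for _ in range(n_simulations):
        # 1. Selection: descend to a leaf by maximizing the UCT score.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct_score)
        # 2. Expansion: add the leaf's successors, then evaluate one of them.
        for s in successors(node.state):
            node.children.append(Node(s, parent=node))
        if node.children:
            node = random.choice(node.children)
        value = evaluate(node.state)
        # 3. Backpropagation: push the value back up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Move choice: greedy on visit counts (the paper also allows
    # choosing proportionally to the visit counts; see below).
    return max(root.children, key=lambda c: c.visits)
```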
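
The Experiment Setup row quotes two concrete mechanisms: a curriculum that starts with I_max = 500 and doubles it whenever learning fails after a long run, and a move chosen either greedily or proportionally to the root visit counts. Below is a hedged sketch of both. The excerpt does not define I_max's exact role, so it is treated as an opaque budget, and `train_one_round` and `learning_failed` are hypothetical placeholders, not functions from the paper.

```python
import numpy as np

def choose_move(root_visit_counts, greedy=True):
    """Pick a move from root visit counts: argmax, or sample in proportion
    (both options are quoted in the Experiment Setup row)."""
    counts = np.asarray(root_visit_counts, dtype=float)
    if greedy:
        return int(counts.argmax())
    return int(np.random.choice(len(counts), p=counts / counts.sum()))

def curriculum_schedule(train_one_round, learning_failed, i_max=500):
    """Hypothetical driver: I_max starts at 500 and doubles whenever
    learning fails after a long run, per the quoted setup."""
    while True:
        train_one_round(i_max)      # placeholder for self-play + network updates
        if learning_failed():
            i_max *= 2              # "double it whenever learning fails"
        else:
            return i_max
```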