Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning
Authors: Dieqiao Feng, Carla Gomes, Bart Selman
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. Our experiments reveal that our curriculum-driven deep reinforcement learning framework can surpass traditional specialized solvers for a large set of instances from benchmark datasets such as XSokoban and Sasquatch. |
| Researcher Affiliation | Academia | Dieqiao Feng, Carla P. Gomes and Bart Selman, Department of Computer Science, Cornell University, {dqfeng, gomes, selman}@cs.cornell.edu |
| Pseudocode | No | The paper describes the Monte Carlo Tree Search process (Selection, Expansion, Backpropagation) but does not present it in a formally labeled pseudocode or algorithm block; a generic sketch of that loop is provided after the table. |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We report our experiments on XSokoban, the de facto standard test suite in the academic literature on Sokoban solver programming, as well as other large test suites. [Footnote 1: Sokoban datasets available at http://sokobano.de/wiki/index.php?title=Solver_Statistics] |
| Dataset Splits | No | The paper describes a curriculum-driven strategy where it gradually increases problem complexity, but it does not specify explicit training/validation/test dataset splits with percentages, counts, or references to predefined splits. |
| Hardware Specification | No | The paper states: 'All solvers are running on the same CPU cores while our method utilizes additional 5 GPUs.' and mentions a 'compute cluster'. However, it does not provide specific models or detailed specifications for the CPU or GPU hardware used. |
| Software Dependencies | No | The paper mentions using 'vanilla ResNet [He et al., 2016]' as the network setting, but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | In our experiments, we start with I_max = 500 and double it whenever learning fails after a long run. After 1600 rounds have been performed, we choose a move either greedily or proportionally with respect to the visit count at the root state s. We use 5 GPUs to train the network and each iteration contains 1000 epochs with mini-batch 160 in total. We used vanilla ResNet [He et al., 2016] with 8 residual blocks as the network setting for all experiments. (Sketches of the MCTS loop and this curriculum schedule follow the table.) |
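
Since the Pseudocode row notes that the paper names the three MCTS phases (Selection, Expansion, Backpropagation) without giving a formal algorithm block, here is a minimal, generic sketch of that loop for orientation. The `Node` class, the UCT selection rule, and all function names are standard textbook choices assumed for illustration, not reconstructed from the paper; only the default of 1600 simulations per move echoes the Experiment Setup quote.

```python
import math
import random

# Illustrative only: a generic MCTS loop with the three phases the paper
# names (Selection, Expansion, Backpropagation). The UCT rule and all
# names here are textbook defaults, not taken from the paper.

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # opaque planning/game state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def uct_score(self, c=1.4):
        if self.visits == 0:
            return float("inf")     # force unvisited children to be tried
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root, successors, evaluate, n_simulations=1600):
    """successors(state) -> list of next states; evaluate(state) -> value in [0, 1]."""
    for _ in range(n_simulations):
        # 1. Selection: descend to a leaf by maximizing the UCT score.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct_score)
        # 2. Expansion: add the leaf's successors, then evaluate one of them.
        for s in successors(node.state):
            node.children.append(Node(s, parent=node))
        if node.children:
            node = random.choice(node.children)
        value = evaluate(node.state)
        # 3. Backpropagation: push the value back up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Move choice: greedy on visit counts (the paper also allows
    # choosing proportionally to the visit counts; see below).
    return max(root.children, key=lambda c: c.visits)
```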
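
The Experiment Setup row quotes two concrete mechanisms: a curriculum that starts with I_max = 500 and doubles it whenever learning fails after a long run, and a move chosen either greedily or proportionally to the root visit counts. Below is a hedged sketch of both. The excerpt does not define I_max's exact role, so it is treated as an opaque budget, and `train_one_round` and `learning_failed` are hypothetical placeholders, not functions from the paper.

```python
import numpy as np

def choose_move(root_visit_counts, greedy=True):
    """Pick a move from root visit counts: argmax, or sample in proportion
    (both options are quoted in the Experiment Setup row)."""
    counts = np.asarray(root_visit_counts, dtype=float)
    if greedy:
        return int(counts.argmax())
    return int(np.random.choice(len(counts), p=counts / counts.sum()))

def curriculum_schedule(train_one_round, learning_failed, i_max=500):
    """Hypothetical driver: I_max starts at 500 and doubles whenever
    learning fails after a long run, per the quoted setup."""
    while True:
        train_one_round(i_max)      # placeholder for self-play + network updates
        if learning_failed():
            i_max *= 2              # "double it whenever learning fails"
        else:
            return i_max
```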