Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Planning with Deep Neural Network Learned Transition Models

Authors: Ga Wu, Buser Say, Scott Sanner

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental In this section, we present experimental results that empirically test the performance of both HD-MILP-Plan and TF-Plan on multiple planning domains with learned neural network transition models. These experiments focus on continuous action domains since the intent of the paper is to compare the performance of HD-MILP-Plan to TF-Plan on domains where they are both applicable. To accomplish this task we first present three nonlinear continuous action benchmark domains, namely: Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation. Then, we validate the transition learning performance of our proposed ReLU-based densely-connected neural networks with different network configurations in each domain. Finally, we evaluate the efficacy of both proposed planning frameworks based on the learned model by comparing them to strong baseline manually coded policies in an online planning setting.
Researcher Affiliation Academia Ga Wu (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada. Buser Say (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Faculty of Information Technology, Monash University, Melbourne, VIC, Australia. Scott Sanner (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
Pseudocode No The paper describes the methodologies, including the MILP formulation and TF-Plan, using mathematical equations and textual descriptions, but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper mentions using off-the-shelf tools like TensorFlow and PyTorch and optimizing MILP with IBM ILOG CPLEX, but it does not contain an unambiguous statement from the authors that they are releasing their own code for the methodology described in the paper, nor does it provide a direct link to a source-code repository.
Open Datasets No Full RDDL (Sanner, 2010) specifications of all domains and instances defined below and used for data generation and plan evaluation in the experimentation are listed in Appendix B. ... We train all neural networks using 10^5 data samples from simulation using a simple stochastic exploration policy.
Dataset Splits Yes 80% of the sampled data was used for training with hyperparameters tuned on a subset of 20% validation data of the training data and 20% of the sampled data was held out for the test evaluation.
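The split quoted above (80% train / 20% test, with 20% of the training portion used for validation) can be sketched as follows. This is a hypothetical illustration, not the authors' code; the sample count and seed are illustrative only.

```python
# Hypothetical sketch of the reported partitioning: 80% train / 20% held-out test,
# with 20% of the training data set aside for hyperparameter validation.
import random

def split_samples(samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle and split samples into (train, validation, test) lists."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    test, train_full = data[:n_test], data[n_test:]
    n_val = int(len(train_full) * val_frac)  # 20% of the training portion
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# The paper reports 10^5 simulation samples; with these fractions that yields
# 64,000 train / 16,000 validation / 20,000 test.
train, val, test = split_samples(range(100_000))
```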
Hardware Specification Yes We optimized the MILP encodings using IBM ILOG CPLEX 12.7.1 with eight threads and a 1-hour total time limit per problem instance on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB memory. We optimize TF-Plan through Tensorflow 1.9 with an Nvidia GTX 1080 GPU with CUDA 9.0 on a Linux system with 16 GB memory.
Software Dependencies Yes We optimized the MILP encodings using IBM ILOG CPLEX 12.7.1... We optimize TF-Plan through Tensorflow 1.9 with an Nvidia GTX 1080 GPU with CUDA 9.0...
Experiment Setup Yes Throughout all experiments, we fixed the dropout parameter p = 0.1 (cf. Section 3.3) at all hidden layers... We tuned the number of hidden layers in the set of {0 (linear), 1, 2} and the number of neurons for each layer in the set of {8, 16, 32, 64, 128}... We applied the RMSProp (Hinton et al., 2012) optimizer over 200 epochs... The results reported for TF-Plan, unless otherwise stated, are based on fixed number of epochs for each domain where TF-Plan used 1000 epochs for Reservoir and HVAC, and 300 epochs for Navigation.
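The hyperparameter search quoted above (hidden layers in {0 (linear), 1, 2}, neurons per layer in {8, 16, 32, 64, 128}, dropout fixed at p = 0.1) can be enumerated as below. This is a hedged sketch; the names and the assumption that a linear model (0 hidden layers) has no width choice are illustrative, not from the paper.

```python
# Hypothetical enumeration of the paper's reported hyperparameter grid.
# Assumption (not stated in the paper): a 0-hidden-layer (linear) model
# has no neuron-count choice, so it appears once in the grid.
from itertools import product

LAYER_CHOICES = (1, 2)                # nonlinear depths; 0 = linear handled separately
WIDTH_CHOICES = (8, 16, 32, 64, 128)  # neurons per hidden layer
DROPOUT_P = 0.1                       # fixed at all hidden layers

def candidate_configs():
    """Yield one config dict per candidate network architecture."""
    yield {"hidden_layers": 0, "neurons": None, "dropout": DROPOUT_P}
    for depth, width in product(LAYER_CHOICES, WIDTH_CHOICES):
        yield {"hidden_layers": depth, "neurons": width, "dropout": DROPOUT_P}

# 1 linear + (2 depths x 5 widths) = 11 candidate configurations,
# each trained with RMSProp over 200 epochs per the quoted setup.
configs = list(candidate_configs())
```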