Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Planning with Deep Neural Network Learned Transition Models

Authors: Ga Wu, Buser Say, Scott Sanner

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental In this section, we present experimental results that empirically test the performance of both HD-MILP-Plan and TF-Plan on multiple planning domains with learned neural network transition models. These experiments focus on continuous action domains since the intent of the paper is to compare the performance of HD-MILP-Plan to TF-Plan on domains where they are both applicable. To accomplish this task we first present three nonlinear continuous action benchmark domains, namely: Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation. Then, we validate the transition learning performance of our proposed ReLU-based densely-connected neural networks with different network configurations in each domain. Finally, we evaluate the efficacy of both proposed planning frameworks based on the learned model by comparing them to strong baseline manually coded policies in an online planning setting.
Researcher Affiliation Academia Ga Wu (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada. Buser Say (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Faculty of Information Technology, Monash University, Melbourne, VIC, Australia. Scott Sanner (EMAIL): Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
Pseudocode No The paper describes the methodologies, including the MILP formulation and TF-Plan, using mathematical equations and textual descriptions, but does not present any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper mentions using off-the-shelf tools like TensorFlow and PyTorch and optimizing MILP with IBM ILOG CPLEX, but it does not contain an unambiguous statement from the authors that they are releasing their own code for the methodology described in the paper, nor does it provide a direct link to a source-code repository.
Open Datasets No Full RDDL (Sanner, 2010) specifications of all domains and instances defined below and used for data generation and plan evaluation in the experimentation are listed in Appendix B. ... We train all neural networks using 10^5 data samples from simulation using a simple stochastic exploration policy.
Dataset Splits Yes 80% of the sampled data was used for training with hyperparameters tuned on a subset of 20% validation data of the training data and 20% of the sampled data was held out for the test evaluation.
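The split quoted above (80% train / 20% test, with 20% of the training portion used for validation) can be sketched as follows. This is a hypothetical illustration, not the authors' code; the sample count and seed are illustrative only.

```python
# Hypothetical sketch of the reported partitioning: 80% train / 20% held-out test,
# with 20% of the training data set aside for hyperparameter validation.
import random

def split_samples(samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle and split samples into (train, validation, test) lists."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    test, train_full = data[:n_test], data[n_test:]
    n_val = int(len(train_full) * val_frac)  # 20% of the training portion
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

# The paper reports 10^5 simulation samples; with these fractions that yields
# 64,000 train / 16,000 validation / 20,000 test.
train, val, test = split_samples(range(100_000))
```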
Hardware Specification Yes We optimized the MILP encodings using IBM ILOG CPLEX 12.7.1 with eight threads and a 1-hour total time limit per problem instance on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB memory. We optimize TF-Plan through Tensorflow 1.9 with an Nvidia GTX 1080 GPU with CUDA 9.0 on a Linux system with 16 GB memory.
Software Dependencies Yes We optimized the MILP encodings using IBM ILOG CPLEX 12.7.1... We optimize TF-Plan through Tensorflow 1.9 with an Nvidia GTX 1080 GPU with CUDA 9.0...
Experiment Setup Yes Throughout all experiments, we fixed the dropout parameter p = 0.1 (cf. Section 3.3) at all hidden layers... We tuned the number of hidden layers in the set of {0 (linear), 1, 2} and the number of neurons for each layer in the set of {8, 16, 32, 64, 128}... We applied the RMSProp (Hinton et al., 2012) optimizer over 200 epochs... The results reported for TF-Plan, unless otherwise stated, are based on fixed number of epochs for each domain where TF-Plan used 1000 epochs for Reservoir and HVAC, and 300 epochs for Navigation.
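The hyperparameter search quoted above (hidden layers in {0 (linear), 1, 2}, neurons per layer in {8, 16, 32, 64, 128}, dropout fixed at p = 0.1) can be enumerated as below. This is a hedged sketch; the names and the assumption that a linear model (0 hidden layers) has no width choice are illustrative, not from the paper.

```python
# Hypothetical enumeration of the paper's reported hyperparameter grid.
# Assumption (not stated in the paper): a 0-hidden-layer (linear) model
# has no neuron-count choice, so it appears once in the grid.
from itertools import product

LAYER_CHOICES = (1, 2)                # nonlinear depths; 0 = linear handled separately
WIDTH_CHOICES = (8, 16, 32, 64, 128)  # neurons per hidden layer
DROPOUT_P = 0.1                       # fixed at all hidden layers

def candidate_configs():
    """Yield one config dict per candidate network architecture."""
    yield {"hidden_layers": 0, "neurons": None, "dropout": DROPOUT_P}
    for depth, width in product(LAYER_CHOICES, WIDTH_CHOICES):
        yield {"hidden_layers": depth, "neurons": width, "dropout": DROPOUT_P}

# 1 linear + (2 depths x 5 widths) = 11 candidate configurations,
# each trained with RMSProp over 200 epochs per the quoted setup.
configs = list(candidate_configs())
```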