Deep Reactive Policies for Planning in Stochastic Nonlinear Domains

Authors: Thiago P. Bueno, Leliane N. de Barros, Denis D. Mauá, Scott Sanner

AAAI 2019, pp. 7530-7537

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark our approach against stochastic planning domains exhibiting arbitrary differentiable nonlinear transition and cost functions (e.g., Reservoir Control, HVAC and Navigation). Results show that DRPs with more than 125,000 continuous action parameters can be optimized by our approach for problems with 30 state fluents and 30 action fluents on inexpensive hardware under 6 minutes. Also, we observed a speedup of 5 orders of magnitude in the average inference time per decision step of DRPs when compared to other state-of-the-art online gradient-based planners when the same level of solution quality is required.
Researcher Affiliation | Academia | ¹Department of Computer Science, University of São Paulo, Brazil; ²Industrial Engineering, University of Toronto, Canada
Pseudocode | No | The paper describes algorithms and processes textually and with diagrams (e.g., Figures 1 and 2), but no structured pseudocode or algorithm blocks are provided.
Open Source Code | Yes | We implemented tf-mdp in TensorFlow (Abadi et al. 2016). We specified the domains/instances using RDDL (Relational Dynamic Influence Diagram Language) (Sanner 2010) and compiled the models to stochastic computation graphs in TensorFlow using a compiler specifically built for this work. Repositories: https://github.com/thiagopbueno/tf-mdp and https://github.com/thiagopbueno/rddl2tf (an illustrative training sketch follows the table below).
Open Datasets | Yes | We extended three domains previously proposed (e.g., Navigation (Faulwasser and Findeisen 2009), HVAC (Heating, Ventilation and Air Conditioning) (Agarwal et al. 2010), and Reservoir Control (Yeh 1985))
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes domains for policy training and evaluation over a horizon, which is different from typical data splits for supervised learning.
Hardware Specification | Yes | We conducted all experiments on a single 2.4 GHz Intel Core i5 8GB RAM machine.
Software Dependencies | No | The paper mentions TensorFlow (Abadi et al. 2016) but does not provide a specific version number. It also mentions RDDL (Relational Dynamic Influence Diagram Language) (Sanner 2010), which is a modeling language rather than a versioned software dependency.
Experiment Setup | Yes | Training neural nets and especially deep neural nets such as DRPs can be especially sensitive to the choice of training hyperparameters (e.g., learning rate, batch size, number of training epochs). Our objective with the experiments is not necessarily to achieve the best possible outcome by carefully fine-tuning hyperparameters, but instead to provide a reasonable comparison between the models. Hence, we selected the sensible default values shown in Table 3 and fix them for all training runs. Table 3 (training hyperparameters for tf-mdp): Navigation: batch 256, learning rate 0.001, 200 epochs, horizon 20; HVAC: batch 256, learning rate 0.0001, 200 epochs, horizon 40; Reservoir: batch 256, learning rate 0.001, 200 epochs, horizon 40.
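For readers without access to the repositories, the following is a minimal, self-contained sketch of the core idea behind tf-mdp: a deep reactive policy (a feedforward network mapping state fluents to action fluents) is trained by unrolling a differentiable stochastic transition and cost model over the planning horizon and backpropagating the cumulative cost. The toy `transition` and `cost` functions, the network sizes, and the TensorFlow 2 style are assumptions for illustration only; they do not reproduce the paper's RDDL-compiled models or the actual tf-mdp API.

```python
import tensorflow as tf

# Hypothetical differentiable domain model (NOT the rddl2tf compiler output):
# next_state = transition(state, action) with reparameterized noise, plus a
# scalar cost per decision step.
def transition(state, action):
    noise = tf.random.normal(tf.shape(state), stddev=0.1)  # reparameterized sample
    return state + 0.05 * tf.tanh(action) + noise           # toy nonlinear dynamics

def cost(state, action):
    return tf.reduce_sum(state ** 2, axis=-1) + 0.01 * tf.reduce_sum(action ** 2, axis=-1)

STATE_FLUENTS, ACTION_FLUENTS, HORIZON, BATCH = 30, 30, 40, 256

# Deep reactive policy: a feedforward net mapping the current state to an action.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(STATE_FLUENTS,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(ACTION_FLUENTS),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for epoch in range(200):
    state = tf.random.normal([BATCH, STATE_FLUENTS])  # sampled initial states
    with tf.GradientTape() as tape:
        total_cost = 0.0
        for _ in range(HORIZON):                      # unroll the stochastic computation graph
            action = policy(state)
            total_cost += cost(state, action)
            state = transition(state, action)
        loss = tf.reduce_mean(total_cost)             # mean cumulative cost over the batch
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
```

Because the noise enters through a reparameterized sample, gradients flow through the entire unrolled horizon into the policy weights, which is the pathwise-gradient mechanism that lets DRPs be optimized end to end in this setting.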
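The fixed defaults from Table 3 can likewise be captured as a small per-domain configuration. The dictionary keys and the `train_drp` stand-in below are hypothetical names for illustration, not part of the tf-mdp interface.

```python
# Hypothetical per-domain configs echoing the Table 3 defaults; train_drp is a
# stand-in for the unrolled training loop sketched above, not a tf-mdp function.
TABLE3_DEFAULTS = {
    "Navigation": {"batch_size": 256, "learning_rate": 1e-3, "epochs": 200, "horizon": 20},
    "HVAC":       {"batch_size": 256, "learning_rate": 1e-4, "epochs": 200, "horizon": 40},
    "Reservoir":  {"batch_size": 256, "learning_rate": 1e-3, "epochs": 200, "horizon": 40},
}

def train_drp(domain, batch_size, learning_rate, epochs, horizon):
    # Placeholder: would build the domain model and run the training loop above.
    print(f"{domain}: batch={batch_size}, lr={learning_rate}, "
          f"epochs={epochs}, horizon={horizon}")

for domain, cfg in TABLE3_DEFAULTS.items():
    train_drp(domain, **cfg)
```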