Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Authors: Jianzhun Du, Joseph Futoma, Finale Doshi-Velez

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We experimentally demonstrate the efficacy of our methods across various continuous-time domains." Section 5 (Experiments): "We evaluate our ODE-based models across four continuous-time domains. We show our models characterize continuous-time dynamics more accurately and allow us to find a good policy with less data. We also demonstrate capabilities of our model-based methods that are not possible for model-free methods." |
| Researcher Affiliation | Academia | Jianzhun Du, Joseph Futoma, Finale Doshi-Velez; Harvard University, Cambridge, MA 02138; jzdu@g.harvard.edu, {jfutoma, finale}@seas.harvard.edu |
| Pseudocode | Yes | "The full procedure can be found in Appendix A.1." "The details of the algorithm can be found in Appendix A.2." "The details of the algorithm can be found in Appendix A.3." |
| Open Source Code | No | The paper does not state that its own source code is released, nor does it link to a code repository for the described methodology. It only mentions using a third-party library: "We use the implementation of ODE solvers from Python torchdiffeq library." (footnote 2; a usage sketch follows the table.) |
| Open Datasets | Yes | "Domains. We provide demonstrations on three simpler domains (windy gridworld [Sutton and Barto, 2018], acrobot [Sutton, 1996], and HIV [Adams et al., 2004]) and three MuJoCo [Todorov et al., 2012] locomotion tasks (Swimmer, Hopper, and HalfCheetah) interfaced through OpenAI Gym [Brockman et al., 2016]." (An environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper mentions a "training dataset" and a "test dataset" but does not give explicit percentages, sample counts, or a methodology for splitting data into training, validation, and test sets. |
| Hardware Specification | No | The paper states: "We thank Harvard Faculty of Arts and Sciences Research Computing and School of Engineering and Applied Sciences for providing computational resources." This acknowledgment names no specific CPU or GPU models or other hardware details used for the experiments. |
| Software Dependencies | No | The paper states: "We use the implementation of ODE solvers from Python torchdiffeq library." (footnote 2). However, it does not specify version numbers for Python or the torchdiffeq library. (A version-recording sketch follows the table.) |
| Experiment Setup | No | The paper describes the general experimental approach, including the use of "mini-batch stochastic gradient descent" and combining "MPC with the actor-critic method (DDPG)", and it introduces λ as a hyperparameter in the training objective (Equation 9). However, it does not provide concrete values for hyperparameters such as learning rate, batch size, number of epochs, or specific optimizer settings. (An illustrative training-step sketch follows the table.) |
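For context on the third-party dependency flagged in the Open Source Code row: below is a minimal sketch of how torchdiffeq's `odeint` solver is typically called. The MLP dynamics function, state dimension, and query times are hypothetical stand-ins, not the paper's actual latent ODE model.

```python
# Minimal torchdiffeq usage sketch (assumed, not the paper's code):
# odeint integrates dy/dt = f(t, y) from an initial state across given times.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    """Hypothetical stand-in dynamics: dy/dt parameterized by a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc(dim=4)
y0 = torch.zeros(4)                 # initial state
t = torch.tensor([0.0, 0.5, 1.3])   # irregularly spaced query times
ys = odeint(func, y0, t)            # solution states, shape (3, 4)
```

Solving at irregularly spaced times is the relevant capability here, since the paper targets semi-Markov settings where transitions occur at non-uniform intervals.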
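The domains listed in the Open Datasets row are all publicly available, and the locomotion tasks are standard OpenAI Gym environments. A sketch of instantiating them follows; the `-v2` environment IDs and the classic (pre-0.26) Gym API are assumptions, since the paper does not state which versions it used.

```python
# Hedged sketch: instantiating the paper's MuJoCo locomotion domains via
# OpenAI Gym (requires a MuJoCo installation; env IDs assumed, not confirmed).
import gym

for env_id in ["Swimmer-v2", "Hopper-v2", "HalfCheetah-v2"]:
    env = gym.make(env_id)
    obs = env.reset()                                   # initial observation
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```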
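Since the Software Dependencies row flags missing version numbers, one way a reproducer could record the versions actually installed in their own environment is shown below; the package list is an assumption, as the paper itself reports no versions.

```python
# Record installed dependency versions (the paper pins none).
from importlib.metadata import version

for pkg in ["torch", "torchdiffeq", "gym"]:
    print(pkg, version(pkg))
```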
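To make the Experiment Setup row concrete: the paper trains its dynamics model with mini-batch stochastic gradient descent under an objective in which λ weights one loss term against another (Equation 9). The sketch below is purely illustrative; the two-term loss structure, model, learning rate, batch size, and λ value are all assumptions, since the paper reports none of these.

```python
# Illustrative only: one mini-batch SGD step on a lambda-weighted two-term
# objective in the spirit of the paper's Equation 9. Every value here
# (architecture, lr, batch size, lam) is a placeholder, not from the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(4, 5)        # stand-in dynamics model, not the paper's
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
lam = 1.0                            # the paper's lambda; value unreported

states = torch.randn(32, 4)          # synthetic mini-batch of transitions
next_states = torch.randn(32, 4)
dwell_times = torch.rand(32)

pred = model(states)
loss = F.mse_loss(pred[:, :4], next_states) \
       + lam * F.mse_loss(pred[:, 4], dwell_times)
opt.zero_grad()
loss.backward()
opt.step()
```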