Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Authors: Jianzhun Du, Joseph Futoma, Finale Doshi-Velez

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We experimentally demonstrate the efficacy of our methods across various continuous-time domains." Section 5 (Experiments): "We evaluate our ODE-based models across four continuous-time domains. We show our models characterize continuous-time dynamics more accurately and allow us to find a good policy with less data. We also demonstrate capabilities of our model-based methods that are not possible for model-free methods." |
| Researcher Affiliation | Academia | Jianzhun Du, Joseph Futoma, Finale Doshi-Velez; Harvard University, Cambridge, MA 02138; jzdu@g.harvard.edu, {jfutoma, finale}@seas.harvard.edu |
| Pseudocode | Yes | "The full procedure can be found in Appendix A.1." "The details of the algorithm can be found in Appendix A.2." "The details of the algorithm can be found in Appendix A.3." |
| Open Source Code | No | The paper does not state that its own source code is released, nor does it link to a code repository for the described methodology. It only mentions using a third-party library: "We use the implementation of ODE solvers from Python torchdiffeq library." (footnote 2; a usage sketch follows the table.) |
| Open Datasets | Yes | "Domains. We provide demonstrations on three simpler domains (windy gridworld [Sutton and Barto, 2018], acrobot [Sutton, 1996], and HIV [Adams et al., 2004]) and three MuJoCo [Todorov et al., 2012] locomotion tasks (Swimmer, Hopper, and HalfCheetah) interfaced through OpenAI Gym [Brockman et al., 2016]." (An environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper mentions a "training dataset" and a "test dataset" but does not give explicit percentages, sample counts, or a methodology for splitting data into training, validation, and test sets. |
| Hardware Specification | No | The paper states: "We thank Harvard Faculty of Arts and Sciences Research Computing and School of Engineering and Applied Sciences for providing computational resources." This acknowledgment names no specific CPU or GPU models or other hardware details used for the experiments. |
| Software Dependencies | No | The paper states: "We use the implementation of ODE solvers from Python torchdiffeq library." (footnote 2). However, it does not specify version numbers for Python or the torchdiffeq library. (A version-recording sketch follows the table.) |
| Experiment Setup | No | The paper describes the general experimental approach, including the use of "mini-batch stochastic gradient descent" and combining "MPC with the actor-critic method (DDPG)", and it introduces λ as a hyperparameter in the training objective (Equation 9). However, it does not provide concrete values for hyperparameters such as learning rate, batch size, number of epochs, or specific optimizer settings. (An illustrative training-step sketch follows the table.) |
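For context on the third-party dependency flagged in the Open Source Code row: below is a minimal sketch of how torchdiffeq's `odeint` solver is typically called. The MLP dynamics function, state dimension, and query times are hypothetical stand-ins, not the paper's actual latent ODE model.

```python
# Minimal torchdiffeq usage sketch (assumed, not the paper's code):
# odeint integrates dy/dt = f(t, y) from an initial state across given times.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    """Hypothetical stand-in dynamics: dy/dt parameterized by a small MLP."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc(dim=4)
y0 = torch.zeros(4)                 # initial state
t = torch.tensor([0.0, 0.5, 1.3])   # irregularly spaced query times
ys = odeint(func, y0, t)            # solution states, shape (3, 4)
```

Solving at irregularly spaced times is the relevant capability here, since the paper targets semi-Markov settings where transitions occur at non-uniform intervals.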
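The domains listed in the Open Datasets row are all publicly available, and the locomotion tasks are standard OpenAI Gym environments. A sketch of instantiating them follows; the `-v2` environment IDs and the classic (pre-0.26) Gym API are assumptions, since the paper does not state which versions it used.

```python
# Hedged sketch: instantiating the paper's MuJoCo locomotion domains via
# OpenAI Gym (requires a MuJoCo installation; env IDs assumed, not confirmed).
import gym

for env_id in ["Swimmer-v2", "Hopper-v2", "HalfCheetah-v2"]:
    env = gym.make(env_id)
    obs = env.reset()                                   # initial observation
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```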
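Since the Software Dependencies row flags missing version numbers, one way a reproducer could record the versions actually installed in their own environment is shown below; the package list is an assumption, as the paper itself reports no versions.

```python
# Record installed dependency versions (the paper pins none).
from importlib.metadata import version

for pkg in ["torch", "torchdiffeq", "gym"]:
    print(pkg, version(pkg))
```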
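To make the Experiment Setup row concrete: the paper trains its dynamics model with mini-batch stochastic gradient descent under an objective in which λ weights one loss term against another (Equation 9). The sketch below is purely illustrative; the two-term loss structure, model, learning rate, batch size, and λ value are all assumptions, since the paper reports none of these.

```python
# Illustrative only: one mini-batch SGD step on a lambda-weighted two-term
# objective in the spirit of the paper's Equation 9. Every value here
# (architecture, lr, batch size, lam) is a placeholder, not from the paper.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(4, 5)        # stand-in dynamics model, not the paper's
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
lam = 1.0                            # the paper's lambda; value unreported

states = torch.randn(32, 4)          # synthetic mini-batch of transitions
next_states = torch.randn(32, 4)
dwell_times = torch.rand(32)

pred = model(states)
loss = F.mse_loss(pred[:, :4], next_states) \
       + lam * F.mse_loss(pred[:, 4], dwell_times)
opt.zero_grad()
loss.backward()
opt.step()
```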