Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
Authors: Jianzhun Du, Joseph Futoma, Finale Doshi-Velez
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We experimentally demonstrate the efficacy of our methods across various continuous-time domains." From Section 5 (Experiments): "We evaluate our ODE-based models across four continuous-time domains. We show our models characterize continuous-time dynamics more accurately and allow us to find a good policy with less data. We also demonstrate capabilities of our model-based methods that are not possible for model-free methods." |
| Researcher Affiliation | Academia | "Jianzhun Du, Joseph Futoma, Finale Doshi-Velez; Harvard University, Cambridge, MA 02138; jzdu@g.harvard.edu, {jfutoma, finale}@seas.harvard.edu" |
| Pseudocode | Yes | "The full procedure can be found in Appendix A.1." "The details of the algorithm can be found in Appendix A.2." "The details of the algorithm can be found in Appendix A.3." |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include a link to a code repository for the methodology described. It only mentions using a third-party library: 'We use the implementation of ODE solvers from Python torchdiffeq library.' (footnote 2). |
| Open Datasets | Yes | "Domains. We provide demonstrations on three simpler domains (windy gridworld [Sutton and Barto, 2018], acrobot [Sutton, 1996], and HIV [Adams et al., 2004]) and three Mujoco [Todorov et al., 2012] locomotion tasks (Swimmer, Hopper and Half Cheetah) interfaced through OpenAI Gym [Brockman et al., 2016]." |
| Dataset Splits | No | The paper mentions 'training dataset' and 'test dataset' but does not specify explicit percentages, sample counts, or methodologies for splitting data into training, validation, and testing sets. |
| Hardware Specification | No | The paper states: 'We thank Harvard Faculty of Arts and Sciences Research Computing and School of Engineering and Applied Sciences for providing computational resources.' This is a general statement and does not specify any exact CPU or GPU models, or other detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper states: 'We use the implementation of ODE solvers from Python torchdiffeq library.' (footnote 2). However, it does not specify version numbers for Python or the torchdiffeq library. |
| Experiment Setup | No | The paper describes the general experimental approach, including the use of 'mini-batch stochastic gradient descent' and combining 'MPC with the actor-critic method (DDPG)'. It mentions the hyperparameter λ in the training objective (Equation 9). However, it does not provide concrete values for hyperparameters such as learning rate, batch size, number of epochs, or specific optimizer settings. |
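For context on the method the table evaluates: the paper models environment dynamics in continuous time with an ODE, ds/dt = f_θ(s, a), and integrates it numerically (the authors use the torchdiffeq library's solvers). The sketch below illustrates that idea only, in plain NumPy with a fixed-step RK4 integrator in place of torchdiffeq; the function names and the toy linear dynamics are illustrative assumptions, not the paper's model.

```python
import numpy as np

def rk4_step(f, s, a, dt):
    """One fixed-step RK4 integration step of ds/dt = f(s, a)."""
    k1 = f(s, a)
    k2 = f(s + 0.5 * dt * k1, a)
    k3 = f(s + 0.5 * dt * k2, a)
    k4 = f(s + dt * k3, a)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def predict_next_state(f, s, a, tau, n_steps=10):
    """Predict the state after holding action `a` for duration `tau`.

    The duration can vary per transition, which is what makes the
    semi-Markov (continuous-time) setting different from a fixed
    discrete time step.
    """
    dt = tau / n_steps
    for _ in range(n_steps):
        s = rk4_step(f, s, a, dt)
    return s

# Toy rotation dynamics as a stand-in for a learned network f_theta.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
f = lambda s, a: A @ s + a

s0 = np.array([1.0, 0.0])
a = np.zeros(2)
s1 = predict_next_state(f, s0, a, tau=np.pi / 2)
# Analytically, s(t) = [cos t, -sin t], so s1 ≈ [0, -1].
```

In the paper's actual setup, `f` would be a neural network trained by backpropagating through the solver, but the prediction interface (state, action, elapsed time in, next state out) is the same.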