Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Authors: Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav Sukhatme, Stefan Schaal, Sergey Levine
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simulation and real-world experiments demonstrate that this method can solve challenging manipulation tasks with comparable or better performance than model-free methods while maintaining the sample efficiency of model-based methods. |
| Researcher Affiliation | Academia | (1) University of Southern California, Los Angeles, CA, USA; (2) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (3) University of California Berkeley, Berkeley, CA, USA. |
| Pseudocode | Yes | Algorithm 1 PILQR algorithm (a hedged toy sketch of this hybrid update appears after the table) |
| Open Source Code | Yes | The performance of each method can be seen in our supplementary video: https://sites.google.com/site/icml17pilqr |
| Open Datasets | Yes | The reacher task from OpenAI Gym (Brockman et al., 2016) (a minimal loading example appears after the table) |
| Dataset Splits | No | The paper mentions 'test conditions' for evaluation but does not specify the training, validation, or testing split percentages or counts for the datasets used. |
| Hardware Specification | Yes | To evaluate our method on a real robotic platform, we use a PR2 robot (see Figure 1) to learn the following tasks: |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | Additional experimental setup details, including the exact cost functions, are provided in Appendix 8.3.1. (...) Our TVLG policies consist of 100 time steps and we control our robot at a frequency of 20 Hz. (...) In all of the real robot experiments, policies are updated every 10 rollouts and the final policy is obtained after 20-25 iterations, which corresponds to mastering the skill with less than one hour of experience (a consistency check on this timing appears after the table). |
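The report only names Algorithm 1 (PILQR) without reproducing it. For orientation, below is a minimal runnable toy in the same spirit on a 1D point mass: a model-based half that fits linear dynamics and runs an LQR backward pass for the feedback gains, and a model-free half that reweights the feed-forward terms PI2-style by exponentiated cost-to-go. This is a sketch under simplifying assumptions: the paper's actual algorithm uses LQR-FLM with fitted time-varying dynamics and applies PI2 to a residual cost, so all constants and the update split here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D point mass: x_{t+1} = x_t + dt * u_t, step cost q*x^2 + r*u^2.
T, dt, q, r = 40, 0.1, 1.0, 0.01
n_rollouts, noise, temp = 10, 0.2, 1.0

def rollout(K, k):
    """Roll out the TVLG-style policy u_t = K_t * x_t + k_t + noise."""
    xs, us = np.empty(T + 1), np.empty(T)
    xs[0] = 1.0
    for t in range(T):
        us[t] = K[t] * xs[t] + k[t] + noise * rng.standard_normal()
        xs[t + 1] = xs[t] + dt * us[t]
    return xs, us

K, k = np.zeros(T), np.zeros(T)
for _ in range(15):
    samples = [rollout(K, k) for _ in range(n_rollouts)]

    # Model-based half: fit linear dynamics x' ~ a*x + b*u by least squares,
    # then run a scalar LQR backward pass for the feedback gains K_t.
    X = np.concatenate([xs[:-1] for xs, _ in samples])
    U = np.concatenate([us for _, us in samples])
    Xn = np.concatenate([xs[1:] for xs, _ in samples])
    (a, b), *_ = np.linalg.lstsq(np.stack([X, U], axis=1), Xn, rcond=None)
    V = q                                   # quadratic value coefficient
    for t in reversed(range(T)):
        K[t] = -(a * b * V) / (r + b * b * V)
        V = q + r * K[t] ** 2 + V * (a + b * K[t]) ** 2

    # Model-free half: PI2-style reweighting of the feed-forward terms k_t
    # by exponentiated negative cost-to-go across the sampled rollouts.
    c = np.array([q * xs[:-1] ** 2 + r * us ** 2 for xs, us in samples])
    S = np.cumsum(c[:, ::-1], axis=1)[:, ::-1]          # cost-to-go S_{i,t}
    w = np.exp(-(S - S.min(axis=0)) / temp)
    w /= w.sum(axis=0)
    ff = np.array([us - K * xs[:-1] for xs, us in samples])
    k = np.sum(w * ff, axis=0)

print("mean cost at final iteration:", c.sum(axis=1).mean())
```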
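Since the dataset evidence cites the Gym reacher task, a minimal loading example may be useful. The environment id "Reacher-v1" matches the 2016-era Gym release contemporary with the paper; later releases renamed it (e.g. "Reacher-v2") and changed the reset/step API, so adjust both for your installed version.

```python
import gym

# Reacher task from OpenAI Gym (Brockman et al., 2016).
env = gym.make("Reacher-v1")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()   # random actions as a stand-in policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```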
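As a quick consistency check on the quoted real-robot setup (assuming every iteration uses exactly 10 rollouts of the full 100 steps):

$$25\ \text{iterations} \times 10\ \text{rollouts} \times \frac{100\ \text{steps}}{20\ \text{Hz}} = 1250\ \text{s} \approx 21\ \text{min},$$

which is indeed well under the quoted hour of robot experience.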