Trust Region Policy Optimization
Authors: John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. |
| Researcher Affiliation | Academia | John Schulman (JOSCHU@EECS.BERKELEY.EDU), Sergey Levine (SLEVINE@EECS.BERKELEY.EDU), Philipp Moritz (PCMORITZ@EECS.BERKELEY.EDU), Michael Jordan (JORDAN@CS.BERKELEY.EDU), Pieter Abbeel (PABBEEL@CS.BERKELEY.EDU); University of California, Berkeley, Department of Electrical Engineering and Computer Sciences |
| Pseudocode | Yes | Algorithm 1 Approximate policy iteration algorithm guaranteeing non-increasing expected cost |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). We tested our algorithms on the same seven games reported on in (Mnih et al., 2013) and (Guo et al., 2014). |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits, as it focuses on reinforcement learning in simulated environments (MuJoCo) and game environments (Atari) where data is generated dynamically rather than being a static, pre-split dataset. |
| Hardware Specification | No | The paper mentions a '16-core computer' but does not provide specific hardware details such as exact GPU/CPU models or memory amounts. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator, but does not provide specific version numbers for it or any other key software dependencies. |
| Experiment Setup | Yes | We used δ = 0.01 for all experiments. See Table 2 in the Appendix for more details on the experimental setup and parameters used. The parameters used in the experiments are provided in Appendix E. |
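
The δ = 0.01 quoted in the Experiment Setup row is the bound the paper places on the average KL divergence between the old and the updated policy. The snippet below is a minimal sketch of what checking that constraint could look like for a policy over discrete actions. It is not the authors' implementation: the function names (`categorical_kl`, `mean_kl`, `within_trust_region`) and the toy probability tables are hypothetical and chosen only for illustration.

```python
import math

# KL bound reported in the paper's experiment setup.
DELTA = 0.01


def categorical_kl(p, q):
    """KL(p || q) for two discrete action distributions given as probability lists.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_kl(old_probs, new_probs):
    """Average KL(old || new) over a batch of states (one distribution per state)."""
    kls = [categorical_kl(p, q) for p, q in zip(old_probs, new_probs)]
    return sum(kls) / len(kls)


def within_trust_region(old_probs, new_probs, delta=DELTA):
    """Return True if the candidate policy stays inside the KL trust region."""
    return mean_kl(old_probs, new_probs) <= delta


# Toy example (hypothetical numbers): two states, two actions each.
old_policy = [[0.5, 0.5], [0.9, 0.1]]
small_step = [[0.52, 0.48], [0.89, 0.11]]  # slight change in action probabilities
large_step = [[0.9, 0.1], [0.5, 0.5]]      # drastic change in action probabilities

print(within_trust_region(old_policy, small_step))
print(within_trust_region(old_policy, large_step))
```

In this sketch a candidate update is accepted only when the batch-averaged KL divergence stays at or below δ, so the small perturbation passes and the drastic one is rejected. How the candidate step itself is computed is outside the scope of this illustration.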