Trust Region Policy Optimization

Authors: John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input.
Researcher Affiliation | Academia | John Schulman (JOSCHU@EECS.BERKELEY.EDU), Sergey Levine (SLEVINE@EECS.BERKELEY.EDU), Philipp Moritz (PCMORITZ@EECS.BERKELEY.EDU), Michael Jordan (JORDAN@CS.BERKELEY.EDU), Pieter Abbeel (PABBEEL@CS.BERKELEY.EDU); University of California, Berkeley, Department of Electrical Engineering and Computer Sciences
Pseudocode | Yes | Algorithm 1: Approximate policy iteration algorithm guaranteeing non-increasing expected cost
Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described.
Open Datasets | Yes | We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). We tested our algorithms on the same seven games reported on in (Mnih et al., 2013) and (Guo et al., 2014).
Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits, as it focuses on reinforcement learning in simulated environments (MuJoCo) and game environments (Atari) where data is generated dynamically rather than being a static, pre-split dataset.
Hardware Specification | No | The paper mentions a '16-core computer' but does not provide specific hardware details such as exact GPU/CPU models or memory amounts.
Software Dependencies | No | The paper mentions the MuJoCo simulator, but does not provide specific version numbers for it or any other key software dependencies.
Experiment Setup | Yes | We used δ = 0.01 for all experiments. See Table 2 in the Appendix for more details on the experimental setup and parameters used. The parameters used in the experiments are provided in Appendix E.
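
The δ = 0.01 reported under Experiment Setup is the bound on the KL divergence between the old and updated policy in the paper's trust region constraint. As a rough illustration only, the Python sketch below shrinks a proposed update until that bound holds for a small categorical policy. The helper names (kl_categorical, softmax, backtrack_step), the logit parameterization, and the halving schedule are assumptions made for this example, not the authors' implementation.

```python
import math

DELTA = 0.01  # KL bound reported in the source for all experiments


def kl_categorical(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def backtrack_step(old_logits, direction, delta=DELTA, shrink=0.5, max_tries=20):
    """Hypothetical helper: shrink a proposed logit update until
    KL(old || new) <= delta. Returns the accepted step size,
    or 0.0 if no tried step satisfies the bound."""
    old_p = softmax(old_logits)
    step = 1.0
    for _ in range(max_tries):
        new_logits = [l + step * d for l, d in zip(old_logits, direction)]
        if kl_categorical(old_p, softmax(new_logits)) <= delta:
            return step
        step *= shrink  # assumed halving schedule, chosen for illustration
    return 0.0


# Toy policy over three actions, with an arbitrary update direction.
old_logits = [0.0, 0.0, 0.0]
direction = [1.0, -1.0, 0.0]
step = backtrack_step(old_logits, direction)
new_policy = softmax([l + step * d for l, d in zip(old_logits, direction)])
print(step, kl_categorical(softmax(old_logits), new_policy))
```

A smaller δ forces smaller policy changes per update; the accepted step here is whatever fraction of the proposed direction first keeps the KL divergence within the bound.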