Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Trust Region Policy Optimization
Authors: John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, Philipp Moritz
ICML 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. |
| Researcher Affiliation | Academia | John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel; University of California, Berkeley, Department of Electrical Engineering and Computer Sciences |
| Pseudocode | Yes | Algorithm 1 Approximate policy iteration algorithm guaranteeing non-increasing expected cost |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). We tested our algorithms on the same seven games reported on in (Mnih et al., 2013) and (Guo et al., 2014). |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits, as it focuses on reinforcement learning in simulated environments (MuJoCo) and game environments (Atari) where data is generated dynamically rather than being a static, pre-split dataset. |
| Hardware Specification | No | The paper mentions a '16-core computer' but does not provide specific hardware details such as exact GPU/CPU models or memory amounts. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator, but does not provide specific version numbers for it or any other key software dependencies. |
| Experiment Setup | Yes | We used δ = 0.01 for all experiments. See Table 2 in the Appendix for more details on the experimental setup and parameters used. The parameters used in the experiments are provided in Appendix E. |
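
For context on the δ = 0.01 setting reported under Experiment Setup: in TRPO, δ bounds the average KL divergence between the old and the updated policy. The following is a minimal sketch of that check for discrete action distributions, not the authors' implementation. The function names `mean_kl` and `within_trust_region`, the `eps` clipping value, and the NumPy array layout (one row per sampled state, one column per action) are assumptions made for illustration.

```python
import numpy as np

DELTA = 0.01  # KL bound reported in the paper's experiment setup


def mean_kl(old_probs, new_probs, eps=1e-8):
    """Average KL(old || new) over states for categorical action distributions.

    Both inputs have shape (num_states, num_actions); each row sums to 1.
    `eps` is an illustrative clipping value to avoid log(0).
    """
    old = np.clip(np.asarray(old_probs, dtype=float), eps, 1.0)
    new = np.clip(np.asarray(new_probs, dtype=float), eps, 1.0)
    per_state_kl = np.sum(old * (np.log(old) - np.log(new)), axis=1)
    return float(np.mean(per_state_kl))


def within_trust_region(old_probs, new_probs, delta=DELTA):
    """Return True if the candidate policy stays inside the KL bound."""
    return mean_kl(old_probs, new_probs) <= delta


# Hypothetical policies over two states and two actions.
old_policy = np.array([[0.5, 0.5], [0.9, 0.1]])
small_step = np.array([[0.52, 0.48], [0.9, 0.1]])
large_step = np.array([[0.1, 0.9], [0.9, 0.1]])
```

A candidate update such as `small_step` would be accepted under this check, while `large_step` would be rejected and, in a line-search setting, the step would be shrunk and retried.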