Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
Authors: Sergey Levine, Pieter Abbeel
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated both the trajectory optimization method and general policy search on simulated robotic manipulation and locomotion tasks. The state consisted of joint angles and velocities, and the actions corresponded to joint torques. Figure 1 compares our method with prior work on learning linear-Gaussian controllers for peg insertion, octopus arm, and swimming (walking is discussed in the next section). |
| Researcher Affiliation | Academia | Sergey Levine and Pieter Abbeel, Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94709, {svlevine, pabbeel}@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Guided policy search with unknown dynamics |
| Open Source Code | No | The paper provides a link to supplementary videos ('http://rll.berkeley.edu/nips2014gps/') but does not explicitly state that the source code for the methodology is available. |
| Open Datasets | No | The paper describes simulated tasks ('2D and 3D peg insertion, octopus arm control, and planar swimming and walking') and states that 'Details of the simulation and cost for each task are in the supplementary appendix.' It does not refer to a publicly available or open dataset with access information. |
| Dataset Splits | No | The paper describes experiments in simulated environments but does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not name the software dependencies (e.g., libraries or solvers, with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | The paper specifies the neural network architecture ('neural networks with one hidden layer and a soft rectifier nonlinearity') and the sample counts per iteration ('Our method used 5 rollouts with the GMM, and 20 without.'; 'PILCO was provided with 5 rollouts per iteration, while other prior methods used 20 and 100.'). |
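
The architecture quoted in the Experiment Setup row (one hidden layer with a soft rectifier nonlinearity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, layer sizes, weight initialization, and the choice to output only an action mean are assumptions made for illustration. The soft rectifier is taken to be the softplus function log(1 + exp(x)).

```python
import math
import random


def softplus(x):
    """Soft rectifier log(1 + exp(x)), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def init_policy(state_dim, hidden_dim, action_dim, seed=0):
    """Create small random weights (hypothetical sizes and scale, not from the paper)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden_dim, state_dim),
        "b1": [0.0] * hidden_dim,
        "W2": matrix(action_dim, hidden_dim),
        "b2": [0.0] * action_dim,
    }


def policy_mean(params, state):
    """Map a state (e.g. joint angles and velocities) to mean actions (e.g. joint torques)."""
    # Hidden layer: affine transform followed by the soft rectifier.
    hidden = [
        softplus(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    # Output layer: plain affine transform, one value per action dimension.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(params["W2"], params["b2"])
    ]
```

For example, `policy_mean(init_policy(4, 8, 2), [0.1, -0.2, 0.3, 0.0])` returns a list of two floats, one per action dimension. The sizes 4, 8, and 2 are placeholders; the table above does not state the dimensions used in the paper.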