Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

Authors: Sergey Levine, Pieter Abbeel

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated both the trajectory optimization method and general policy search on simulated robotic manipulation and locomotion tasks. The state consisted of joint angles and velocities, and the actions corresponded to joint torques. Figure 1 compares our method with prior work on learning linear-Gaussian controllers for peg insertion, octopus arm, and swimming (walking is discussed in the next section).
Researcher Affiliation | Academia | Sergey Levine and Pieter Abbeel, Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94709, {svlevine, pabbeel}@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1: Guided policy search with unknown dynamics (a hedged sketch of this outer loop is given after the table).
Open Source Code | No | The paper provides a link to supplementary videos ('http://rll.berkeley.edu/nips2014gps/') but does not explicitly state that the source code for the methodology is available.
Open Datasets | No | The paper describes simulated tasks ('2D and 3D peg insertion, octopus arm control, and planar swimming and walking') and states that 'Details of the simulation and cost for each task are in the supplementary appendix.' It does not refer to a publicly available or open dataset with access information.
Dataset Splits | No | The paper describes experiments in simulated environments but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper does not name the ancillary software (e.g., libraries or solvers with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | The paper states details such as the neural network architecture ('neural networks with one hidden layer and a soft rectifier nonlinearity') and the sample counts per iteration ('Our method used 5 rollouts with the GMM, and 20 without.' 'PILCO was provided with 5 rollouts per iteration, while other prior methods used 20 and 100.'). A minimal sketch of such a network follows the algorithm sketch below.
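Since the assessment flags "Algorithm 1: Guided policy search with unknown dynamics" as the paper's pseudocode, the following is a minimal structural sketch of that outer loop, not the authors' implementation. The GMM-prior dynamics fit is replaced by ordinary least squares, the KL-constrained iLQG update by a plain finite-horizon LQR backward pass (affine terms dropped), and the environment, cost matrices, dimensions, and iteration counts are illustrative assumptions.

```python
# Sketch of the outer loop of guided policy search with unknown dynamics.
# All numeric choices and placeholder steps are assumptions, not the paper's.
import numpy as np

T, dX, dU = 50, 4, 2       # horizon, state dim, action dim (toy sizes)
N_ROLLOUTS = 5             # the paper reports 5 rollouts per iteration with the GMM prior
Q_COST = np.eye(dX)        # assumed quadratic state cost  x^T Q x
R_COST = 0.1 * np.eye(dU)  # assumed quadratic action cost u^T R u


def rollout(controller, env_step, x0):
    """Run one trajectory under a time-varying linear-Gaussian controller."""
    X, U = [x0], []
    for t in range(T):
        K, k, cov = controller[t]
        u = K @ X[-1] + k + np.random.multivariate_normal(np.zeros(dU), cov)
        U.append(u)
        X.append(env_step(X[-1], u))
    return np.array(X), np.array(U)      # X holds T+1 states, U holds T actions


def fit_dynamics(X, U, Xn):
    """Placeholder for the GMM-prior fit: one least-squares linear model
    x_{t+1} ~= A x_t + B u_t shared across all time steps."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    return W[:dX].T, W[dX:].T            # A (dX x dX), B (dX x dU)


def lqr_update(A, B, old_controller):
    """Placeholder for the KL-constrained iLQG step: a standard LQR backward
    pass under the fitted dynamics, keeping the old exploration covariance
    as a crude stand-in for the KL constraint."""
    V = Q_COST
    gains = [None] * T
    for t in reversed(range(T)):
        Quu = R_COST + B.T @ V @ B
        Qux = B.T @ V @ A
        Qxx = Q_COST + A.T @ V @ A
        K = -np.linalg.solve(Quu, Qux)
        V = Qxx + Qux.T @ K
        gains[t] = (K, np.zeros(dU), old_controller[t][2])
    return gains


def guided_policy_search(env_step, x0, n_iters=10):
    controller = [(np.zeros((dU, dX)), np.zeros(dU), 0.1 * np.eye(dU))
                  for _ in range(T)]
    policy_data = []                     # (state, action) pairs for the supervised step
    for _ in range(n_iters):
        # 1. Run the current controller to collect samples.
        samples = [rollout(controller, env_step, x0) for _ in range(N_ROLLOUTS)]
        Xs = np.vstack([X[:-1] for X, _ in samples])
        Us = np.vstack([U for _, U in samples])
        Xn = np.vstack([X[1:] for X, _ in samples])
        # 2. Fit dynamics to the samples, then improve the controller.
        A, B = fit_dynamics(Xs, Us, Xn)
        controller = lqr_update(A, B, controller)
        # 3. Collect supervised data for the neural-network policy
        #    (see the network sketch below for the regression step).
        policy_data.extend(zip(Xs, Us))
    return controller, policy_data


if __name__ == "__main__":
    # Toy linear environment standing in for the physics simulator.
    A_true = np.eye(dX) + 0.05 * np.diag(np.ones(dX - 1), k=1)
    B_true = 0.05 * np.vstack([np.zeros((dX - dU, dU)), np.eye(dU)])
    step = lambda x, u: A_true @ x + B_true @ u
    ctrl, data = guided_policy_search(step, x0=np.ones(dX))
    print("collected", len(data), "state-action pairs")
```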
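The experiment-setup row quotes a policy class of "neural networks with one hidden layer and a soft rectifier nonlinearity"; the soft rectifier is the softplus function log(1 + exp(z)). Below is a minimal numpy sketch of such a network, trained by plain gradient descent to regress actions from states. The hidden-layer width, learning rate, and synthetic targets are illustrative assumptions, not values from the paper.

```python
# One-hidden-layer policy network with a softplus ("soft rectifier") nonlinearity.
# Sizes, data, and training schedule are placeholders, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
dX, dU, H = 4, 2, 40           # state dim, action dim, hidden units (assumed)

W1 = 0.1 * rng.standard_normal((H, dX)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((dU, H)); b2 = np.zeros(dU)

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def policy(x):
    """Mean action of the network for a batch of states x with shape (N, dX)."""
    h = softplus(x @ W1.T + b1)
    return h @ W2.T + b2

# Illustrative supervised step: fit the network to (state, action) pairs,
# e.g. the targets produced by the linear-Gaussian controllers above.
X = rng.standard_normal((500, dX))
U = X @ rng.standard_normal((dX, dU))      # stand-in action targets

lr = 1e-2
for step in range(2000):
    # Forward pass.
    Z1 = X @ W1.T + b1
    Hid = softplus(Z1)
    err = Hid @ W2.T + b2 - U              # prediction error, shape (N, dU)
    # Backward pass for mean squared error.
    gW2 = err.T @ Hid / len(X); gb2 = err.mean(axis=0)
    dHid = err @ W2 * (1.0 / (1.0 + np.exp(-Z1)))   # softplus' = sigmoid
    gW1 = dHid.T @ X / len(X); gb1 = dHid.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print("final MSE:", float((err ** 2).mean()))
```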