Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

Authors: Sergey Levine, Pieter Abbeel

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated both the trajectory optimization method and general policy search on simulated robotic manipulation and locomotion tasks. The state consisted of joint angles and velocities, and the actions corresponded to joint torques. Figure 1 compares our method with prior work on learning linear-Gaussian controllers for peg insertion, octopus arm, and swimming (walking is discussed in the next section).
Researcher Affiliation | Academia | Sergey Levine and Pieter Abbeel, Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94709, {svlevine, pabbeel}@eecs.berkeley.edu
Pseudocode | Yes | Algorithm 1: Guided policy search with unknown dynamics (a hedged sketch of this outer loop is given after the table).
Open Source Code | No | The paper provides a link to supplementary videos ('http://rll.berkeley.edu/nips2014gps/') but does not explicitly state that the source code for the methodology is available.
Open Datasets | No | The paper describes simulated tasks ('2D and 3D peg insertion, octopus arm control, and planar swimming and walking') and states that 'Details of the simulation and cost for each task are in the supplementary appendix.' It does not refer to a publicly available or open dataset with access information.
Dataset Splits | No | The paper describes experiments in simulated environments but does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper does not name the ancillary software (e.g., libraries or solvers with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | The paper states details such as the neural network architecture ('neural networks with one hidden layer and a soft rectifier nonlinearity') and the sample counts per iteration ('Our method used 5 rollouts with the GMM, and 20 without.' 'PILCO was provided with 5 rollouts per iteration, while other prior methods used 20 and 100.'). A minimal sketch of such a network follows the algorithm sketch below.
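Since the assessment flags "Algorithm 1: Guided policy search with unknown dynamics" as the paper's pseudocode, the following is a minimal structural sketch of that outer loop, not the authors' implementation. The GMM-prior dynamics fit is replaced by ordinary least squares, the KL-constrained iLQG update by a plain finite-horizon LQR backward pass (affine terms dropped), and the environment, cost matrices, dimensions, and iteration counts are illustrative assumptions.

```python
# Sketch of the outer loop of guided policy search with unknown dynamics.
# All numeric choices and placeholder steps are assumptions, not the paper's.
import numpy as np

T, dX, dU = 50, 4, 2       # horizon, state dim, action dim (toy sizes)
N_ROLLOUTS = 5             # the paper reports 5 rollouts per iteration with the GMM prior
Q_COST = np.eye(dX)        # assumed quadratic state cost  x^T Q x
R_COST = 0.1 * np.eye(dU)  # assumed quadratic action cost u^T R u


def rollout(controller, env_step, x0):
    """Run one trajectory under a time-varying linear-Gaussian controller."""
    X, U = [x0], []
    for t in range(T):
        K, k, cov = controller[t]
        u = K @ X[-1] + k + np.random.multivariate_normal(np.zeros(dU), cov)
        U.append(u)
        X.append(env_step(X[-1], u))
    return np.array(X), np.array(U)      # X holds T+1 states, U holds T actions


def fit_dynamics(X, U, Xn):
    """Placeholder for the GMM-prior fit: one least-squares linear model
    x_{t+1} ~= A x_t + B u_t shared across all time steps."""
    Z = np.hstack([X, U])
    W, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    return W[:dX].T, W[dX:].T            # A (dX x dX), B (dX x dU)


def lqr_update(A, B, old_controller):
    """Placeholder for the KL-constrained iLQG step: a standard LQR backward
    pass under the fitted dynamics, keeping the old exploration covariance
    as a crude stand-in for the KL constraint."""
    V = Q_COST
    gains = [None] * T
    for t in reversed(range(T)):
        Quu = R_COST + B.T @ V @ B
        Qux = B.T @ V @ A
        Qxx = Q_COST + A.T @ V @ A
        K = -np.linalg.solve(Quu, Qux)
        V = Qxx + Qux.T @ K
        gains[t] = (K, np.zeros(dU), old_controller[t][2])
    return gains


def guided_policy_search(env_step, x0, n_iters=10):
    controller = [(np.zeros((dU, dX)), np.zeros(dU), 0.1 * np.eye(dU))
                  for _ in range(T)]
    policy_data = []                     # (state, action) pairs for the supervised step
    for _ in range(n_iters):
        # 1. Run the current controller to collect samples.
        samples = [rollout(controller, env_step, x0) for _ in range(N_ROLLOUTS)]
        Xs = np.vstack([X[:-1] for X, _ in samples])
        Us = np.vstack([U for _, U in samples])
        Xn = np.vstack([X[1:] for X, _ in samples])
        # 2. Fit dynamics to the samples, then improve the controller.
        A, B = fit_dynamics(Xs, Us, Xn)
        controller = lqr_update(A, B, controller)
        # 3. Collect supervised data for the neural-network policy
        #    (see the network sketch below for the regression step).
        policy_data.extend(zip(Xs, Us))
    return controller, policy_data


if __name__ == "__main__":
    # Toy linear environment standing in for the physics simulator.
    A_true = np.eye(dX) + 0.05 * np.diag(np.ones(dX - 1), k=1)
    B_true = 0.05 * np.vstack([np.zeros((dX - dU, dU)), np.eye(dU)])
    step = lambda x, u: A_true @ x + B_true @ u
    ctrl, data = guided_policy_search(step, x0=np.ones(dX))
    print("collected", len(data), "state-action pairs")
```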
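The experiment-setup row quotes a policy class of "neural networks with one hidden layer and a soft rectifier nonlinearity"; the soft rectifier is the softplus function log(1 + exp(z)). Below is a minimal numpy sketch of such a network, trained by plain gradient descent to regress actions from states. The hidden-layer width, learning rate, and synthetic targets are illustrative assumptions, not values from the paper.

```python
# One-hidden-layer policy network with a softplus ("soft rectifier") nonlinearity.
# Sizes, data, and training schedule are placeholders, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
dX, dU, H = 4, 2, 40           # state dim, action dim, hidden units (assumed)

W1 = 0.1 * rng.standard_normal((H, dX)); b1 = np.zeros(H)
W2 = 0.1 * rng.standard_normal((dU, H)); b2 = np.zeros(dU)

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def policy(x):
    """Mean action of the network for a batch of states x with shape (N, dX)."""
    h = softplus(x @ W1.T + b1)
    return h @ W2.T + b2

# Illustrative supervised step: fit the network to (state, action) pairs,
# e.g. the targets produced by the linear-Gaussian controllers above.
X = rng.standard_normal((500, dX))
U = X @ rng.standard_normal((dX, dU))      # stand-in action targets

lr = 1e-2
for step in range(2000):
    # Forward pass.
    Z1 = X @ W1.T + b1
    Hid = softplus(Z1)
    err = Hid @ W2.T + b2 - U              # prediction error, shape (N, dU)
    # Backward pass for mean squared error.
    gW2 = err.T @ Hid / len(X); gb2 = err.mean(axis=0)
    dHid = err @ W2 * (1.0 / (1.0 + np.exp(-Z1)))   # softplus' = sigmoid
    gW1 = dHid.T @ X / len(X); gb1 = dHid.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print("final MSE:", float((err ** 2).mean()))
```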