Guided Policy Search via Approximate Mirror Descent

Authors: William H. Montgomery, Sergey Levine

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters." |
| Researcher Affiliation | Academia | William Montgomery, Dept. of Computer Science and Engineering, University of Washington (wmonty@cs.washington.edu); Sergey Levine, Dept. of Computer Science and Engineering, University of Washington (svlevine@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 1: generic guided policy search method; Algorithm 2: mirror descent guided policy search (MDGPS), convex linear variant; Algorithm 3: mirror descent guided policy search (MDGPS), unknown nonlinear dynamics. A sketch of the outer loop appears after the table. |
| Open Source Code | Yes | "Guided policy search code, including BADMM and MDGPS methods, is available at https://www.github.com/cbfinn/gps." |
| Open Datasets | No | The evaluation uses custom simulated tasks rather than public datasets: "We evaluate all methods on one simulated robotic navigation task and two manipulation tasks... Obstacle Navigation. In this task, a 2D point mass (grey) must navigate around obstacles... Peg Insertion. This task, which is more complex, requires controlling a 7 DoF 3D arm... Blind Peg Insertion. The last task is a blind variant of the peg insertion task..." |
| Dataset Splits | No | No explicit train, validation, or test splits (e.g., percentages, sample counts, or predefined splits) are mentioned. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions using neural networks but does not name specific software dependencies, such as frameworks (e.g., PyTorch, TensorFlow) or version numbers. |
| Experiment Setup | Yes | "The global policy for each task consists of a fully connected neural network with two hidden layers with 40 rectified linear units." A sketch of this architecture appears after the table. |
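
The Pseudocode row references Algorithms 1–3 from the paper. As a rough illustration of the method's structure, below is a minimal Python sketch of the MDGPS outer loop under unknown dynamics (the shape of Algorithm 3). All helper functions are hypothetical placeholders, not part of the paper or the released code, and are stubbed out here:

```python
# Hypothetical sketch of the MDGPS outer loop; not the authors' code.
# Each helper is a placeholder standing in for a step described in the paper.

def sample_trajectories(env, policy, n):
    raise NotImplementedError  # roll out `policy` in `env` n times

def fit_linear_dynamics(trajectories):
    raise NotImplementedError  # fit time-varying linear-Gaussian dynamics

def lqr_update(local_policy, dynamics, global_policy, kl_step):
    raise NotImplementedError  # constrained LQR: reduce cost, bound KL to global policy

def fit_supervised(global_policy, trajectories, local_policies):
    raise NotImplementedError  # train global policy to match local-policy actions

def mdgps(env, global_policy, local_policies, iterations=12, kl_step=1.0, n_samples=5):
    """Alternate local trajectory optimization (C-step) with
    supervised training of the global policy (S-step)."""
    for _ in range(iterations):
        # Generate samples from each local policy p_i.
        trajs = [sample_trajectories(env, p, n_samples) for p in local_policies]
        # Fit local dynamics models to the samples.
        dyn = [fit_linear_dynamics(t) for t in trajs]
        # C-step: improve each local policy under a KL constraint against
        # the current global policy (the mirror-descent projection).
        local_policies = [
            lqr_update(p, d, global_policy, kl_step)
            for p, d in zip(local_policies, dyn)
        ]
        # S-step: supervised learning on states visited by the local policies.
        fit_supervised(global_policy, trajs, local_policies)
    return global_policy
```

In the paper's formulation, the C-step is a KL-constrained trajectory optimization under the fitted local dynamics, and the S-step trains the global policy to match the local policies at their sampled states; this alternation is what gives the method its mirror-descent interpretation.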
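
The Experiment Setup row specifies the global-policy architecture exactly: two fully connected hidden layers of 40 rectified linear units. The following is a minimal sketch of that network; PyTorch is an assumption made for illustration (the paper does not name a framework), and the input/output dimensions are hypothetical placeholders:

```python
import torch.nn as nn

# Minimal sketch of the global-policy architecture quoted above: two fully
# connected hidden layers of 40 ReLU units. PyTorch and the dimensions
# below are assumptions for illustration, not details from the paper.

def make_global_policy(obs_dim: int, action_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 40),
        nn.ReLU(),
        nn.Linear(40, 40),
        nn.ReLU(),
        # Linear output head; in guided policy search this would typically
        # parameterize the mean of a conditional Gaussian over actions.
        nn.Linear(40, action_dim),
    )

policy = make_global_policy(obs_dim=26, action_dim=7)  # illustrative dims
print(policy)
```

For the peg insertion tasks, `action_dim=7` would match the 7 DoF arm described in the paper; the observation dimension depends on each task's state representation and is chosen arbitrarily here.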