Guided Policy Search via Approximate Mirror Descent

Authors: William H. Montgomery, Sergey Levine

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters." |
| Researcher Affiliation | Academia | William Montgomery, Dept. of Computer Science and Engineering, University of Washington (wmonty@cs.washington.edu); Sergey Levine, Dept. of Computer Science and Engineering, University of Washington (svlevine@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 1: generic guided policy search method; Algorithm 2: mirror descent guided policy search (MDGPS), convex linear variant; Algorithm 3: mirror descent guided policy search (MDGPS), unknown nonlinear dynamics. A sketch of the outer loop appears after the table. |
| Open Source Code | Yes | "Guided policy search code, including BADMM and MDGPS methods, is available at https://www.github.com/cbfinn/gps." |
| Open Datasets | No | The evaluation uses custom simulated tasks rather than public datasets: "We evaluate all methods on one simulated robotic navigation task and two manipulation tasks... Obstacle Navigation. In this task, a 2D point mass (grey) must navigate around obstacles... Peg Insertion. This task, which is more complex, requires controlling a 7 DoF 3D arm... Blind Peg Insertion. The last task is a blind variant of the peg insertion task..." |
| Dataset Splits | No | No explicit train, validation, or test splits (e.g., percentages, sample counts, or predefined splits) are mentioned. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions using neural networks but does not name specific software dependencies, such as frameworks (e.g., PyTorch, TensorFlow) or version numbers. |
| Experiment Setup | Yes | "The global policy for each task consists of a fully connected neural network with two hidden layers with 40 rectified linear units." A sketch of this architecture appears after the table. |
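
The Pseudocode row references Algorithms 1–3 from the paper. As a rough illustration of the method's structure, below is a minimal Python sketch of the MDGPS outer loop under unknown dynamics (the shape of Algorithm 3). All helper functions are hypothetical placeholders, not part of the paper or the released code, and are stubbed out here:

```python
# Hypothetical sketch of the MDGPS outer loop; not the authors' code.
# Each helper is a placeholder standing in for a step described in the paper.

def sample_trajectories(env, policy, n):
    raise NotImplementedError  # roll out `policy` in `env` n times

def fit_linear_dynamics(trajectories):
    raise NotImplementedError  # fit time-varying linear-Gaussian dynamics

def lqr_update(local_policy, dynamics, global_policy, kl_step):
    raise NotImplementedError  # constrained LQR: reduce cost, bound KL to global policy

def fit_supervised(global_policy, trajectories, local_policies):
    raise NotImplementedError  # train global policy to match local-policy actions

def mdgps(env, global_policy, local_policies, iterations=12, kl_step=1.0, n_samples=5):
    """Alternate local trajectory optimization (C-step) with
    supervised training of the global policy (S-step)."""
    for _ in range(iterations):
        # Generate samples from each local policy p_i.
        trajs = [sample_trajectories(env, p, n_samples) for p in local_policies]
        # Fit local dynamics models to the samples.
        dyn = [fit_linear_dynamics(t) for t in trajs]
        # C-step: improve each local policy under a KL constraint against
        # the current global policy (the mirror-descent projection).
        local_policies = [
            lqr_update(p, d, global_policy, kl_step)
            for p, d in zip(local_policies, dyn)
        ]
        # S-step: supervised learning on states visited by the local policies.
        fit_supervised(global_policy, trajs, local_policies)
    return global_policy
```

In the paper's formulation, the C-step is a KL-constrained trajectory optimization under the fitted local dynamics, and the S-step trains the global policy to match the local policies at their sampled states; this alternation is what gives the method its mirror-descent interpretation.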
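
The Experiment Setup row specifies the global-policy architecture exactly: two fully connected hidden layers of 40 rectified linear units. The following is a minimal sketch of that network; PyTorch is an assumption made for illustration (the paper does not name a framework), and the input/output dimensions are hypothetical placeholders:

```python
import torch.nn as nn

# Minimal sketch of the global-policy architecture quoted above: two fully
# connected hidden layers of 40 ReLU units. PyTorch and the dimensions
# below are assumptions for illustration, not details from the paper.

def make_global_policy(obs_dim: int, action_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, 40),
        nn.ReLU(),
        nn.Linear(40, 40),
        nn.ReLU(),
        # Linear output head; in guided policy search this would typically
        # parameterize the mean of a conditional Gaussian over actions.
        nn.Linear(40, action_dim),
    )

policy = make_global_policy(obs_dim=26, action_dim=7)  # illustrative dims
print(policy)
```

For the peg insertion tasks, `action_dim=7` would match the 7 DoF arm described in the paper; the observation dimension depends on each task's state representation and is chosen arbitrarily here.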