Guided Policy Search via Approximate Mirror Descent
Authors: William H. Montgomery, Sergey Levine
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters. |
| Researcher Affiliation | Academia | William Montgomery, Dept. of Computer Science and Engineering, University of Washington, wmonty@cs.washington.edu; Sergey Levine, Dept. of Computer Science and Engineering, University of Washington, svlevine@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: Generic guided policy search method; Algorithm 2: Mirror descent guided policy search (MDGPS): convex linear variant; Algorithm 3: Mirror descent guided policy search (MDGPS): unknown nonlinear dynamics |
| Open Source Code | Yes | Guided policy search code, including BADMM and MDGPS methods, is available at https://www.github.com/cbfinn/gps. |
| Open Datasets | No | We evaluate all methods on one simulated robotic navigation task and two manipulation tasks... Obstacle Navigation. In this task, a 2D point mass (grey) must navigate around obstacles... Peg Insertion. This task, which is more complex, requires controlling a 7 DoF 3D arm... Blind Peg Insertion. The last task is a blind variant of the peg insertion task... |
| Dataset Splits | No | No explicit train, validation, or test dataset splits (e.g., percentages, sample counts, or specific predefined splits) are mentioned. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions using neural networks but does not provide specific software dependencies like framework names (e.g., PyTorch, TensorFlow) or their version numbers, or any other software with version details. |
| Experiment Setup | Yes | The global policy for each task consists of a fully connected neural network with two hidden layers with 40 rectified linear units. |
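
The global policy architecture in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the 40 rectified linear units apply to each of the two hidden layers and that the output layer is linear. The names `init_policy`, `affine`, and `policy_output`, the observation and action dimensions, and the weight initialization are all hypothetical choices made for illustration; none of them come from the paper or from the released code.

```python
import random

HIDDEN_UNITS = 40  # reported layer width; assumed here to apply to each hidden layer


def init_policy(obs_dim, act_dim, seed=0):
    """Build (weights, biases) pairs for obs_dim -> 40 -> 40 -> act_dim."""
    rng = random.Random(seed)
    sizes = [obs_dim, HIDDEN_UNITS, HIDDEN_UNITS, act_dim]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Scaled Gaussian initialization: an illustrative choice, not from the paper.
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def affine(weights, biases, x):
    """Compute W x + b for one layer, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def policy_output(layers, obs):
    """Forward pass: ReLU after both hidden layers, linear output layer."""
    h = list(obs)
    for weights, biases in layers[:-1]:
        h = [max(0.0, z) for z in affine(weights, biases, h)]
    weights, biases = layers[-1]
    return affine(weights, biases, h)
```

For example, `policy_output(init_policy(4, 2), [1.0, -0.5, 0.25, 2.0])` returns a list of two action values for a hypothetical 4-dimensional observation. The sketch covers only the forward pass; training the policy under guided policy search is outside its scope.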