Learning Complex Neural Network Policies with Trajectory Optimization
Authors: Sergey Levine, Vladlen Koltun
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our approach on a set of challenging locomotion tasks, including a push recovery task that requires the policy to combine multiple recovery strategies learned in parallel from multiple trajectories. Our approach successfully learned a policy that could not only perform multiple different recoveries, but could also correctly choose the best strategy under new conditions. (Section 4, Experimental Evaluation) |
| Researcher Affiliation | Collaboration | Sergey Levine (svlevine@cs.stanford.edu), Computer Science Department, Stanford University, Stanford, CA 94305 USA; Vladlen Koltun (vladlen@adobe.com), Adobe Research, San Francisco, CA 94103 USA |
| Pseudocode | Yes | Algorithm 1: Constrained guided policy search; Algorithm 2: Trajectory optimization iteration |
| Open Source Code | No | The paper provides a link to a supplementary video but does not explicitly state that the source code for the methodology is available, nor does it provide a direct link to a code repository. |
| Open Datasets | No | The paper describes the simulated environment and how initial trajectories were generated (e.g., 'MuJoCo physics simulator', 'hand-crafted locomotion system'), but does not mention the use of a specific publicly available dataset or provide access information for any generated data. |
| Dataset Splits | No | The paper does not provide specific details regarding dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper mentions running experiments on a 'simulated robot' within the 'MuJoCo physics simulator' but does not provide specific hardware details such as GPU/CPU models, memory, or other computing resource specifications. |
| Software Dependencies | No | The paper mentions the 'MuJoCo physics simulator' and the 'MATLAB CARE solver' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The policies consisted of neural networks with one hidden layer, with a soft rectifier a = log(1 + exp(z)) at the first layer and linear connections to the output layer. Gaussian noise with a learned diagonal covariance was added to the output to create a stochastic policy. When evaluating the cost of a policy, the noise was removed, yielding a deterministic controller. While this class of policies is very expressive, it poses a considerable challenge for policy search methods, due to its nonlinearity and high dimensionality. As discussed in Section 3, the stochasticity of the policy depends on the cost magnitude. A low cost will produce broad trajectory distributions, which are good for learning, but will also produce a more stochastic policy, which might perform poorly. To speed up learning and still achieve a good final policy, we found it useful to gradually increase the cost by a factor of 10 over the first 50 iterations. |
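
As a reading aid for the Experiment Setup row above, the sketch below illustrates the described policy class in Python: one hidden layer with the soft rectifier log(1 + exp(z)), linear connections to the output, additive Gaussian noise with a learned diagonal covariance that is removed for deterministic evaluation, and the gradual cost increase over the first 50 iterations. Layer sizes, weight initialization, the log-standard-deviation parameterization, and the interpolation used in `cost_scale` are illustrative assumptions, not details taken from the paper (whose own implementation relied on MATLAB tooling).

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    # Soft rectifier a = log(1 + exp(z)) used at the hidden layer,
    # computed stably as log(exp(0) + exp(z)).
    return np.logaddexp(0.0, z)

class OneHiddenLayerPolicy:
    """One hidden layer, linear output, and additive Gaussian noise with a
    learned diagonal covariance (parameterized here as a log standard deviation)."""

    def __init__(self, state_dim, hidden_dim, action_dim):
        # Weight scales are illustrative, not taken from the paper.
        self.W1 = 0.1 * rng.standard_normal((hidden_dim, state_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = 0.1 * rng.standard_normal((action_dim, hidden_dim))
        self.b2 = np.zeros(action_dim)
        self.log_std = np.zeros(action_dim)  # learned alongside the weights

    def mean_action(self, state):
        hidden = softplus(self.W1 @ state + self.b1)
        return self.W2 @ hidden + self.b2

    def act(self, state, stochastic=True):
        mean = self.mean_action(state)
        if not stochastic:
            # Removing the noise yields the deterministic controller used
            # when evaluating the cost of a policy.
            return mean
        return mean + np.exp(self.log_std) * rng.standard_normal(mean.shape)

def cost_scale(iteration, ramp_iters=50, factor=10.0):
    # The paper increases the cost by a factor of 10 over the first 50
    # iterations; the geometric interpolation below is an assumption about
    # how "gradually" is realized.
    return factor ** min(iteration / ramp_iters, 1.0)

# Example usage with arbitrary dimensions.
policy = OneHiddenLayerPolicy(state_dim=10, hidden_dim=50, action_dim=4)
deterministic_action = policy.act(np.zeros(10), stochastic=False)
scaled_cost = cost_scale(iteration=25) * 1.0  # placeholder cost value
```
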