Learning Complex Neural Network Policies with Trajectory Optimization

Authors: Sergey Levine, Vladlen Koltun

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluated our approach on a set of challenging locomotion tasks, including a push recovery task that requires the policy to combine multiple recovery strategies learned in parallel from multiple trajectories. Our approach successfully learned a policy that could not only perform multiple different recoveries, but could also correctly choose the best strategy under new conditions." (Section 4, Experimental Evaluation)
Researcher Affiliation | Collaboration | Sergey Levine (SVLEVINE@CS.STANFORD.EDU), Computer Science Department, Stanford University, Stanford, CA 94305 USA; Vladlen Koltun (VLADLEN@ADOBE.COM), Adobe Research, San Francisco, CA 94103 USA
Pseudocode | Yes | Algorithm 1 (Constrained guided policy search) and Algorithm 2 (Trajectory optimization iteration)
Open Source Code | No | The paper links to a supplementary video but does not state that source code for the method is available, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes the simulated environment and how initial trajectories were generated (e.g., the 'MuJoCo physics simulator' and a 'hand-crafted locomotion system'), but does not mention the use of a specific publicly available dataset or provide access information for any generated data.
Dataset Splits | No | The paper does not provide specific details regarding dataset splits (e.g., percentages, sample counts, or an explicit splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper mentions running experiments on a simulated robot within the 'MuJoCo physics simulator' but does not provide specific hardware details such as GPU/CPU models, memory, or other computing resources.
Software Dependencies | No | The paper mentions the 'MuJoCo physics simulator' and a 'MATLAB CARE solver' but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "The policies consisted of neural networks with one hidden layer, with a soft rectifier a = log(1 + exp(z)) at the first layer and linear connections to the output layer. Gaussian noise with a learned diagonal covariance was added to the output to create a stochastic policy. When evaluating the cost of a policy, the noise was removed, yielding a deterministic controller. While this class of policies is very expressive, it poses a considerable challenge for policy search methods, due to its nonlinearity and high dimensionality. As discussed in Section 3, the stochasticity of the policy depends on the cost magnitude. A low cost will produce broad trajectory distributions, which are good for learning, but will also produce a more stochastic policy, which might perform poorly. To speed up learning and still achieve a good final policy, we found it useful to gradually increase the cost by a factor of 10 over the first 50 iterations."
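
As a minimal sketch only, the Python snippet below illustrates the policy class and cost schedule described in the Experiment Setup row: a one-hidden-layer network with a soft-rectifier (softplus) hidden layer, a linear output layer, additive Gaussian noise with a learned diagonal covariance, and a cost multiplier that grows by a factor of 10 over the first 50 iterations. The layer sizes, weight initialization, the names GaussianMLPPolicy and cost_multiplier, and the log-linear ramp are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


class GaussianMLPPolicy:
    """Sketch of the policy class quoted above: one softplus hidden layer,
    linear output, plus Gaussian noise with a learned diagonal covariance."""

    def __init__(self, state_dim, action_dim, hidden_dim=50, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; the paper does not specify initialization.
        self.W1 = 0.1 * rng.standard_normal((hidden_dim, state_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = 0.1 * rng.standard_normal((action_dim, hidden_dim))
        self.b2 = np.zeros(action_dim)
        # Log of the diagonal noise standard deviations (learned with the weights).
        self.log_sigma = np.zeros(action_dim)

    def mean(self, x):
        # Soft rectifier a = log(1 + exp(z)) at the hidden layer,
        # linear connections to the output layer.
        z = self.W1 @ x + self.b1
        a = np.log1p(np.exp(z))
        return self.W2 @ a + self.b2

    def act(self, x, stochastic=True, rng=None):
        # Stochastic during learning; with the noise removed the policy
        # becomes a deterministic controller, as used for evaluation.
        u = self.mean(x)
        if stochastic:
            if rng is None:
                rng = np.random.default_rng()
            u = u + np.exp(self.log_sigma) * rng.standard_normal(u.shape)
        return u


def cost_multiplier(iteration, ramp_iters=50, factor=10.0):
    """Scale the cost by a factor of 10 over the first 50 iterations.
    The log-linear ramp is an assumption; the paper states only the
    overall factor and the number of iterations."""
    frac = min(iteration / ramp_iters, 1.0)
    return factor ** frac
```

In this sketch, evaluating the deterministic controller described in the quote corresponds to calling act(x, stochastic=False), and the trajectory-optimization loop would multiply the task cost by cost_multiplier(i) at iteration i.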