Local Bayesian Optimization of Motor Skills

Authors: Riad Akrour, Dmitry Sorokin, Jan Peters, Gerhard Neumann

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our algorithm on several benchmark objective functions as well as a continuous robotic task in which an informative prior is obtained by imitation learning." (Section 4, Experiments)
Researcher Affiliation | Academia | (1) CLAS/IAS, TU Darmstadt, Darmstadt, Germany; (2) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (3) LCAS, University of Lincoln, Lincoln, United Kingdom.
Pseudocode | Yes | Algorithm 1, "Local Bayesian Optimization of Motor Skills". (A generic sketch of such a loop appears after the table.)
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | "We then conduct a comparison to the state-of-the-art on the COmparing COntinuous optimisers (COCO) testbed on the 20 functions f5 to f24 (we refer the reader to http://coco.gforge.inria.fr/ for an illustration and the mathematical definition of each function)." (A sketch of accessing these functions appears after the table.)
Dataset Splits | No | The paper uses benchmark functions from the COCO testbed and a robotic task, but it does not specify explicit training, validation, or test dataset splits for these.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | "We rely on the GPStuff library (Vanhatalo et al., 2013) for the GP implementation and the posterior sampling of hyper-parameters. We use the BayesOpt library (Martinez-Cantin, 2014)..." No version numbers are specified.
Experiment Setup | Yes | "In all but the last experiment ϵ = β = .05, while for the robotics experiment with an initial solution learned by imitation learning we set a more aggressive step size and entropy reduction ϵ = β = 1. We choose to use an equality constraint for the entropy reduction for both algorithms. As a result, both L-BayesOpt and MORE will have the same entropy at every iteration, and any difference in performance will be attributed to a better location of the mean, adaptation of the covariance matrix, or sampling procedure rather than a faster reduction in exploration." Ten points are sampled per iteration, and "for the local stochastic search algorithms we set the initial distribution to π0 = N(0, 3I)." (A sketch of this sampling setup follows below.)
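
Since the paper reports its method as pseudocode (Algorithm 1) without released source code, the following is a minimal, generic sketch of one local Bayesian optimization iteration in Python, assuming a Gaussian search distribution, a scikit-learn GP surrogate, and an expected-improvement acquisition. It illustrates the general scheme only; the paper's Algorithm 1 defines its own constrained update of the search distribution, which this sketch replaces with a simple, hypothetical mean shift.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def local_bo_step(f, mean, cov, X, y, n_candidates=1000):
    """One illustrative local-BO iteration (minimization)."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    # Candidates come from the *local* Gaussian search distribution,
    # not from the whole domain; this is what makes the search local.
    cand = rng.multivariate_normal(mean, cov, size=n_candidates)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_new = cand[np.argmax(ei)]                           # acquisition maximizer
    X, y = np.vstack([X, x_new]), np.append(y, f(x_new))
    # Placeholder update: nudge the mean toward the new point. The paper's
    # Algorithm 1 instead solves an entropy-constrained distribution update.
    mean = mean + 0.1 * (x_new - mean)
    return mean, cov, X, y

# Usage on a 2-D sphere function, starting from N(0, 3I) as in the paper.
f = lambda x: float(np.sum(x ** 2))
mean, cov = np.zeros(2), 3.0 * np.eye(2)
X = rng.multivariate_normal(mean, cov, size=5)
y = np.array([f(x) for x in X])
for _ in range(10):
    mean, cov, X, y = local_bo_step(f, mean, cov, X, y)
print("best value found:", y.min())
```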
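
For readers who want to reach the same benchmark, below is a small sketch of evaluating the bbob functions f5 to f24 through COCO's Python interface cocoex. The package name, the suite-options filter string, and the problem attributes used here are assumptions about the current COCO distribution, not details given in the paper, which links to the older GForge site.

```python
import numpy as np
import cocoex  # assumed install: pip install coco-experiment

# Restrict the bbob suite to functions f5-f24 at a single dimensionality
# (the "function_indices"/"dimensions" filter syntax is an assumption).
suite = cocoex.Suite("bbob", "", "function_indices: 5-24 dimensions: 10")
for problem in suite:
    x = np.zeros(problem.dimension)  # a trivial query point
    print(problem.id, "f(0) =", problem(x))  # problems are callable: f(x)
```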
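
The quoted experiment setup pins down the sampling side of the experiments: an initial distribution π0 = N(0, 3I), ten samples per iteration, and an equality constraint that reduces the entropy by β each iteration. Below is a minimal sketch of just that setup, with an illustrative d = 10 search space; it omits the mean and covariance-shape updates of the actual algorithms. Scaling the covariance of a d-dimensional Gaussian by exp(-2β/d) lowers its differential entropy by exactly β.

```python
import numpy as np

d, beta, n_samples = 10, 0.05, 10  # beta matches the quoted eps = beta = .05
rng = np.random.default_rng(0)
mean, cov = np.zeros(d), 3.0 * np.eye(d)  # pi_0 = N(0, 3I)

def entropy(cov):
    # Differential entropy of N(mean, cov): 0.5 * log((2*pi*e)^d * det(cov))
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

for it in range(3):
    samples = rng.multivariate_normal(mean, cov, size=n_samples)  # 10 per iter
    cov = np.exp(-2.0 * beta / d) * cov  # enforce H_{t+1} = H_t - beta exactly
    print(f"iter {it}: entropy = {entropy(cov):.4f}")
```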