Learning to Guide Random Search

Authors: Ozan Sener, Vladlen Koltun

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically evaluate the method on continuous optimization benchmarks and high-dimensional continuous control problems. Our method achieves significantly lower sample complexity than Augmented Random Search, Bayesian optimization, covariance matrix adaptation (CMA-ES), and other derivative-free optimization algorithms. We conduct extensive experiments on continuous control problems, continuous optimization benchmarks, and gradient-free optimization of an airfoil."
Researcher Affiliation | Industry | Ozan Sener, Intel Labs; Vladlen Koltun, Intel Labs
Pseudocode | Yes | Algorithm 1: Random Search; Algorithm 2: Manifold Random Search; Algorithm 3: Learned Manifold Random Search (LMRS)
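The basic random search of Algorithm 1 follows the antithetic finite-difference scheme used by Augmented Random Search: sample random directions, evaluate the objective at symmetric perturbations, and step along the estimated descent direction. A minimal sketch (function and hyperparameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def random_search(f, theta0, step=0.02, noise=0.05, k=10, iters=200, seed=0):
    """Minimal basic-random-search sketch: antithetic perturbations give a
    finite-difference gradient estimate; step downhill to minimize f.
    All hyperparameter names/defaults here are illustrative."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iters):
        deltas = rng.standard_normal((k, theta.size))
        # Evaluate f(theta + nu*delta) and f(theta - nu*delta) for each direction
        f_plus = np.array([f(theta + noise * d) for d in deltas])
        f_minus = np.array([f(theta - noise * d) for d in deltas])
        # Average finite-difference estimate of the gradient
        grad = ((f_plus - f_minus)[:, None] * deltas).mean(axis=0) / (2 * noise)
        theta -= step * grad
    return theta

# Usage: minimize a simple quadratic from an all-ones start
opt = random_search(lambda x: float(np.sum(x**2)), np.ones(5))
```

The manifold variants (Algorithms 2 and 3) restrict or learn the distribution from which the perturbation directions are drawn; the isotropic Gaussian above corresponds only to the plain Algorithm 1 baseline.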
Open Source Code | Yes | "A full implementation is available at https://github.com/intel-isl/LMRS."
Open Datasets | Yes | "We use the MuJoCo simulator (Todorov et al., 2012) to evaluate our method on high-dimensional control problems. We use 46 single-objective unconstrained functions from the Pagmo suite of continuous optimization benchmarks (Biscani et al., 2019). We use the XFoil simulator (Drela, 1989) to benchmark gradient-free optimization of an airfoil."
Dataset Splits | No | The paper discusses running experiments and averaging results over multiple random seeds, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts.
Hardware Specification | Yes | "Measurements are performed on Intel Xeon E7-8890 v3 processors and Nvidia GeForce RTX 2080 Ti GPUs."
Software Dependencies | No | The paper mentions software such as the MuJoCo simulator, the Pagmo suite, the XFoil simulator, pycma, and GPyTorch, but it does not provide specific version numbers for any of these.
Experiment Setup | Yes | "We use linear policies and include all the tricks (whitening the observation space and scaling the step size using the variance of the rewards) from Mania et al. (2018). We use grid search over δ and n = k values and choose the best-performing one in all experiments. We initialize our models with standard normal distributions. We use online gradient descent to learn the model parameters, using SGD with momentum 0.9. We also perform grid search for the learning rate over {1e-4, 1e-3, 1e-2}. We set λ = 1e3 for all experiments. We initialize all solutions with zero-mean, unit-variance normal variables and use grid search over δ ∈ {1e-4, 1e-3, 1e-2, 1e-1}, k ∈ {2, 5, 10, 50}, and α ∈ {1e-4, 1e-3, 1e-2, 1e-1}."
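The hyperparameter sweep described in the setup can be sketched as a plain exhaustive grid search over the reported value sets; `run_experiment` is a hypothetical callback standing in for one full training run (not part of the paper's code):

```python
import itertools

# Grids reported for the sweep: perturbation scale, directions per update, step size
DELTAS = [1e-4, 1e-3, 1e-2, 1e-1]
KS = [2, 5, 10, 50]
ALPHAS = [1e-4, 1e-3, 1e-2, 1e-1]

def grid_search(run_experiment):
    """Evaluate every (delta, k, alpha) combination and keep the best.

    `run_experiment` is a hypothetical function mapping a configuration to a
    scalar reward; the paper selects the best-performing configuration this way.
    """
    best_cfg, best_reward = None, float("-inf")
    for delta, k, alpha in itertools.product(DELTAS, KS, ALPHAS):
        reward = run_experiment(delta=delta, k=k, alpha=alpha)
        if reward > best_reward:
            best_cfg, best_reward = (delta, k, alpha), reward
    return best_cfg, best_reward
```

With 4 × 4 × 4 = 64 configurations, each requiring a full training run, exhaustive search is feasible; the per-seed averaging mentioned under Dataset Splits would sit inside `run_experiment`.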