From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization

Authors: Krzysztof M. Choromanski, Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Vikas Sindhwani

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical results and test ASEBO advantages over other methods empirically by evaluating it on the set of reinforcement learning policy optimization tasks as well as functions from the recently open-sourced Nevergrad library.
Researcher Affiliation | Collaboration | Krzysztof Choromanski (Google Brain Robotics, kchoro@google.com); Aldo Pacchiano (UC Berkeley, pacchiano@berkeley.edu); Jack Parker-Holder (University of Oxford, jackph@robots.ox.ac.uk); Yunhao Tang (Columbia University, yt2541@columbia.edu); Vikas Sindhwani (Google Brain Robotics, sindhwani@google.com)
Pseudocode | Yes | Algorithm 1 (ASEBO Algorithm). Hyperparameters: number of iterations of full sampling l, smoothing parameter σ > 0, step size η, PCA threshold ε, decay rate γ, total number of iterations T. Input: blackbox function F, vector θ_0 ∈ R^d where optimization starts. Cov_0 ∈ {0}^{d×d}, p_0 = 0. Output: vector θ_T. for t = 0, ..., T-1 do ... Algorithm 2 (Explore estimator via exponentiated sampling). Hyperparameters: smoothing parameter σ, horizon C, learning rate α, probability regularizer β, initial probability parameter q_0^t ∈ (0, 1). Input: subspaces L_active^ES and L_active^(ES,⊥), function F, vector θ_t. Output: ... for l = 1, ..., C+1 do ... (A hedged Python sketch of this loop appears after the table.)
Open Source Code | No | The paper mentions open-source implementations of other methods used for comparison (pycma, ARS) but provides no link to, or statement about, releasing ASEBO's own code.
Open Datasets | Yes | We used the following environments from the OpenAI Gym library: Swimmer-v2, HalfCheetah-v2, Walker2d-v2, Reacher-v2, Pusher-v2 and Thrower-v2. (A minimal rollout sketch for these environments follows the table.)
Dataset Splits | No | No explicit details about training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) are provided for the OpenAI Gym environments or Nevergrad functions.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned.
Software Dependencies | No | The paper mentions various software components and implementations (e.g., Adam, pycma, ARS, OpenAI baselines for PPO/TRPO) but does not provide specific version numbers for any of them.
Experiment Setup | Yes | In all experiments we used policies encoded by neural network architectures with two hidden layers and tanh nonlinearities, with > 100 parameters. For gradient-based optimization we use Adam. In practice one can set up the hyperparameters used by Algorithm 2 as follows: σ = 0.01, C = 10, α = 0.01, β = 0.1, q_0^t = 0.1. (A sketch of such a policy follows the table.)
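
The Pseudocode row above compresses Algorithm 1 into a single line. To make the control flow concrete, below is a minimal Python sketch of one ASEBO-style step. It is our interpretation, not the authors' code: it uses plain gradient ascent instead of Adam, a fixed 0.5 exploration probability in place of Algorithm 2's exponentiated sampling, omits the covariance decay γ, and the names asebo_step and grad_history are ours.

    import numpy as np

    def asebo_step(F, theta, grad_history, sigma=0.01, eta=0.01,
                   epsilon=0.995, n_samples=50, warmup=10):
        """One simplified ASEBO-style ES step (maximizes F)."""
        d = theta.shape[0]
        if len(grad_history) < warmup:
            # "Full sampling" warm-up phase: isotropic Gaussian directions.
            directions = np.random.randn(n_samples, d)
        else:
            # PCA of past ES-gradient estimates: keep the top components
            # explaining an epsilon fraction of variance (active subspace).
            G = np.stack(grad_history)
            _, s, Vt = np.linalg.svd(G, full_matrices=False)
            explained = np.cumsum(s ** 2) / np.sum(s ** 2)
            k = int(np.searchsorted(explained, epsilon)) + 1
            U = Vt[:k].T  # d x k orthonormal basis of the active subspace
            # Hybrid sampling: directions from the active subspace mixed
            # with isotropic exploration of its orthogonal complement.
            low = np.random.randn(n_samples, k) @ U.T
            iso = np.random.randn(n_samples, d)
            iso -= (iso @ U) @ U.T  # project out the active subspace
            # Fixed 0.5 here; the paper adapts this probability via
            # Algorithm 2's exponentiated sampling.
            pick_low = np.random.rand(n_samples, 1) < 0.5
            directions = np.where(pick_low, low, iso)
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)

        # Antithetic ES gradient estimate.
        grad = np.zeros(d)
        for g in directions:
            grad += (F(theta + sigma * g) - F(theta - sigma * g)) * g / (2 * sigma)
        grad /= n_samples

        grad_history.append(grad)
        return theta + eta * grad, grad_history

    if __name__ == "__main__":
        # Smoke test on a concave quadratic (optimum at the origin).
        F = lambda x: -float(np.sum(x ** 2))
        theta, hist = np.random.randn(10), []
        for _ in range(200):
            theta, hist = asebo_step(F, theta, hist)
        print(F(theta))  # should approach 0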
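
For the environments in the Open Datasets row, evaluating the blackbox function F reduces to episode rollouts. The helper below assumes the pre-0.26 gym API that the MuJoCo *-v2 tasks shipped with (reset returning only the observation, step returning a 4-tuple); the name rollout is ours.

    import gym  # the *-v2 MuJoCo tasks also require mujoco-py

    def rollout(env_name, policy, max_steps=1000):
        """Total reward of one episode under a deterministic policy."""
        env = gym.make(env_name)
        obs, total = env.reset(), 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
            if done:
                break
        return total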
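
The Experiment Setup row specifies two-hidden-layer tanh policies with more than 100 parameters. Below is a sketch of such a policy with all weights flattened into a single vector, the natural parameterization for ES; the hidden width of 32 and the absence of biases are our assumptions, not stated in the paper.

    import numpy as np

    class TanhPolicy:
        """Two hidden tanh layers; parameters stored as one flat vector."""
        def __init__(self, obs_dim, act_dim, hidden=32, seed=0):
            rng = np.random.default_rng(seed)
            self.shapes = [(obs_dim, hidden), (hidden, hidden), (hidden, act_dim)]
            self.params = np.concatenate(
                [0.1 * rng.standard_normal(m * n) for (m, n) in self.shapes])

        def __call__(self, obs):
            x, i = np.asarray(obs), 0
            for (m, n) in self.shapes:
                W = self.params[i:i + m * n].reshape(m, n)
                i += m * n
                x = np.tanh(x @ W)  # tanh also bounds the final action
            return x

    # Example (Swimmer-v2 has 8-dim observations and 2-dim actions):
    # policy = TanhPolicy(obs_dim=8, act_dim=2)  # 8*32 + 32*32 + 32*2 = 1344 params
    # total_reward = rollout("Swimmer-v2", policy)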