Structured Evolution with Compact Architectures for Scalable Policy Optimization

Authors: Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard Turner, Adrian Weller

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider two main experimental settings. Firstly, we compare structured evolution strategies against iid baseline approaches on a collection of 212 lower-dimensional blackbox optimization tasks drawn from a benchmark suite developed by the DFO community (Moré & Wild, 2009), where the estimated gradients are used to perform optimization. Secondly, in the context of RL we train neural network policies on 12 MuJoCo continuous control tasks from the OpenAI Gym collection.
Researcher Affiliation | Collaboration | 1) Google Brain Robotics, 2) University of Cambridge, 3) The Alan Turing Institute. Correspondence to: Krzysztof Choromanski <kchoro@google.com>, Mark Rowland <mr504@cam.ac.uk>.
Pseudocode | No | The paper describes algorithms and methods using mathematical equations and textual explanations, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not provide any statement regarding the availability of open-source code for the methodology, nor does it include links to a code repository.
Open Datasets | Yes | We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters... We consider a collection of reinforcement learning OpenAI Gym (Brockman et al., 2016) tasks, summarized in Table 1.
Dataset Splits | No | The paper mentions 'training with the highest reward', 'learning curves', and 'tested the best policy across all iteration steps', but does not provide specific train/validation/test dataset splits, percentages, or sample counts.
Hardware Specification | No | We use TensorFlow distributed synchronous infrastructure with at most 400 workers (1 CPU per worker). This mentions CPUs but lacks specific details such as model, speed, or memory.
Software Dependencies | No | We use TensorFlow distributed synchronous infrastructure... The optimization was conducted with the use of the Adam optimizer... For the optimization of the functions, we use MATLAB's inbuilt fminunc gradient-based optimization routine (running the BFGS quasi-Newton method)... No specific version numbers are provided for these software components.
Experiment Setup | Yes | The optimization was conducted with the Adam optimizer and the same fixed learning rate α that was used in (Salimans et al., 2017). The FP:UN neural network architectures consist of an input layer (state), two hidden layers of size 32 each, and one output layer (proposed action). We use tanh nonlinearities. ...With Gaussian orthogonal exploration directions, each hidden layer was of size h = 41... For the Ant environment we substantially reduced the number of steps per roll-out to s = 500 in the learning phase...
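For context on the method quoted in the Research Type row, the following is a minimal sketch, assuming NumPy, of an antithetic evolution-strategies gradient estimator with orthogonal exploration directions in the spirit of the paper. The function names, the step size, and the plain gradient-ascent update are illustrative assumptions, not the authors' implementation (which used Adam and distributed TensorFlow workers).

```python
import numpy as np

def orthogonal_directions(n, d, rng):
    """Draw n <= d exploration directions whose rows are exactly orthogonal.

    A Gaussian matrix is orthogonalised via QR, then each row is rescaled so
    its norm matches that of an independent d-dimensional Gaussian sample.
    """
    q, _ = np.linalg.qr(rng.standard_normal((d, n)))        # (d, n), orthonormal columns
    norms = np.linalg.norm(rng.standard_normal((n, d)), axis=1)
    return q.T * norms[:, None]                              # (n, d), orthogonal rows

def es_gradient(f, theta, sigma=0.1, n=None, rng=None):
    """Antithetic ES estimate of the gradient of f at theta (maximisation)."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    n = d if n is None else n
    eps = orthogonal_directions(n, d, rng)
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    return (eps * diffs[:, None]).sum(axis=0) / (2.0 * sigma * n)

if __name__ == "__main__":
    # Toy blackbox objective: a concave quadratic, so the optimum is at 0.
    f = lambda x: -np.sum(x ** 2)
    theta = np.ones(10)
    for _ in range(200):
        theta = theta + 0.05 * es_gradient(f, theta, sigma=0.1)
    print(np.linalg.norm(theta))   # shrinks towards 0 as the estimator ascends f
```

Orthogonalising the exploration directions (here via QR) is the structured ingredient that the paper contrasts with iid Gaussian sampling.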
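The Open Datasets row refers to OpenAI Gym MuJoCo tasks. Below is a short sketch of how such an environment is typically created and rolled out with the classic Gym API; the environment id and "-v1" version suffix are assumptions about the Gym release, and the random-action roll-out is only for illustration of the interaction loop an ES worker would run when scoring a perturbed policy.

```python
import gym  # classic Gym API: reset() -> obs, step() -> (obs, reward, done, info)

# One of the MuJoCo control tasks of the kind the paper evaluates on.
env = gym.make("Swimmer-v1")
obs = env.reset()
print(env.observation_space.shape, env.action_space.shape)

# A single roll-out with random actions.
total_reward, done = 0.0, False
while not done:
    obs, reward, done, _ = env.step(env.action_space.sample())
    total_reward += reward
print(total_reward)
```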
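The Experiment Setup row quotes an unstructured (FP:UN) policy with a state input, two tanh hidden layers of 32 units, and an action output. A minimal NumPy sketch of such a policy, with all weights packed into a single flat vector (the representation an ES optimiser perturbs), is given below. The helper names, the packing order, and the lack of output squashing are assumptions made here; note also that the sub-300-parameter claim quoted in the Open Datasets row refers to the paper's structured variants, which this sketch does not implement.

```python
import numpy as np

def policy_shapes(obs_dim, act_dim, hidden=(32, 32)):
    """Weight and bias shapes for a state -> tanh(32) -> tanh(32) -> action MLP."""
    dims = (obs_dim,) + hidden + (act_dim,)
    return [((dims[i], dims[i + 1]), (dims[i + 1],)) for i in range(len(dims) - 1)]

def num_params(obs_dim, act_dim, hidden=(32, 32)):
    return int(sum(np.prod(w) + np.prod(b) for w, b in policy_shapes(obs_dim, act_dim, hidden)))

def policy_forward(flat_params, state, obs_dim, act_dim, hidden=(32, 32)):
    """Evaluate the policy with all parameters packed into one flat vector."""
    x, i = np.asarray(state, dtype=float), 0
    layers = policy_shapes(obs_dim, act_dim, hidden)
    for k, (w_shape, b_shape) in enumerate(layers):
        w = flat_params[i:i + int(np.prod(w_shape))].reshape(w_shape)
        i += int(np.prod(w_shape))
        b = flat_params[i:i + int(np.prod(b_shape))]
        i += int(np.prod(b_shape))
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.tanh(x)     # tanh on hidden layers; no output squashing assumed
    return x

# Example with Swimmer-like dimensions (8-d observation, 2-d action).
obs_dim, act_dim = 8, 2
theta = np.zeros(num_params(obs_dim, act_dim))
print(num_params(obs_dim, act_dim), policy_forward(theta, np.ones(obs_dim), obs_dim, act_dim))
```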