Structured Evolution with Compact Architectures for Scalable Policy Optimization
Authors: Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard Turner, Adrian Weller
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider two main experimental settings. Firstly, we compare structured evolution strategies against i.i.d. baseline approaches on a collection of 212 lower-dimensional blackbox optimization tasks drawn from a benchmark suite developed by the DFO community (Moré & Wild, 2009), where the estimated gradients are used to perform optimization. Secondly, in the context of RL we train neural network policies on 12 MuJoCo continuous control tasks from the OpenAI Gym collection. (A hedged sketch of such an ES gradient estimator is given below the table.) |
| Researcher Affiliation | Collaboration | ¹Google Brain Robotics, ²University of Cambridge, ³The Alan Turing Institute. Correspondence to: Krzysztof Choromanski <kchoro@google.com>, Mark Rowland <mr504@cam.ac.uk>. |
| Pseudocode | No | The paper describes algorithms and methods using mathematical equations and textual explanations, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not provide any statement regarding the availability of open-source code for the methodology, nor does it include links to a code repository. |
| Open Datasets | Yes | We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters... We consider a collection of reinforcement learning OpenAI Gym (Brockman et al., 2016) tasks, summarized in Table 1. |
| Dataset Splits | No | The paper mentions 'training with the highest reward', 'learning curves', and 'tested the best policy across all iteration steps', but does not provide specific train/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | We use TensorFlow distributed synchronous infrastructure with at most 400 workers (1 CPU / worker). This mentions 'CPU' but lacks specific details such as model, speed, or memory. |
| Software Dependencies | No | We use TensorFlow distributed synchronous infrastructure... The optimization was conducted with the use of Adam Optimizer... For the optimization of the functions, we use MATLAB's inbuilt fminunc gradient-based optimization routine (running the BFGS Quasi-Newton method)... No specific version numbers are provided for these software components. |
| Experiment Setup | Yes | The optimization was conducted with the use of Adam Optimizer and the same fixed learning rate α that was used in (Salimans et al., 2017). The FP:UN neural network architectures consist of input layer (state), two hidden layers of size 32 each and one output layer (proposed action). We use tanh nonlinearities. ...With Gaussian orthogonal exploration directions, each hidden layer was of size h = 41;... For the Ant environment we substantially reduced the number of steps per roll-out to s = 500, in the learning phase... |
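
The setup quoted in the last row (two hidden layers of width 32, tanh nonlinearities, compact policies optimized as a single parameter vector) can be pictured with a minimal NumPy sketch. This is not the authors' code: the class name, the flat-parameter interface, and the tanh squashing of the output action are illustrative assumptions.

```python
import numpy as np

class CompactPolicy:
    """Minimal sketch of the unstructured feed-forward policy quoted above:
    state -> 32 -> 32 -> action, tanh nonlinearities. The flat parameter
    vector interface is an assumption, chosen so the whole policy can be
    handed to a blackbox ES optimizer."""

    def __init__(self, state_dim, action_dim, hidden=32):
        self.shapes = [(state_dim, hidden), (hidden,),
                       (hidden, hidden), (hidden,),
                       (hidden, action_dim), (action_dim,)]
        self.n_params = sum(int(np.prod(s)) for s in self.shapes)

    def act(self, theta, state):
        # Unpack the flat parameter vector into weights/biases, then run a
        # forward pass; ES treats theta as the blackbox input.
        params, i = [], 0
        for shape in self.shapes:
            size = int(np.prod(shape))
            params.append(theta[i:i + size].reshape(shape))
            i += size
        w1, b1, w2, b2, w3, b3 = params
        h = np.tanh(state @ w1 + b1)
        h = np.tanh(h @ w2 + b2)
        return np.tanh(h @ w3 + b3)
```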
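The 'Research Type' row mentions comparing structured evolution strategies against i.i.d. baselines, with the estimated gradients driving the optimization. The sketch below shows an antithetic ES gradient estimator with optionally orthogonalized Gaussian exploration directions; the function name, the QR-based orthogonalization, and the renormalization to Gaussian-like norms are assumptions meant to illustrate the idea, not the paper's exact implementation.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_directions=None, orthogonal=True, seed=0):
    """Antithetic finite-difference ES gradient estimate of a blackbox f.

    Hypothetical helper: with orthogonal=False the exploration directions
    are i.i.d. Gaussian; with orthogonal=True they are orthogonalized and
    rescaled to Gaussian-like norms (requires n_directions <= theta.size).
    """
    rng = np.random.default_rng(seed)
    dim = theta.size
    n = n_directions or dim
    eps = rng.standard_normal((n, dim))
    if orthogonal:
        # Orthogonalize the rows via QR, then give each row the norm of a
        # fresh Gaussian vector so the marginal scale is preserved.
        q, _ = np.linalg.qr(eps.T)          # q: (dim, n), orthonormal columns
        norms = np.linalg.norm(rng.standard_normal((n, dim)), axis=1)
        eps = q.T * norms[:, None]
    grad = np.zeros(dim)
    for e in eps:
        grad += (f(theta + sigma * e) - f(theta - sigma * e)) * e
    return grad / (2.0 * sigma * n)
```

In an RL run, f(theta) would be the total episode reward obtained by rolling out CompactPolicy.act with that parameter vector; the resulting gradient estimate can then be passed to a first-order optimizer such as Adam.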