Structured Evolution with Compact Architectures for Scalable Policy Optimization

Authors: Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard Turner, Adrian Weller

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consider two main experimental settings. Firstly, we compare structured evolution strategies against iid baseline approaches on a collection of 212 lower-dimensional blackbox optimization tasks drawn from a benchmark suite developed by the DFO community (Moré & Wild, 2009), where the estimated gradients are used to perform optimization. Secondly, in the context of RL we train neural network policies on 12 MuJoCo continuous control tasks from the OpenAI Gym collection.
Researcher Affiliation | Collaboration | 1) Google Brain Robotics, 2) University of Cambridge, 3) The Alan Turing Institute. Correspondence to: Krzysztof Choromanski <kchoro@google.com>, Mark Rowland <mr504@cam.ac.uk>.
Pseudocode | No | The paper describes algorithms and methods using mathematical equations and textual explanations, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not provide any statement regarding the availability of open-source code for the methodology, nor does it include links to a code repository.
Open Datasets | Yes | We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with less than 300 parameters... We consider a collection of reinforcement learning OpenAI Gym (Brockman et al., 2016) tasks, summarized in Table 1.
Dataset Splits | No | The paper mentions 'training with the highest reward', 'learning curves', and 'tested the best policy across all iteration steps', but does not provide specific train/validation/test dataset splits, percentages, or sample counts.
Hardware Specification | No | We use TensorFlow distributed synchronous infrastructure with at most 400 workers (1 CPU per worker). This mentions CPUs but lacks specific details such as model, speed, or memory.
Software Dependencies | No | We use TensorFlow distributed synchronous infrastructure... The optimization was conducted with the use of the Adam optimizer... For the optimization of the functions, we use MATLAB's inbuilt fminunc gradient-based optimization routine (running the BFGS quasi-Newton method)... No specific version numbers are provided for these software components.
Experiment Setup | Yes | The optimization was conducted with the Adam optimizer and the same fixed learning rate α that was used in (Salimans et al., 2017). The FP:UN neural network architectures consist of an input layer (state), two hidden layers of size 32 each, and one output layer (proposed action). We use tanh nonlinearities. ...With Gaussian orthogonal exploration directions, each hidden layer was of size h = 41... For the Ant environment we substantially reduced the number of steps per roll-out to s = 500 in the learning phase...
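For context on the method quoted in the Research Type row, the following is a minimal sketch, assuming NumPy, of an antithetic evolution-strategies gradient estimator with orthogonal exploration directions in the spirit of the paper. The function names, the step size, and the plain gradient-ascent update are illustrative assumptions, not the authors' implementation (which used Adam and distributed TensorFlow workers).

```python
import numpy as np

def orthogonal_directions(n, d, rng):
    """Draw n <= d exploration directions whose rows are exactly orthogonal.

    A Gaussian matrix is orthogonalised via QR, then each row is rescaled so
    its norm matches that of an independent d-dimensional Gaussian sample.
    """
    q, _ = np.linalg.qr(rng.standard_normal((d, n)))        # (d, n), orthonormal columns
    norms = np.linalg.norm(rng.standard_normal((n, d)), axis=1)
    return q.T * norms[:, None]                              # (n, d), orthogonal rows

def es_gradient(f, theta, sigma=0.1, n=None, rng=None):
    """Antithetic ES estimate of the gradient of f at theta (maximisation)."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    n = d if n is None else n
    eps = orthogonal_directions(n, d, rng)
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    return (eps * diffs[:, None]).sum(axis=0) / (2.0 * sigma * n)

if __name__ == "__main__":
    # Toy blackbox objective: a concave quadratic, so the optimum is at 0.
    f = lambda x: -np.sum(x ** 2)
    theta = np.ones(10)
    for _ in range(200):
        theta = theta + 0.05 * es_gradient(f, theta, sigma=0.1)
    print(np.linalg.norm(theta))   # shrinks towards 0 as the estimator ascends f
```

Orthogonalising the exploration directions (here via QR) is the structured ingredient that the paper contrasts with iid Gaussian sampling.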
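The Open Datasets row refers to OpenAI Gym MuJoCo tasks. Below is a short sketch of how such an environment is typically created and rolled out with the classic Gym API; the environment id and "-v1" version suffix are assumptions about the Gym release, and the random-action roll-out is only for illustration of the interaction loop an ES worker would run when scoring a perturbed policy.

```python
import gym  # classic Gym API: reset() -> obs, step() -> (obs, reward, done, info)

# One of the MuJoCo control tasks of the kind the paper evaluates on.
env = gym.make("Swimmer-v1")
obs = env.reset()
print(env.observation_space.shape, env.action_space.shape)

# A single roll-out with random actions.
total_reward, done = 0.0, False
while not done:
    obs, reward, done, _ = env.step(env.action_space.sample())
    total_reward += reward
print(total_reward)
```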
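The Experiment Setup row quotes an unstructured (FP:UN) policy with a state input, two tanh hidden layers of 32 units, and an action output. A minimal NumPy sketch of such a policy, with all weights packed into a single flat vector (the representation an ES optimiser perturbs), is given below. The helper names, the packing order, and the lack of output squashing are assumptions made here; note also that the sub-300-parameter claim quoted in the Open Datasets row refers to the paper's structured variants, which this sketch does not implement.

```python
import numpy as np

def policy_shapes(obs_dim, act_dim, hidden=(32, 32)):
    """Weight and bias shapes for a state -> tanh(32) -> tanh(32) -> action MLP."""
    dims = (obs_dim,) + hidden + (act_dim,)
    return [((dims[i], dims[i + 1]), (dims[i + 1],)) for i in range(len(dims) - 1)]

def num_params(obs_dim, act_dim, hidden=(32, 32)):
    return int(sum(np.prod(w) + np.prod(b) for w, b in policy_shapes(obs_dim, act_dim, hidden)))

def policy_forward(flat_params, state, obs_dim, act_dim, hidden=(32, 32)):
    """Evaluate the policy with all parameters packed into one flat vector."""
    x, i = np.asarray(state, dtype=float), 0
    layers = policy_shapes(obs_dim, act_dim, hidden)
    for k, (w_shape, b_shape) in enumerate(layers):
        w = flat_params[i:i + int(np.prod(w_shape))].reshape(w_shape)
        i += int(np.prod(w_shape))
        b = flat_params[i:i + int(np.prod(b_shape))]
        i += int(np.prod(b_shape))
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.tanh(x)     # tanh on hidden layers; no output squashing assumed
    return x

# Example with Swimmer-like dimensions (8-d observation, 2-d action).
obs_dim, act_dim = 8, 2
theta = np.zeros(num_params(obs_dim, act_dim))
print(num_params(obs_dim, act_dim), policy_forward(theta, np.ones(obs_dim), obs_dim, act_dim))
```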