A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning

Authors: Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on classic control problems and Atari games show that the proposed algorithm is superior to or competitive with state-of-the-art algorithms for reinforcement learning, demonstrating the effectiveness of the proposed algorithm.
Researcher Affiliation | Academia | 1) School of Data and Computer Science, Sun Yat-sen University; 2) Engineering Research Institute, Guangzhou College of South China University of Technology; 3) School of Software Engineering, South China University of Technology
Pseudocode | Yes | Algorithm 1: R-R1-ES
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | OpenAI Gym [Brockman et al., 2016], including CartPole-v1, MountainCar-v0 and Pendulum-v0, plus four Atari games (Breakout-v0, Pong-v0, Qbert-v0 and Seaquest-v0).
Dataset Splits | No | The paper mentions '21 independent training runs' and '30 independent evaluation runs' but does not specify dataset split percentages or sample counts for train/validation/test sets.
Hardware Specification | No | 'As for the training of each ES variant on each game, 20 CPUs are used with a time budget of 20 hours.' No further hardware details (e.g., CPU model or memory) are specified.
Software Dependencies | No | The paper mentions software environments and frameworks such as OpenAI Gym and DQN but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | A simple neural network with two hidden layers (one with 30 units and one with 20 units) acts as the network to be trained by ES, and virtual batch normalization is utilized as suggested in [Salimans et al., 2017]. For the classic control problems, the population size is λ = 20 and the initial value for the proposed R-R1-ES is µ0 = 10; the maximum number of generations T is T = 100 for CartPole-v1 and Pendulum-v0, and T = 10000 for MountainCar-v0. For the Atari games, the population size λ and the initial value µ0 are set to λ = 798 and µ0 = 50, respectively. For the other parameters used in each ES variant, the authors follow the suggestions given by their developers and keep the settings the same as in the original literature.
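The Experiment Setup row describes a two-hidden-layer policy network (30 and 20 units) whose flat parameter vector is perturbed by an ES with population size λ = 20. A minimal pure-Python sketch of that setup, assuming tanh activations and plain isotropic Gaussian perturbations for illustration (the paper's R-R1-ES additionally uses a rank-1 covariance model with restarts, which is not reproduced here):

```python
import math
import random

def layer_sizes(n_in, n_out, hidden=(30, 20)):
    """(fan_in, fan_out) pairs for each weight matrix of the MLP."""
    dims = (n_in,) + hidden + (n_out,)
    return list(zip(dims[:-1], dims[1:]))

def param_count(n_in, n_out, hidden=(30, 20)):
    """Total number of weights and biases in the network."""
    return sum(fi * fo + fo for fi, fo in layer_sizes(n_in, n_out, hidden))

def forward(theta, x, n_in, n_out, hidden=(30, 20)):
    """Run the tanh MLP encoded by the flat parameter vector theta."""
    idx, h = 0, x
    for fi, fo in layer_sizes(n_in, n_out, hidden):
        # Rows of the weight matrix are stored contiguously in theta.
        w = [theta[idx + i * fo: idx + (i + 1) * fo] for i in range(fi)]
        idx += fi * fo
        b = theta[idx: idx + fo]
        idx += fo
        h = [math.tanh(b[j] + sum(h[i] * w[i][j] for i in range(fi)))
             for j in range(fo)]
    return h

def sample_population(theta, lam=20, sigma=0.1, rng=random):
    """Draw lam isotropic Gaussian perturbations of theta (assumed
    sigma; the paper's rank-1 sampling distribution is not shown)."""
    return [[t + sigma * rng.gauss(0.0, 1.0) for t in theta]
            for _ in range(lam)]

# Example: a CartPole-v1-sized policy (4 observations, 2 actions).
n = param_count(4, 2)                 # 4*30+30 + 30*20+20 + 20*2+2 = 812
theta = [0.0] * n
population = sample_population(theta, lam=20)
```

Each sampled parameter vector would be evaluated by running episodes in the corresponding Gym environment; the activation choice, initialization, and step size here are assumptions, not details reported in the paper.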