A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning
Authors: Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on classic control problems and Atari games show that the proposed algorithm is superior to or competitive with state-of-the-art algorithms for reinforcement learning, demonstrating the effectiveness of the proposed algorithm. |
| Researcher Affiliation | Academia | ¹School of Data and Computer Science, Sun Yat-sen University; ²Engineering Research Institute, Guangzhou College of South China University of Technology; ³School of Software Engineering, South China University of Technology |
| Pseudocode | Yes | Algorithm 1 R-R1-ES |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | Yes | OpenAI Gym [Brockman et al., 2016] environments: the classic control problems CartPole-v1, MountainCar-v0 and Pendulum-v0, plus four Atari games (Breakout-v0, Pong-v0, Qbert-v0 and Seaquest-v0). |
| Dataset Splits | No | The paper mentions '21 independent training runs' and '30 independent evaluation runs' but does not specify dataset split percentages or sample counts for train/validation/test sets. |
| Hardware Specification | No | The paper states only that '20 CPUs are used with a time budget of 20 hours' for training each ES variant on each game; no CPU model, memory, or other hardware details are given. |
| Software Dependencies | No | The paper mentions software environments and frameworks such as OpenAI Gym and DQN but does not provide version numbers for any software dependency. |
| Experiment Setup | Yes | For the classic control problems, a simple neural network with two hidden layers (one with 30 units, one with 20 units) is trained by the ES, with virtual batch normalization as suggested in [Salimans et al., 2017]; the population size is λ = 20, the initial value for R-R1-ES is µ0 = 10, and the maximum number of generations is T = 100 for CartPole-v1 and Pendulum-v0 and T = 10000 for MountainCar-v0. For the Atari games, the population size and initial value are λ = 798 and µ0 = 50, respectively. All other parameters of each ES variant follow the settings suggested by their developers in the original literature. |
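The Experiment Setup row can be made concrete with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' code: it flattens a two-hidden-layer policy (30 and 20 units, as described in the paper) into one parameter vector, and runs a rank-1 ES loop in the style of the R1-ES sampling model that R-R1-ES builds on, where the covariance is C = (1 − α)I + α·p·pᵀ and each offspring is sampled in O(n) as x = m + σ(√(1−α)·z + √α·s·p). The tanh activations, α, σ, the path learning rate, and the toy quadratic fitness (standing in for the negated RL return) are all assumptions; the restart mechanism of R-R1-ES is omitted.

```python
import numpy as np

def make_policy(obs_dim, act_dim, h1=30, h2=20):
    """Two-hidden-layer network (30 and 20 units, per the paper); tanh
    activations are an assumption. Parameters live in one flat vector so
    the ES can perturb them directly."""
    shapes = [(obs_dim, h1), (h1,), (h1, h2), (h2,), (h2, act_dim), (act_dim,)]
    n_params = sum(int(np.prod(s)) for s in shapes)

    def forward(theta, obs):
        parts, i = [], 0
        for s in shapes:
            k = int(np.prod(s))
            parts.append(theta[i:i + k].reshape(s))
            i += k
        W1, b1, W2, b2, W3, b3 = parts
        h = np.tanh(obs @ W1 + b1)
        h = np.tanh(h @ W2 + b2)
        return h @ W3 + b3  # action scores; argmax for discrete actions

    return forward, n_params

def rank1_es(f, n, T=100, lam=20, alpha=0.5, sigma=0.1, seed=0):
    """Rank-1 ES sketch: covariance modeled as C = (1 - alpha) I
    + alpha p p^T, so each offspring costs O(n) to sample:
    x = m + sigma (sqrt(1 - alpha) z + sqrt(alpha) s p)."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)      # mean (e.g., flattened policy weights)
    p = np.zeros(n)                 # principal search direction
    c = 0.1                         # path learning rate (assumed)
    mu = lam // 2
    w = np.log(mu + 1) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                    # log-linear recombination weights
    for _ in range(T):
        Z = rng.standard_normal((lam, n))
        s = rng.standard_normal(lam)
        X = m + sigma * (np.sqrt(1 - alpha) * Z + np.sqrt(alpha) * s[:, None] * p)
        elite = np.argsort([f(x) for x in X])[:mu]  # minimization
        m_new = w @ X[elite]
        p = (1 - c) * p + np.sqrt(c * (2 - c)) * (m_new - m) / sigma
        m = m_new
    return m
```

For CartPole-v1 (4-dimensional observations, 2 discrete actions) the flattened vector has 4·30 + 30 + 30·20 + 20 + 20·2 + 2 = 812 parameters, and `f` would average the negated episode return over rollouts; λ = 20 matches the classic-control setting quoted above.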