A Restart-based Rank-1 Evolution Strategy for Reinforcement Learning

Authors: Zefeng Chen, Yuren Zhou, Xiao-yu He, Siyu Jiang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on classic control problems and Atari games show that the proposed algorithm is superior to or competitive with state-of-the-art algorithms for reinforcement learning, demonstrating the effectiveness of the proposed algorithm.
Researcher Affiliation | Academia | 1) School of Data and Computer Science, Sun Yat-sen University; 2) Engineering Research Institute, Guangzhou College of South China University of Technology; 3) School of Software Engineering, South China University of Technology
Pseudocode | Yes | Algorithm 1: R-R1-ES
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | OpenAI Gym [Brockman et al., 2016], including CartPole-v1, MountainCar-v0 and Pendulum-v0, plus four Atari games (Breakout-v0, Pong-v0, Qbert-v0 and Seaquest-v0).
Dataset Splits | No | The paper mentions '21 independent training runs' and '30 independent evaluation runs' but does not specify dataset split percentages or sample counts for train/validation/test sets.
Hardware Specification | No | 'As for the training of each ES variant on each game, 20 CPUs are used with a time budget of 20 hours.' No further hardware details (e.g., CPU model or memory) are specified.
Software Dependencies | No | The paper mentions software environments and frameworks such as OpenAI Gym and DQN but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | A simple neural network with two hidden layers (one with 30 units and one with 20 units) acts as the network to be trained by ES, and virtual batch normalization is utilized as suggested in [Salimans et al., 2017]. For the classic control problems, the population size is λ = 20 and the initial value for the proposed R-R1-ES is µ0 = 10; the maximum number of generations T is T = 100 for CartPole-v1 and Pendulum-v0, and T = 10000 for MountainCar-v0. For the Atari games, the population size λ and the initial value µ0 are set to λ = 798 and µ0 = 50, respectively. For the other parameters used in each ES variant, the authors follow the suggestions given by their developers and keep the settings the same as in the original literature.
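The Experiment Setup row describes a two-hidden-layer policy network (30 and 20 units) whose flat parameter vector is perturbed by an ES with population size λ = 20. A minimal pure-Python sketch of that setup, assuming tanh activations and plain isotropic Gaussian perturbations for illustration (the paper's R-R1-ES additionally uses a rank-1 covariance model with restarts, which is not reproduced here):

```python
import math
import random

def layer_sizes(n_in, n_out, hidden=(30, 20)):
    """(fan_in, fan_out) pairs for each weight matrix of the MLP."""
    dims = (n_in,) + hidden + (n_out,)
    return list(zip(dims[:-1], dims[1:]))

def param_count(n_in, n_out, hidden=(30, 20)):
    """Total number of weights and biases in the network."""
    return sum(fi * fo + fo for fi, fo in layer_sizes(n_in, n_out, hidden))

def forward(theta, x, n_in, n_out, hidden=(30, 20)):
    """Run the tanh MLP encoded by the flat parameter vector theta."""
    idx, h = 0, x
    for fi, fo in layer_sizes(n_in, n_out, hidden):
        # Rows of the weight matrix are stored contiguously in theta.
        w = [theta[idx + i * fo: idx + (i + 1) * fo] for i in range(fi)]
        idx += fi * fo
        b = theta[idx: idx + fo]
        idx += fo
        h = [math.tanh(b[j] + sum(h[i] * w[i][j] for i in range(fi)))
             for j in range(fo)]
    return h

def sample_population(theta, lam=20, sigma=0.1, rng=random):
    """Draw lam isotropic Gaussian perturbations of theta (assumed
    sigma; the paper's rank-1 sampling distribution is not shown)."""
    return [[t + sigma * rng.gauss(0.0, 1.0) for t in theta]
            for _ in range(lam)]

# Example: a CartPole-v1-sized policy (4 observations, 2 actions).
n = param_count(4, 2)                 # 4*30+30 + 30*20+20 + 20*2+2 = 812
theta = [0.0] * n
population = sample_population(theta, lam=20)
```

Each sampled parameter vector would be evaluated by running episodes in the corresponding Gym environment; the activation choice, initialization, and step size here are assumptions, not details reported in the paper.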