Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Evolution-Guided Policy Gradient in Reinforcement Learning
Authors: Shauharda Khadka, Kagan Tumer
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in a range of challenging continuous control benchmarks demonstrate that ERL significantly outperforms prior DRL and EA methods. |
| Researcher Affiliation | Academia | Shauharda Khadka, Kagan Tumer; Collaborative Robotics and Intelligent Systems Institute, Oregon State University; EMAIL |
| Pseudocode | Yes | Algorithm 1, 2 and 3 provide a detailed pseudocode of the ERL algorithm using DDPG as its policy gradient component. |
| Open Source Code | Yes | Code available at https://github.com/ShawK91/erl_paper_nips18 |
| Open Datasets | Yes | We evaluated the performance of ERL agents on 6 continuous control tasks simulated using Mujoco [56]. These are benchmarks used widely in the field [13, 25, 53, 47] and are hosted through the OpenAI gym [6]. |
| Dataset Splits | No | The paper does not specify explicit dataset split percentages (e.g., 80/10/10) or absolute sample counts for training, validation, and test sets. It mentions using well-known environments for evaluation but not how data within these environments is formally partitioned into these splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | ERL is implemented using PyTorch [39] while OpenAI Baselines [11] was used to implement PPO and DDPG. While software names are mentioned, specific version numbers for PyTorch or OpenAI Baselines are not provided. |
| Experiment Setup | Yes | Adam [29] optimizer with gradient clipping at 10 and a learning rate of 5e-5 and 5e-4 was used for the rl_actor and rl_critic, respectively. The size of the population k was set to 10, while the elite fraction ψ varied from 0.1 to 0.3 across tasks. The number of trials conducted to compute a fitness score, ξ, ranged from 1 to 5 across tasks. The size of the replay buffer and batch size were set to 1e6 and 128, respectively. The discount rate γ and target weight τ were set to 0.99 and 1e-3, respectively. The mutation probability mut_prob was set to 0.9 while the synchronization period ω ranged from 1 to 10 across tasks. The mutation strength mut_strength was set to 0.1 corresponding to a 10% Gaussian noise. Finally, the mutation fraction mut_frac was set to 0.1 while the probability from super mutation supermut_prob and reset resetmut_prob were set to 0.05. |
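
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that collects them in a Python dataclass. The class name `ERLSetup`, its field names, and the helper `num_elites` are hypothetical and do not come from the paper or its repository. Per-task values for ψ, ξ, and ω are not listed above, so only the reported ranges are stored. The rounding rule in `num_elites` is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ERLSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are illustrative)."""

    actor_lr: float = 5e-5            # Adam learning rate for the RL actor
    critic_lr: float = 5e-4           # Adam learning rate for the RL critic
    grad_clip: float = 10.0           # gradient clipping threshold
    population_size: int = 10         # k
    elite_fraction_range: Tuple[float, float] = (0.1, 0.3)  # psi, varies by task
    fitness_trials_range: Tuple[int, int] = (1, 5)          # xi, varies by task
    replay_buffer_size: int = 1_000_000
    batch_size: int = 128
    gamma: float = 0.99               # discount rate
    tau: float = 1e-3                 # target weight
    mut_prob: float = 0.9             # mutation probability
    sync_period_range: Tuple[int, int] = (1, 10)            # omega, varies by task
    mut_strength: float = 0.1         # described as 10% Gaussian noise
    mut_frac: float = 0.1             # mutation fraction
    super_mut_prob: float = 0.05      # super-mutation probability
    reset_mut_prob: float = 0.05      # reset-mutation probability


def num_elites(setup: ERLSetup, elite_fraction: float) -> int:
    """Number of elites for a given psi.

    Assumption: the count is round(psi * k) with a floor of one elite;
    the summary above does not state the rounding rule.
    """
    low, high = setup.elite_fraction_range
    if not low <= elite_fraction <= high:
        raise ValueError("elite_fraction is outside the reported range")
    return max(1, round(elite_fraction * setup.population_size))
```

With the reported population size of 10, this sketch yields between one and three elites across the stated range of ψ.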