Genetic-Gated Networks for Deep Reinforcement Learning

Authors: Simyung Chang, John Yang, Jaeseok Choi, Nojun Kwak

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this G2N can be applied to typical reinforcement learning algorithms to achieve a large improvement in sample efficiency and performance. (Section 4, Experiments:) We have experimented with the Atari environments and MuJoCo [18], representative RL problems, to verify the following: (1) Can G2N be effectively applied to Actor-Critic, a typical RL algorithm? (2) Does the genetic algorithm of Genetic-Gated RL models have advantages over a simple multi-policy model? (3) Are Genetic-Gated RL algorithms effective in terms of sample efficiency and computation? All the experiments were performed using OpenAI Gym [1].
Researcher Affiliation | Collaboration | Simyung Chang (Seoul National University and Samsung Electronics, Seoul, Korea; timelighter@snu.ac.kr); John Yang (Seoul National University, Seoul, Korea; yjohn@snu.ac.kr); Jaeseok Choi (Seoul National University, Seoul, Korea; jaeseok.choi@snu.ac.kr); Nojun Kwak (Seoul National University, Seoul, Korea; nojunk@snu.ac.kr)
Pseudocode | Yes | Algorithm 1: A Genetic-Gated Actor-Critic Pseudo-Code (a hedged sketch of the gating idea this algorithm relies on is given below the table).
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the authors have released the source code for their methodology.
Open Datasets | Yes | We have experimented with the Atari environments and MuJoCo [18], representative RL problems, to verify the following: (1) Can G2N be effectively applied to Actor-Critic, a typical RL algorithm? (2) Does the genetic algorithm of Genetic-Gated RL models have advantages over a simple multi-policy model? (3) Are Genetic-Gated RL algorithms effective in terms of sample efficiency and computation? All the experiments were performed using OpenAI Gym [1].
Dataset Splits | No | The paper describes training and evaluation on environments (Atari, MuJoCo) in terms of frames and episodes, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) in the traditional supervised-learning sense. While this is common in RL, it does not meet the criterion of a specified validation dataset split.
Hardware Specification | Yes | DQN is measured with 1 GPU (K40), A3C with a 16-core CPU, ES and Simple GA with 720 CPUs, and G2AC with 1 GPU (Titan X).
Software Dependencies | No | The paper mentions using "OpenAI Gym [1]" but does not specify version numbers for it or for any other software dependency, which is required for reproducibility (a minimal environment-loop sketch follows the table).
Experiment Setup | Yes | For the Atari experiments, we have adapted the same CNN architecture and hyper-parameters as A2C [21]. At the beginning of training, each gene is activated with a probability of 80%. G2AC uses 64 individuals (actors), with an 80% crossover probability and a 3% mutation probability for each genetic evolution. In every generation, the elite phase and the GA+elite phase are set to persist for 500 steps and 20 episodes per actor, respectively. ... The original PPO is reported to use a single actor model and a batch size of 2,048 steps (horizons) when learning in the MuJoCo environments. ... PPO8 can be trained synchronously with eight actor threads with a horizon size of 512 and reproduces most of PPO's performance in the corresponding paper. G2PPO has thus been experimented with the same settings as those of PPO8 for a fair comparison. (A hedged sketch of the evolution step implied by these hyper-parameters also follows the table.)
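
The paper's Algorithm 1 is only referenced, not reproduced, in the report above. As a rough illustration of the mechanism it builds on, the following is a minimal sketch, assuming PyTorch, of a hidden layer whose units are masked by a binary gene vector, with genes initially active with probability 80% as stated in the experiment setup. The class and method names (GeneticGatedLayer, set_gene) are hypothetical and are not the authors' code.

```python
import torch
import torch.nn as nn

class GeneticGatedLayer(nn.Module):
    """Fully connected layer whose outputs are gated by a binary gene vector."""

    def __init__(self, in_features, out_features, p_active=0.8):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # Each gene element starts active with probability p_active (80% in the paper's setup).
        self.register_buffer("gene", (torch.rand(out_features) < p_active).float())

    def set_gene(self, gene):
        # Load another individual's gene so the same shared weights act as a different policy.
        self.gene.copy_(gene.float())

    def forward(self, x):
        # The binary gene acts as a fixed gate on the hidden activations.
        return torch.relu(self.fc(x)) * self.gene
```

Swapping gene vectors via set_gene is what would let one set of shared weights behave as a population of distinct policies during the GA+elite phase, while only the elite gene's policy is updated by gradient descent.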
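Because the paper names OpenAI Gym but pins no version, the interaction loop below is a minimal sketch assuming the classic Gym API (reset returning an observation, step returning a 4-tuple). The environment IDs are illustrative; the exact Gym, Atari, and MuJoCo versions would have to be chosen by the reproducer.

```python
import gym

# Atari task; MuJoCo tasks use IDs such as "Hopper-v2" (IDs assumed, not taken from the paper).
env = gym.make("BreakoutNoFrameskip-v4")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # classic 4-tuple return
env.close()
```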
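The reported hyper-parameters (64 individuals, 80% crossover probability, 3% mutation probability) can be read as a conventional binary GA over gene vectors. The sketch below assumes truncation selection, uniform crossover, and bit-flip mutation, none of which the excerpt specifies; fitness would come from the episode returns collected by each actor.

```python
import numpy as np

POP_SIZE, CROSSOVER_P, MUTATION_P = 64, 0.8, 0.03  # values from the reported setup

def evolve(genes, fitness):
    """genes: (POP_SIZE, n_units) binary array; fitness: (POP_SIZE,) episode returns."""
    order = np.argsort(fitness)[::-1]
    elite = genes[order[0]].copy()              # keep the elite gene unchanged
    parents = genes[order[: POP_SIZE // 2]]     # truncation selection (assumed)
    children = [elite]
    while len(children) < POP_SIZE:
        a, b = parents[np.random.randint(len(parents), size=2)]
        if np.random.rand() < CROSSOVER_P:      # uniform crossover (assumed)
            mask = np.random.rand(a.size) < 0.5
            child = np.where(mask, a, b)
        else:
            child = a.copy()
        flip = np.random.rand(child.size) < MUTATION_P  # per-gene bit-flip mutation
        child = np.where(flip, 1 - child, child)
        children.append(child)
    return np.stack(children)
```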