Genetic-Gated Networks for Deep Reinforcement Learning

Authors: Simyung Chang, John Yang, Jaeseok Choi, Nojun Kwak

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this G2N can be applied to typical reinforcement learning algorithms to achieve a large improvement in sample efficiency and performance. (Section 4, Experiments:) We have experimented with the Atari environments and MuJoCo [18], representative RL problems, to verify the following: (1) Can G2N be effectively applied to Actor-Critic, a typical RL algorithm? (2) Does the genetic algorithm of Genetic-Gated RL models have advantages over a simple multi-policy model? (3) Are Genetic-Gated RL algorithms effective in terms of sample efficiency and computation? All the experiments were performed using OpenAI Gym [1].
Researcher Affiliation | Collaboration | Simyung Chang (Seoul National University and Samsung Electronics, Seoul, Korea; timelighter@snu.ac.kr); John Yang (Seoul National University, Seoul, Korea; yjohn@snu.ac.kr); Jaeseok Choi (Seoul National University, Seoul, Korea; jaeseok.choi@snu.ac.kr); Nojun Kwak (Seoul National University, Seoul, Korea; nojunk@snu.ac.kr)
Pseudocode | Yes | Algorithm 1: A Genetic-Gated Actor-Critic Pseudo-Code (a hedged sketch of the gating idea this algorithm relies on is given below the table).
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the authors have released the source code for their methodology.
Open Datasets | Yes | We have experimented with the Atari environments and MuJoCo [18], representative RL problems, to verify the following: (1) Can G2N be effectively applied to Actor-Critic, a typical RL algorithm? (2) Does the genetic algorithm of Genetic-Gated RL models have advantages over a simple multi-policy model? (3) Are Genetic-Gated RL algorithms effective in terms of sample efficiency and computation? All the experiments were performed using OpenAI Gym [1].
Dataset Splits | No | The paper describes training and evaluation on environments (Atari, MuJoCo) in terms of frames and episodes, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) in the traditional supervised-learning sense. While this is common in RL, it does not meet the criterion of a specified validation dataset split.
Hardware Specification | Yes | DQN is measured with 1 GPU (K40), A3C with a 16-core CPU, ES and Simple GA with 720 CPUs, and G2AC with 1 GPU (Titan X).
Software Dependencies | No | The paper mentions using "OpenAI Gym [1]" but does not specify version numbers for it or for any other software dependency, which is required for reproducibility (a minimal environment-loop sketch follows the table).
Experiment Setup | Yes | For the Atari experiments, we have adapted the same CNN architecture and hyper-parameters as A2C [21]. At the beginning of training, each gene is activated with a probability of 80%. G2AC uses 64 individuals (actors), with an 80% crossover probability and a 3% mutation probability for each genetic evolution. In every generation, the elite phase and the GA+elite phase are set to persist for 500 steps and 20 episodes per actor, respectively. ... The original PPO is reported to use a single actor model and a batch size of 2,048 steps (horizons) when learning in the MuJoCo environments. ... PPO8 can be trained synchronously with eight actor threads with a horizon size of 512 and reproduces most of PPO's performance in the corresponding paper. G2PPO has thus been experimented with the same settings as those of PPO8 for a fair comparison. (A hedged sketch of the evolution step implied by these hyper-parameters also follows the table.)
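
The paper's Algorithm 1 is only referenced, not reproduced, in the report above. As a rough illustration of the mechanism it builds on, the following is a minimal sketch, assuming PyTorch, of a hidden layer whose units are masked by a binary gene vector, with genes initially active with probability 80% as stated in the experiment setup. The class and method names (GeneticGatedLayer, set_gene) are hypothetical and are not the authors' code.

```python
import torch
import torch.nn as nn

class GeneticGatedLayer(nn.Module):
    """Fully connected layer whose outputs are gated by a binary gene vector."""

    def __init__(self, in_features, out_features, p_active=0.8):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # Each gene element starts active with probability p_active (80% in the paper's setup).
        self.register_buffer("gene", (torch.rand(out_features) < p_active).float())

    def set_gene(self, gene):
        # Load another individual's gene so the same shared weights act as a different policy.
        self.gene.copy_(gene.float())

    def forward(self, x):
        # The binary gene acts as a fixed gate on the hidden activations.
        return torch.relu(self.fc(x)) * self.gene
```

Swapping gene vectors via set_gene is what would let one set of shared weights behave as a population of distinct policies during the GA+elite phase, while only the elite gene's policy is updated by gradient descent.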
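Because the paper names OpenAI Gym but pins no version, the interaction loop below is a minimal sketch assuming the classic Gym API (reset returning an observation, step returning a 4-tuple). The environment IDs are illustrative; the exact Gym, Atari, and MuJoCo versions would have to be chosen by the reproducer.

```python
import gym

# Atari task; MuJoCo tasks use IDs such as "Hopper-v2" (IDs assumed, not taken from the paper).
env = gym.make("BreakoutNoFrameskip-v4")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # placeholder random policy
    obs, reward, done, info = env.step(action)  # classic 4-tuple return
env.close()
```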
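The reported hyper-parameters (64 individuals, 80% crossover probability, 3% mutation probability) can be read as a conventional binary GA over gene vectors. The sketch below assumes truncation selection, uniform crossover, and bit-flip mutation, none of which the excerpt specifies; fitness would come from the episode returns collected by each actor.

```python
import numpy as np

POP_SIZE, CROSSOVER_P, MUTATION_P = 64, 0.8, 0.03  # values from the reported setup

def evolve(genes, fitness):
    """genes: (POP_SIZE, n_units) binary array; fitness: (POP_SIZE,) episode returns."""
    order = np.argsort(fitness)[::-1]
    elite = genes[order[0]].copy()              # keep the elite gene unchanged
    parents = genes[order[: POP_SIZE // 2]]     # truncation selection (assumed)
    children = [elite]
    while len(children) < POP_SIZE:
        a, b = parents[np.random.randint(len(parents), size=2)]
        if np.random.rand() < CROSSOVER_P:      # uniform crossover (assumed)
            mask = np.random.rand(a.size) < 0.5
            child = np.where(mask, a, b)
        else:
            child = a.copy()
        flip = np.random.rand(child.size) < MUTATION_P  # per-gene bit-flip mutation
        child = np.where(flip, 1 - child, child)
        children.append(child)
    return np.stack(children)
```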