Value-Evolutionary-Based Reinforcement Learning

Authors: Pengyi Li, Jianye Hao, Hongyao Tang, Yan Zheng, Fazl Barez

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR. This section empirically evaluates VEB-RL on a range of tasks.
Researcher Affiliation | Academia | (1) College of Intelligence and Computing, Tianjin University, China; (2) Edinburgh Centre for Robotics; (3) University of Oxford; (4) Centre for the Study of Existential Risk, University of Cambridge
Pseudocode | Yes | Algorithm 1: Value-Evolutionary-Based RL
Open Source Code | Yes | Our code is available on https://github.com/yeshenpy/VEB-RL.
Open Datasets | Yes | We first consider the MinAtar benchmark (Young & Tian, 2019), which is a testbed of miniaturized versions of several Atari games. We further verify whether CEM-VEB-RL and GA-VEB-RL can further improve Rainbow on six popular Atari tasks: Breakout, Space Invaders, Qbert, Pong, Battle Zone, and Name This Game, in which agents need to take high-dimensional pixel images as inputs.
Dataset Splits | No | The paper mentions "training process" and "training steps" but does not provide specific percentages or counts for train/validation/test splits, nor does it reference predefined splits with citations for reproducibility.
Hardware Specification | Yes | All experiments are carried out on an NVIDIA GTX 2080 Ti GPU with an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz.
Software Dependencies | No | The paper mentions "Optimizer Adam" and refers to implementations from ERL and CEM-RL (with GitHub links), but it does not specify software dependencies such as Python, PyTorch, or TensorFlow with explicit version numbers.
Experiment Setup | Yes | Table 3. Details of settings:
- Optimizer: Adam
- Learning rate: 3e-4
- Replay buffer size: 1e5
- Number of hidden layers for Q network: 2
- Number of hidden units per layer: 1024, 128
- Batch size: 32
- Number of convolutional layers: 1
- Out channels of the convolutional layer: 16
- Kernel size of the convolutional layer: 3 × 3
- Stride of the convolutional layer: 1
- Discount factor γ: 0.99
- Steps to update the target network: 1000
- Sample size for calculating the fitness N: 5120 in MinAtar, 1024 in Atari
- Population size: 10
- Update frequency of target network in the population H: 20
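For quick reference, the settings quoted above can be collected into a single configuration object. The following is a minimal Python sketch assembled from Table 3; the key names are chosen for illustration here and are not taken from the released VEB-RL code.

# Hedged sketch: hyperparameters from Table 3 gathered into one dict.
# Key names are illustrative, not the identifiers used in the official repository.
veb_rl_config = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "replay_buffer_size": int(1e5),
    "q_hidden_layers": 2,
    "q_hidden_units": (1024, 128),
    "batch_size": 32,
    "num_conv_layers": 1,
    "conv_out_channels": 16,
    "conv_kernel_size": (3, 3),
    "conv_stride": 1,
    "discount_gamma": 0.99,
    "target_update_steps": 1000,
    # Sample size N used when computing fitness differs by benchmark.
    "fitness_sample_size_N": {"minatar": 5120, "atari": 1024},
    "population_size": 10,
    "population_target_update_freq_H": 20,
}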