Value-Evolutionary-Based Reinforcement Learning
Authors: Pengyi Li, Jianye Hao, Hongyao Tang, Yan Zheng, Fazl Barez
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR. This section empirically evaluates VEB-RL on a range of tasks. |
| Researcher Affiliation | Academia | (1) College of Intelligence and Computing, Tianjin University, China; (2) Edinburgh Centre for Robotics; (3) University of Oxford; (4) Centre for the Study of Existential Risk, University of Cambridge |
| Pseudocode | Yes | Algorithm 1 Value-Evolutionary-Based RL (a hedged, high-level sketch of such a population-based value loop is given after the table). |
| Open Source Code | Yes | Our code is available on https://github.com/yeshenpy/VEB-RL. |
| Open Datasets | Yes | We first consider the MinAtar benchmark (Young & Tian, 2019), which is a testbed of miniaturized versions of several Atari games. We further verify whether CEM-VEB-RL and GA-VEB-RL can further improve Rainbow on six popular tasks of Atari: Breakout, Space Invaders, Qbert, Pong, Battle Zone and Name This Game, in which agents need to take the high-dimensional pixel images as inputs. |
| Dataset Splits | No | The paper mentions "training process" and "training steps" but does not provide specific percentages or counts for train/validation/test splits, nor does it reference predefined splits with citations for reproducibility. |
| Hardware Specification | Yes | All experiments are carried out on NVIDIA GTX 2080 Ti GPU with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz. |
| Software Dependencies | No | The paper mentions "Optimizer Adam" and refers to implementations from ERL and CEM-RL (with GitHub links), but it does not specify software dependencies like Python, PyTorch, or TensorFlow with their explicit version numbers. |
| Experiment Setup | Yes | Table 3 (Details of settings): Optimizer: Adam; Learning rate: 3e-4; Replay buffer size: 1e5; Number of hidden layers for the Q network: 2; Number of hidden units per layer: 1024, 128; Batch size: 32; Number of convolutional layers: 1; Out channels of the convolutional layer: 16; Kernel size of the convolutional layer: 3 × 3; Stride of the convolutional layer: 1; Discount factor γ: 0.99; Steps to update the target network: 1000; Sample size for calculating the fitness N: 5120 in MinAtar, 1024 in Atari; Population size: 10; Update frequency of the target network in the population H: 20. A configuration sketch based on these values follows the table. |
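
Since the table only quotes the header of the paper's pseudocode ("Algorithm 1 Value-Evolutionary-Based RL"), the following is a minimal, heavily hedged sketch of what one generation of a population-based value loop of this kind can look like. The `QNet`, `fitness`, and `ga_step` helpers, the TD-error-based fitness, the mutation noise, and all sizes except the population size (10) and the fitness sample size N (both quoted from Table 3) are illustrative assumptions, not the authors' Algorithm 1.

```python
# Minimal, illustrative sketch of a VEB-RL-style generation (NOT the authors'
# Algorithm 1): a population of Q-networks is scored on transitions sampled
# from a shared replay buffer and evolved with a toy GA operator.
# Environment interaction and the RL gradient learner are omitted.
import copy
import random

import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Tiny MLP Q-network stand-in (the paper uses a conv layer for MinAtar)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, x):
        return self.net(x)


def fitness(q: QNet, batch: dict, gamma: float = 0.99) -> float:
    """Hypothetical fitness: negative mean TD error on N sampled transitions."""
    with torch.no_grad():
        q_sa = q(batch["obs"]).gather(1, batch["act"]).squeeze(1)
        target = batch["rew"] + gamma * (1 - batch["done"]) * q(batch["next_obs"]).max(1).values
        return -float(((q_sa - target) ** 2).mean())


def ga_step(population: list, fits: list, n_elites: int = 2) -> list:
    """Toy GA operator: keep the elites, refill by mutating copies of them."""
    order = np.argsort(fits)[::-1]
    elites = [population[i] for i in order[:n_elites]]
    children = []
    while len(elites) + len(children) < len(population):
        child = copy.deepcopy(random.choice(elites))
        with torch.no_grad():
            for p in child.parameters():
                p.add_(0.01 * torch.randn_like(p))  # Gaussian parameter mutation
        children.append(child)
    return elites + children


# --- one evolutionary generation ---
obs_dim, n_actions, N = 8, 4, 256          # N stands in for the fitness sample size
population = [QNet(obs_dim, n_actions) for _ in range(10)]  # population size 10 (Table 3)
batch = {                                   # stand-in for N replay-buffer transitions
    "obs": torch.randn(N, obs_dim),
    "act": torch.randint(0, n_actions, (N, 1)),
    "rew": torch.randn(N),
    "done": torch.zeros(N),
    "next_obs": torch.randn(N, obs_dim),
}
fits = [fitness(q, batch) for q in population]
population = ga_step(population, fits)
```

In the full method the population would also interact with the environment and share experience with a gradient-based RL learner (DQN, Rainbow, or SPR); see the paper's pseudocode for the actual procedure.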
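The Table 3 values quoted in the Experiment Setup row can likewise be collected into a configuration sketch. The `CONFIG` dict below transcribes the reported hyperparameters; the `build_q_network` wiring (where the 1024- and 128-unit layers sit relative to the conv output, the 10×10 MinAtar observation size, valid padding) is an assumption for illustration, not the authors' verified architecture.

```python
# Hedged transcription of Table 3 into a config dict, plus a Q-network built
# from it. The exact wiring (where the 1024- and 128-unit layers sit relative
# to the conv output, the 10x10 MinAtar observation size, no padding) is an
# assumption for illustration, not the authors' verified architecture.
import torch
import torch.nn as nn

CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "replay_buffer_size": int(1e5),
    "hidden_layers": [1024, 128],          # two hidden layers for the Q network
    "batch_size": 32,
    "conv_out_channels": 16,               # single convolutional layer
    "conv_kernel_size": 3,                 # 3 x 3 kernel
    "conv_stride": 1,
    "gamma": 0.99,
    "target_update_steps": 1000,
    "fitness_sample_size": {"minatar": 5120, "atari": 1024},
    "population_size": 10,
    "population_target_update_freq_H": 20,
}


def build_q_network(in_channels: int, n_actions: int, obs_hw: int = 10) -> nn.Module:
    """Assumed MinAtar-style wiring: conv -> flatten -> 1024 -> 128 -> Q-values."""
    conv = nn.Conv2d(
        in_channels,
        CONFIG["conv_out_channels"],
        kernel_size=CONFIG["conv_kernel_size"],
        stride=CONFIG["conv_stride"],
    )
    out_hw = obs_hw - CONFIG["conv_kernel_size"] + 1  # valid convolution, no padding
    flat = CONFIG["conv_out_channels"] * out_hw * out_hw
    h1, h2 = CONFIG["hidden_layers"]
    return nn.Sequential(
        conv, nn.ReLU(), nn.Flatten(),
        nn.Linear(flat, h1), nn.ReLU(),
        nn.Linear(h1, h2), nn.ReLU(),
        nn.Linear(h2, n_actions),
    )


q = build_q_network(in_channels=4, n_actions=6)
print(q(torch.zeros(1, 4, 10, 10)).shape)  # torch.Size([1, 6])
```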