ERL-TD: Evolutionary Reinforcement Learning Enhanced with Truncated Variance and Distillation Mutation
Authors: Qiuzhen Lin, Yangfan Chen, Lijia Ma, Wei-Neng Chen, Jianqiang Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate ERL-TD on the continuous control benchmarks from the OpenAI Gym and DeepMind Control Suite. The experiments show that ERL-TD shows excellent performance and outperforms all baseline RL algorithms on the test suites. |
| Researcher Affiliation | Academia | Qiuzhen Lin¹, Yangfan Chen¹, Lijia Ma¹, Wei-Neng Chen², Jianqiang Li³* ¹College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; ²School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; ³National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; lijq@szu.edu.cn |
| Pseudocode | Yes | Algorithm 1: ERL-TD |
| Open Source Code | No | The paper does not explicitly state that source code for ERL-TD is provided, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate ERL-TD on several continuous control benchmarks, which include OpenAI Gym (Brockman et al. 2016) and DeepMind Control Suite (DMC) (Tassa et al. 2018). OpenAI Gym: For the OpenAI Gym experiments with proprioceptive inputs (e.g., positions and velocities), we compare ERL-TD with several competitive RL algorithms, including ERL-Re2 (Li et al. 2022), PDERL (Bodnar, Day, and Liò 2020), CERL (Khadka et al. 2019), CEM-RL (Pourchot and Sigaud 2019), ERL (Khadka and Tumer 2018), SAC (Haarnoja et al. 2018), and TD3 (Fujimoto, Hoof, and Meger 2018). Specifically, ERL-Re2 is a state-of-the-art ERL variant. To compare their performance, we report their learning curves on six complex environments (HalfCheetah-v2, Walker2d-v2, Hopper-v2, Ant-v2, Humanoid-v2, and Swimmer-v2) in OpenAI Gym, each of which is run with five different seeds for 1000k steps. DeepMind Control Suite (DMC): The DeepMind Control Suite presents a great challenge due to its large dimension of pixel input. To demonstrate the robustness of our algorithm, we integrate ERL-TD and DrQ (Kostrikov, Yarats, and Fergus 2020), which are compared with some competitive RL algorithms, including Deep Planning Network (PlaNet) (Hafner et al. 2019b), Dreamer (Hafner et al. 2019a), Contrastive Unsupervised Representations for Reinforcement Learning (CURL) (Laskin, Srinivas, and Abbeel 2020), Reinforcement Learning with Augmented Data (RAD) (Laskin et al. 2020), Data-regularized Q (DrQ) (Kostrikov, Yarats, and Fergus 2020), and SUNRISE (Lee et al. 2021). |
| Dataset Splits | No | The paper mentions training on Open AI Gym and Deep Mind Control Suite environments and refers to test suites, but does not explicitly describe validation dataset splits, percentages, or methodology. |
| Hardware Specification | Yes | Table 2: Time measurements (in seconds) of training 10000 frames, executed on the NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper mentions the use of software like OpenAI Gym and DeepMind Control Suite, and references various RL algorithms (e.g., SAC, DrQ), but does not provide specific version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | OpenAI Gym: For the OpenAI Gym experiments with proprioceptive inputs (e.g., positions and velocities), we compare ERL-TD with several competitive RL algorithms... each of which is run with five different seeds for 1000k steps. In Figure 7(c), we vary the number of Q-networks K ∈ {2, 3, 4, 5}. The mean and standard deviation are on the same scale, so our intuition suggests that the value of α should be set around 0.5. The properties of the exponential function change when the base is equal to one. Therefore, we set different values for α when the base is greater than 1 and less than 1. The impact of α is demonstrated in Figure 9. We use α-x-y to describe all legends, where x and y represent the values of α when the base is greater than and less than 1, respectively. (An illustrative benchmark-setup sketch follows the table.) |
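
As a rough aid to reproduction, the sketch below enumerates the OpenAI Gym benchmark configuration quoted in the Open Datasets and Experiment Setup rows: six MuJoCo `-v2` tasks, five seeds per task, and 1000k environment steps per run. This is an assumed setup rather than the authors' code: it presumes an older `gym` release (pre-0.26) with `mujoco-py` installed, the specific seed values are illustrative (the paper only states that five different seeds were used), and `train_erl_td` is a hypothetical placeholder for the ERL-TD training loop.

```python
# Minimal sketch (assumed, not the authors' code) of the OpenAI Gym benchmark
# setup quoted above: six MuJoCo "-v2" tasks, five seeds, 1000k steps each.
# Assumes an older gym release (< 0.26) with mujoco-py installed; the
# train_erl_td call is a hypothetical placeholder for the ERL-TD trainer.
import gym

ENV_IDS = [
    "HalfCheetah-v2", "Walker2d-v2", "Hopper-v2",
    "Ant-v2", "Humanoid-v2", "Swimmer-v2",
]
SEEDS = [0, 1, 2, 3, 4]        # "five different seeds" per environment (values illustrative)
TOTAL_STEPS = 1_000_000        # "1000k steps" per run

for env_id in ENV_IDS:
    for seed in SEEDS:
        env = gym.make(env_id)
        env.seed(seed)         # old-style gym seeding API
        obs = env.reset()
        # train_erl_td(env, total_steps=TOTAL_STEPS, seed=seed)  # hypothetical trainer
        env.close()
```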