An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Authors: Kyunghyun Lee, Byeong-Uk Lee, Ukcheol Shin, In So Kweon

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The proposed framework and update methods are evaluated in continuous control benchmark work, showing superior performance as well as time efficiency compared to the previous methods." and "We evaluate the algorithms in several simulated environments which are commonly used as benchmarks in policy search: HalfCheetah-v2, Hopper-v2, Walker2d-v2, Swimmer-v2, Ant-v2, and Humanoid-v2 [36]. The presented statistics were calculated and averaged over 10 runs with the same configuration."
Researcher Affiliation | Academia | Kyunghyun Lee, Byeong-Uk Lee, Ukcheol Shin, In So Kweon; Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea; {kyunghyun.lee, byeonguk.lee, shinwc159, iskweon77}@kaist.ac.kr
Pseudocode | Yes | "A pseudocode for the whole algorithm is described in Appendix B." and "Our overall algorithm pseudo-code is presented in Appendix B."
Open Source Code | Yes | "The source code of our implementation is available at https://github.com/KyunghyunLee/aes-rl"
Open Datasets | Yes | "We evaluate the algorithms in several simulated environments which are commonly used as benchmarks in policy search: HalfCheetah-v2, Hopper-v2, Walker2d-v2, Swimmer-v2, Ant-v2, and Humanoid-v2 [36]."
Dataset Splits | No | The paper evaluates the algorithms in several simulated environments but does not explicitly describe training, validation, or test splits (e.g., percentages or counts).
Hardware Specification | Yes | "All three algorithms are evaluated in HalfCheetah-v2, which has fixed episode steps, and in Walker2d-v2 and Hopper-v2, which have varying episode steps, on the same hardware configuration: two Ethernet-connected machines, each with an Intel i7-6800K and three NVIDIA GeForce 1080 Ti GPUs, for a total of 24 CPU cores and 6 GPUs."
Software Dependencies | No | The paper mentions using TD3 [5] in the RL part but does not give version numbers for any software components, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | "Detailed architecture and hyperparameters for all methods are shown in Appendices A and C, respectively."
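The Research Type row states that the reported statistics are averaged over 10 runs with the same configuration. A minimal sketch of that aggregation step follows, assuming each run yields a single final evaluation return and that spread is reported as a sample standard deviation; the rows above do not say which dispersion measure the paper uses. The names `summarize_runs`, `summarize_table`, and their inputs are hypothetical and are not taken from the aes-rl repository.

```python
from statistics import mean, stdev

# The six MuJoCo tasks named in the evaluation (Gym "-v2" versions).
ENVIRONMENTS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2",
                "Swimmer-v2", "Ant-v2", "Humanoid-v2"]
NUM_RUNS = 10  # runs per configuration, as stated in the Research Type row


def summarize_runs(final_returns):
    """Return (mean, sample standard deviation) over repeated runs.

    Hypothetical input: one final evaluation return per independent run
    of the same configuration. Sample std is an assumed choice here.
    """
    if len(final_returns) != NUM_RUNS:
        raise ValueError(f"expected {NUM_RUNS} runs, got {len(final_returns)}")
    return mean(final_returns), stdev(final_returns)


def summarize_table(results_by_env):
    """Map each known environment present in the input to its (mean, std)."""
    return {env: summarize_runs(results_by_env[env])
            for env in ENVIRONMENTS if env in results_by_env}
```

For example, `summarize_runs([float(r) for r in range(10)])` returns a mean of 4.5 and a sample standard deviation of about 3.03, while passing fewer than 10 values raises `ValueError` so that incomplete sweeps are not silently averaged.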