Quality-Similar Diversity via Population Based Reinforcement Learning
Authors: Shuang Wu, Jian Yao, Haobo Fu, Ye Tian, Chao Qian, Yaodong Yang, Qiang Fu, Yang Wei
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive results on MuJoCo and Atari demonstrate that our algorithm significantly outperforms previous methods... The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces. |
| Researcher Affiliation | Collaboration | ¹Tencent AI Lab, Shenzhen, China; ²State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; ³Peking University, Beijing, China |
| Pseudocode | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced. Algorithm 1: QSD-PBT with TD3 or PPO baseline |
| Open Source Code | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced. |
| Open Datasets | Yes | The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces. |
| Dataset Splits | No | The paper mentions partitioning the obtained quality range into M=10 disjoint intervals for evaluation and saving policies within each quality interval during training. However, it does not provide specific train/validation/test dataset splits with percentages, sample counts, or references to predefined splits for reproducibility. |
| Hardware Specification | No | The paper mentions 'Number of learners 8 (GPUs)' in Table 10 for Atari games, but it does not specify the model or type of GPUs or any other specific hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms and components such as 'Optimizer Adam', 'PPO', 'TD3', 'DQN', and an 'LSTM as a feature extractor'. However, it does not provide version numbers for any software libraries or frameworks used (e.g., PyTorch 1.x, TensorFlow 2.x, or a specific version of a reinforcement learning framework). |
| Experiment Setup | Yes | All the hyperparameters for each method are listed in Table 9 except for EDO-CS and QD-PG, for which we implement with the architecture and hyperparameters suggested in their papers. All the hyperparameters for each method are listed in Table 10. |
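The Pseudocode row cites "Algorithm 1: QSD-PBT with TD3 or PPO baseline" but does not reproduce it; the full pseudocode is in the paper's Appendix C.5 and the open-sourced code. The snippet below is only a generic sketch of a population-based training loop in which each member wraps an off-the-shelf TD3 or PPO learner and folds a diversity term into its update. All class and method names here are hypothetical, and the actual QSD-PBT objective and update rules are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PopulationMember:
    """One policy in the population, wrapping an RL learner (e.g., TD3 or PPO)."""
    learner: object        # hypothetical agent exposing .collect() and .update()
    quality: float = 0.0   # latest evaluated return for this policy


def train_population(members: List[PopulationMember],
                     diversity_bonus: Callable[[int, List[PopulationMember]], float],
                     iterations: int = 1000) -> List[PopulationMember]:
    """Generic population-based loop: each member alternates rollout collection and
    learner updates, with a per-member diversity term computed against the rest of
    the population and added to the usual RL objective."""
    for _ in range(iterations):
        for i, member in enumerate(members):
            batch = member.learner.collect()                       # gather rollouts with this policy
            bonus = diversity_bonus(i, members)                    # diversity shaping w.r.t. the other members
            member.quality = member.learner.update(batch, bonus)   # baseline TD3/PPO update plus the diversity term
    return members
```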
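The Dataset Splits row notes that, instead of conventional data splits, the paper partitions the obtained quality (return) range into M=10 disjoint intervals and saves policies within each interval during training. The following is a minimal illustrative sketch of such interval binning; the equal-width partition, helper names, and example values are assumptions rather than the authors' implementation.

```python
import numpy as np


def make_quality_intervals(q_min: float, q_max: float, m: int = 10):
    """Split the observed quality (return) range into m disjoint, equal-width intervals."""
    edges = np.linspace(q_min, q_max, m + 1)
    return [(edges[i], edges[i + 1]) for i in range(m)]


def assign_interval(quality: float, intervals) -> int:
    """Return the index of the interval that a policy's quality falls into."""
    for i, (lo, hi) in enumerate(intervals):
        # The last interval is closed on the right so q_max is not dropped.
        if lo <= quality < hi or (i == len(intervals) - 1 and quality == hi):
            return i
    raise ValueError("quality outside the partitioned range")


# Example: bucket evaluated policies by their average return.
intervals = make_quality_intervals(q_min=0.0, q_max=5000.0, m=10)
saved_policies = {i: [] for i in range(len(intervals))}  # one bucket per quality interval
for policy_id, avg_return in [("p0", 4321.0), ("p1", 1250.5), ("p2", 4980.0)]:
    saved_policies[assign_interval(avg_return, intervals)].append(policy_id)
```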