Quality-Similar Diversity via Population Based Reinforcement Learning

Authors: Shuang Wu, Jian Yao, Haobo Fu, Ye Tian, Chao Qian, Yaodong Yang, Qiang Fu, Yang Wei

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results on MuJoCo and Atari demonstrate that our algorithm significantly outperforms previous methods... The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces.
Researcher Affiliation | Collaboration | (1) Tencent AI Lab, Shenzhen, China; (2) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (3) Peking University, Beijing, China
Pseudocode | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced. Algorithm 1: QSD-PBT with TD3 or PPO baseline.
Open Source Code | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced.
Open Datasets | Yes | The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces.
Dataset Splits | No | The paper mentions partitioning the obtained quality range into M=10 disjoint intervals for evaluation and saving policies within each quality interval during training. However, it does not provide train/validation/test splits with percentages, sample counts, or references to predefined splits for reproducibility. (A minimal illustrative sketch of this interval partitioning follows the table.)
Hardware Specification | No | The paper mentions 'Number of learners 8 (GPUs)' in Table 10 for the Atari games, but it does not specify the GPU model or any other hardware components used for the experiments.
Software Dependencies | No | The paper mentions specific algorithms and architectures such as the Adam optimizer, PPO, TD3, DQN, and an LSTM feature extractor, but it does not provide version numbers for any software libraries or frameworks (e.g., PyTorch 1.x, TensorFlow 2.x, or a specific reinforcement learning framework release).
Experiment Setup | Yes | All the hyperparameters for each method are listed in Table 9, except for EDO-CS and QD-PG, which are implemented with the architecture and hyperparameters suggested in their papers. All the hyperparameters for each method are listed in Table 10.
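
The Dataset Splits row above refers to partitioning the obtained quality (return) range into M=10 disjoint intervals and saving policies within each interval during training. The snippet below is a minimal sketch of that bookkeeping, not the authors' released code; the function name, the equal-width binning, and the example returns are assumptions made for illustration only.

```python
# Illustrative sketch (assumed, not the paper's code): partition an observed
# quality (return) range into M = 10 disjoint intervals and bucket saved
# policies by the interval their evaluated return falls into.
import numpy as np

def bucket_policies_by_quality(returns, m_intervals=10):
    """Assign each policy's evaluated return to one of m_intervals
    equal-width, disjoint quality intervals spanning [min, max]."""
    returns = np.asarray(returns, dtype=float)
    low, high = returns.min(), returns.max()
    # Interval edges: m_intervals equal-width bins over the observed range.
    edges = np.linspace(low, high, m_intervals + 1)
    # np.digitize maps each return to a bin index in [1, m_intervals + 1];
    # clip so the maximum return falls in the last bin, then shift to 0-based.
    bins = np.clip(np.digitize(returns, edges), 1, m_intervals) - 1
    buckets = {i: [] for i in range(m_intervals)}
    for policy_idx, b in enumerate(bins):
        buckets[int(b)].append(policy_idx)
    return edges, buckets

if __name__ == "__main__":
    # Example: evaluated returns of six hypothetical saved policies.
    edges, buckets = bucket_policies_by_quality(
        [120.0, 340.5, 505.2, 680.9, 910.0, 990.3])
    print("interval edges:", edges)
    print("policy indices per quality interval:", buckets)
```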