Quality-Similar Diversity via Population Based Reinforcement Learning

Authors: Shuang Wu, Jian Yao, Haobo Fu, Ye Tian, Chao Qian, Yaodong Yang, Qiang Fu, Yang Wei

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results on MuJoCo and Atari demonstrate that our algorithm significantly outperforms previous methods... The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces.
Researcher Affiliation | Collaboration | (1) Tencent AI Lab, Shenzhen, China; (2) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (3) Peking University, Beijing, China
Pseudocode | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced. Algorithm 1: QSD-PBT with TD3 or PPO baseline.
Open Source Code | Yes | The pseudocode is given in Appendix C.5, and the code is open-sourced.
Open Datasets | Yes | The effectiveness of QSD-PBT is validated on both MuJoCo (Brockman et al., 2016) continuous control tasks and Atari games (Bellemare et al., 2013) with discrete action spaces.
Dataset Splits | No | The paper mentions partitioning the obtained quality range into M=10 disjoint intervals for evaluation and saving policies within each quality interval during training. However, it does not provide train/validation/test splits with percentages, sample counts, or references to predefined splits for reproducibility. (A minimal illustrative sketch of this interval partitioning follows the table.)
Hardware Specification | No | The paper mentions 'Number of learners 8 (GPUs)' in Table 10 for the Atari games, but it does not specify the GPU model or any other hardware components used for the experiments.
Software Dependencies | No | The paper mentions specific algorithms and architectures such as the Adam optimizer, PPO, TD3, DQN, and an LSTM feature extractor, but it does not provide version numbers for any software libraries or frameworks (e.g., PyTorch 1.x, TensorFlow 2.x, or a specific reinforcement learning framework release).
Experiment Setup | Yes | All the hyperparameters for each method are listed in Table 9, except for EDO-CS and QD-PG, which are implemented with the architecture and hyperparameters suggested in their papers. All the hyperparameters for each method are listed in Table 10.
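
The Dataset Splits row above refers to partitioning the obtained quality (return) range into M=10 disjoint intervals and saving policies within each interval during training. The snippet below is a minimal sketch of that bookkeeping, not the authors' released code; the function name, the equal-width binning, and the example returns are assumptions made for illustration only.

```python
# Illustrative sketch (assumed, not the paper's code): partition an observed
# quality (return) range into M = 10 disjoint intervals and bucket saved
# policies by the interval their evaluated return falls into.
import numpy as np

def bucket_policies_by_quality(returns, m_intervals=10):
    """Assign each policy's evaluated return to one of m_intervals
    equal-width, disjoint quality intervals spanning [min, max]."""
    returns = np.asarray(returns, dtype=float)
    low, high = returns.min(), returns.max()
    # Interval edges: m_intervals equal-width bins over the observed range.
    edges = np.linspace(low, high, m_intervals + 1)
    # np.digitize maps each return to a bin index in [1, m_intervals + 1];
    # clip so the maximum return falls in the last bin, then shift to 0-based.
    bins = np.clip(np.digitize(returns, edges), 1, m_intervals) - 1
    buckets = {i: [] for i in range(m_intervals)}
    for policy_idx, b in enumerate(bins):
        buckets[int(b)].append(policy_idx)
    return edges, buckets

if __name__ == "__main__":
    # Example: evaluated returns of six hypothetical saved policies.
    edges, buckets = bucket_policies_by_quality(
        [120.0, 340.5, 505.2, 680.9, 910.0, 990.3])
    print("interval edges:", edges)
    print("policy indices per quality interval:", buckets)
```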