Sample-Efficient Quality-Diversity by Cooperative Coevolution

Authors: Ke Xue, Ren-Jian Wang, Pengyi Li, Dong Li, Jianye Hao, Chao Qian

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several popular tasks within the QDax suite demonstrate that an instantiation of CCQD achieves approximately a 200% improvement in sample efficiency. Our code is available at https://github.com/lamda-bbo/CCQD.
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) School of Computing and Intelligence, Tianjin University, China; (4) Huawei Noah's Ark Lab, China
Pseudocode | Yes | Algorithm 1: MAP-Elites, Algorithm 2: Survivor Selection of ME, Algorithm 3: CCQD. (A generic MAP-Elites sketch is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/CCQD.
Open Datasets | Yes | To examine the performance of CCQD, we conduct experiments on the popular QDax suite (Lim et al., 2023a; Chalumeau et al., 2023b), including unidirectional, omnidirectional, and maze-type environments. We also demonstrate the versatility of CCQD by the experiments on Atari Pong (Bellemare et al., 2013).
Dataset Splits | No | The paper describes experiments conducted in reinforcement learning environments (QDax suite, Atari Pong). These environments are used for policy training and evaluation through interaction. However, the paper does not specify traditional training/validation/test dataset splits with explicit percentages or sample counts, as would be typical for static datasets in supervised learning.
Hardware Specification | Yes | The experiments are conducted on an NVIDIA RTX 3090 GPU (24 GB) with an AMD Ryzen 9 3950X CPU (16 Cores), except for PBT-ME, which is conducted on an NVIDIA RTX A6000 GPU (48 GB) with an AMD EPYC 7763 CPU (64 Cores).
Software Dependencies | No | The paper mentions conducting experiments on 'QDax' (citing Lim et al., 2023a; Chalumeau et al., 2023b) and that it is 'based on JAX'. It provides GitHub links for QDax and JAX in footnotes. However, it does not specify exact version numbers for QDax, JAX, Python, or any other critical software libraries used to replicate the experiments. (A version-recording sketch is given after the table.)
Experiment Setup | Yes | We represent a policy as a fully connected neural network with two 256-dimensional hidden layers... The number of cells in the archive is 1024, and the number of generated offspring solutions in each generation is 100... Table 2 provides detailed TD3 hyperparameters including Policy learning rate 1e-3, Critic learning rate 3e-4, Replay buffer size 1e6, Training batch size 256, and Discount 0.99. (An illustrative Flax sketch of this setup follows the table.)
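For readers unfamiliar with the pseudocode quoted in the Pseudocode row, the loop below is a minimal, generic MAP-Elites sketch in plain Python. It is not the paper's exact Algorithm 1: `evaluate`, `sample_random`, and `mutate` are hypothetical user-supplied callables, and the behaviour descriptor is assumed to lie in [0, 1]^d.

```python
import numpy as np

def map_elites(evaluate, sample_random, mutate, n_cells_per_dim, n_init, n_iters, rng):
    """Generic MAP-Elites sketch: keep the best solution found in each descriptor cell.

    evaluate(x) is assumed to return (fitness, descriptor) with descriptor in [0, 1]^d.
    """
    archive = {}  # cell index (tuple of ints) -> (fitness, solution)

    def insert(x):
        fitness, descriptor = evaluate(x)
        # Discretise the descriptor into a grid cell.
        cell = tuple(np.minimum((np.asarray(descriptor) * n_cells_per_dim).astype(int),
                                n_cells_per_dim - 1))
        # Elitist replacement: keep the better solution within the cell.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)

    for _ in range(n_init):                      # initialise with random solutions
        insert(sample_random(rng))
    for _ in range(n_iters):                     # select a random elite, vary it, insert
        parent = list(archive.values())[rng.integers(len(archive))][1]
        insert(mutate(parent, rng))
    return archive
```

Algorithm 2 (survivor selection of ME) and Algorithm 3 (CCQD) extend this basic archive loop; in particular, CCQD's cooperative-coevolution structure is not captured by the sketch above.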
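Because the paper does not pin software versions (see the Software Dependencies row), a reproduction would need to record them itself. Below is a minimal sketch using only the standard library; the distribution names are assumptions about a typical QDax/JAX setup and may not all be installed.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Record the software environment alongside experimental results.
# The package names below are assumptions about a typical QDax/JAX stack.
print("python", platform.python_version())
for pkg in ("qdax", "jax", "jaxlib", "flax", "brax", "optax"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```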
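To make the experiment setup quoted in the last row concrete, the sketch below defines a Flax policy with two 256-unit hidden layers and collects the reported hyperparameters in one place. Only the layer widths and the hyperparameter values come from the quote; the tanh activations, the output squashing, and the example observation/action sizes are assumptions.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Policy(nn.Module):
    """Fully connected policy with two 256-unit hidden layers (activations assumed)."""
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        x = jnp.tanh(nn.Dense(256)(obs))
        x = jnp.tanh(nn.Dense(256)(x))
        return jnp.tanh(nn.Dense(self.action_dim)(x))  # squash actions to [-1, 1]

# Hyperparameters quoted in the Experiment Setup row (archive and TD3 settings).
CONFIG = {
    "archive_cells": 1024,
    "offspring_per_generation": 100,
    "policy_lr": 1e-3,
    "critic_lr": 3e-4,
    "replay_buffer_size": int(1e6),
    "batch_size": 256,
    "discount": 0.99,
}

# Example initialisation for a hypothetical 8-dim observation, 2-dim action space.
params = Policy(action_dim=2).init(jax.random.PRNGKey(0), jnp.zeros((1, 8)))
```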