Sample-Efficient Quality-Diversity by Cooperative Coevolution
Authors: Ke Xue, Ren-Jian Wang, Pengyi Li, Dong Li, Jianye Hao, Chao Qian
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several popular tasks within the QDax suite demonstrate that an instantiation of CCQD achieves approximately a 200% improvement in sample efficiency. Our code is available at https://github.com/lamda-bbo/CCQD. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) School of Artificial Intelligence, Nanjing University, China; (3) School of Computing and Intelligence, Tianjin University, China; (4) Huawei Noah's Ark Lab, China |
| Pseudocode | Yes | Algorithm 1 MAP-Elites, Algorithm 2 Survivor Selection of ME, Algorithm 3 CCQD (a generic MAP-Elites sketch is given below the table) |
| Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/CCQD. |
| Open Datasets | Yes | To examine the performance of CCQD, we conduct experiments on the popular QDax suite (Lim et al., 2023a; Chalumeau et al., 2023b), including unidirectional, omnidirectional, and maze-type environments. We also demonstrate the versatility of CCQD by the experiments on Atari Pong (Bellemare et al., 2013). |
| Dataset Splits | No | The paper describes experiments conducted in reinforcement learning environments (QDax suite, Atari Pong). These environments are used for policy training and evaluation through interaction. However, the paper does not specify traditional training/validation/test *dataset splits* with explicit percentages or sample counts, as would be typical for static datasets in supervised learning. |
| Hardware Specification | Yes | The experiments are conducted on an NVIDIA RTX 3090 GPU (24 GB) with an AMD Ryzen 9 3950X CPU (16 Cores), except for PBT-ME, which is conducted on an NVIDIA RTX A6000 GPU (48 GB) with an AMD EPYC 7763 CPU (64 Cores). |
| Software Dependencies | No | The paper mentions conducting experiments on 'QDax' (citing Lim et al., 2023a; Chalumeau et al., 2023b) and that it is 'based on JAX'. It provides GitHub links for QDax and JAX in footnotes. However, it does not specify exact version numbers for QDax, JAX, Python, or any other critical software libraries used to replicate the experiments. |
| Experiment Setup | Yes | We represent a policy as a fully connected neural network with two 256-dimensional hidden layers... The number of cells in the archive is 1024, and the number of generated offspring solutions in each generation is 100... Table 2 provides detailed TD3 hyperparameters, including a policy learning rate of 1e-3, a critic learning rate of 3e-4, a replay buffer size of 1e6, a training batch size of 256, and a discount of 0.99 (see the configuration sketch below the table). |
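
For readers unfamiliar with Algorithm 1 (MAP-Elites) referenced in the pseudocode row, the following is a minimal, generic sketch of the MAP-Elites loop on a hypothetical toy problem. The grid resolution, mutation operator, and fitness/descriptor functions are illustrative assumptions only; this is not the paper's CCQD decomposition or its policy-search setting.

```python
import random

# Minimal MAP-Elites sketch on a hypothetical toy problem: maximize the
# negative squared distance to a target point in [0, 1]^2; the behaviour
# descriptor is the (discretized) point itself. All constants are assumptions.
GRID = 16             # cells per descriptor dimension (illustrative)
TARGET = (0.7, 0.3)   # illustrative optimum

def evaluate(genome):
    """Return (fitness, descriptor cell) for a 2-D genome."""
    fitness = -sum((g - t) ** 2 for g, t in zip(genome, TARGET))
    cell = tuple(min(int(g * GRID), GRID - 1) for g in genome)
    return fitness, cell

def mutate(genome, sigma=0.05):
    """Gaussian mutation, clipped to [0, 1]."""
    return [min(max(g + random.gauss(0.0, sigma), 0.0), 1.0) for g in genome]

archive = {}  # descriptor cell -> (fitness, genome)

# Standard MAP-Elites loop: sample a parent from the archive (or random-init),
# vary it, evaluate it, and keep the offspring only if its cell is empty or
# it improves on the current elite of that cell.
for generation in range(200):
    if not archive or random.random() < 0.1:
        offspring = [random.random(), random.random()]
    else:
        parent = random.choice(list(archive.values()))[1]
        offspring = mutate(parent)
    fitness, cell = evaluate(offspring)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, offspring)

print(f"{len(archive)} cells filled, best fitness "
      f"{max(f for f, _ in archive.values()):.4f}")
```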
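
The experiment-setup details quoted above can also be summarised in a short configuration sketch. The layer sizes and the TD3 hyperparameters follow the quoted text (Table 2 of the paper); the Flax module structure, the tanh activations, and the example observation/action dimensions are assumptions for illustration, not the authors' implementation (which is available at the linked repository).

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

# TD3 hyperparameters as reported in Table 2 of the paper; the dictionary
# keys and overall structure are illustrative.
TD3_CONFIG = {
    "policy_lr": 1e-3,
    "critic_lr": 3e-4,
    "replay_buffer_size": int(1e6),
    "batch_size": 256,
    "discount": 0.99,
}

class Policy(nn.Module):
    """Fully connected policy with two 256-unit hidden layers.

    Hidden-layer sizes follow the paper; the tanh activations and the
    bounded output layer are assumptions typical of TD3-style policies.
    """
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        x = nn.tanh(nn.Dense(256)(obs))
        x = nn.tanh(nn.Dense(256)(x))
        return nn.tanh(nn.Dense(self.action_dim)(x))

# Example initialisation for a hypothetical task with 27-D observations
# and 8-D actions (dimensions are illustrative, not from the paper).
policy = Policy(action_dim=8)
params = policy.init(jax.random.PRNGKey(0), jnp.zeros((1, 27)))
action = policy.apply(params, jnp.zeros((1, 27)))
```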