Quality-Diversity with Limited Resources
Authors: Ren-Jian Wang, Ke Xue, Cong Guan, Chao Qian
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on tasks ranging from small to large resource consumption demonstrate the strong performance of RefQD: it not only uses significantly fewer resources (e.g., 16% of the GPU memory on QDax and 3.7% on Atari) but also achieves comparable or better performance than sample-efficient QD algorithms. Experimental results with limited resources demonstrate the effectiveness of RefQD, which uses only 3.7% to 16% of the GPU memory and achieves comparable or even superior QD metrics (QD-Score, Coverage, and Max Fitness) with nearly the same wall-clock time and number of samples. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China. |
| Pseudocode | Yes | Algorithm 1: Resource-efficient QD; Algorithm 2: DDA Selection. |
| Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/RefQD. |
| Open Datasets | Yes | To examine the performance of RefQD, we conduct experiments on the QDax suite and Atari environments. QDax is a popular framework and benchmark for QD algorithms (Chalumeau et al., 2023b). To further investigate the versatility of RefQD, we also conduct experiments on Atari video games (Bellemare et al., 2013), a widely used benchmark in RL (see the environment-setup sketch after the table). |
| Dataset Splits | No | The paper mentions training models and parameters (e.g., 'Training batch size 32', 'Policy training steps 100'), but it does not specify explicit dataset splits for training, validation, or testing. Evaluation is based on performance in the environments rather than on a fixed validation dataset. |
| Hardware Specification | Yes | All the experiments are conducted on an NVIDIA RTX 3090 GPU (24 GB) with an AMD Ryzen 9 3950X CPU (16 Cores). |
| Software Dependencies | No | The paper mentions specific frameworks and algorithms (e.g., 'QDax suite', 'PGA-ME', 'DQN-ME', 'TD3') but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We employ the policy gradient method TD3, whose hyperparameters are presented in Table 5: critic hidden layer size [256, 256]; policy learning rate 1e-3; critic learning rate 3e-4; replay buffer size 1e6; training batch size 32; policy training steps 100; critic training steps 300; reward scaling 1.0; discount 0.99; policy noise 0.2; policy clip 0.5 (see the configuration sketch after the table). The proposed RefQD method is general and can be implemented with the operators of different advanced sample-efficient QD algorithms. In the experiments, we provide an instantiation of RefQD using the uniform parent selection and variation operators from the well-known PGA-ME (Nilsson & Cully, 2021; Flageat et al., 2023). |
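
The benchmarks cited in the Open Datasets row come from two standard suites. As a hedged illustration only, and not the authors' setup code, the snippet below shows how such environments are typically instantiated, assuming recent QDax and Gymnasium releases; the environment ids and episode length are placeholders.

```python
# Hedged sketch: instantiating the two benchmark families named above.
# Assumptions (not from the paper): recent QDax and Gymnasium releases,
# and illustrative environment ids / episode length.
from qdax import environments  # QDax suite of Brax-based QD tasks
import gymnasium as gym        # Atari via the Arcade Learning Environment

# A QDax task (name and episode length are placeholders, not the paper's exact settings).
qdax_env = environments.create("walker2d_uni", episode_length=1000)

# An Atari game (any ALE game id works; Pong is purely illustrative).
atari_env = gym.make("ALE/Pong-v5")
```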
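
For convenience, the Table 5 values quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is a minimal illustration using a plain Python dataclass; the class and field names are ours, not the authors' code, while the values are those quoted from the paper.

```python
from dataclasses import dataclass, field
from typing import List


# Hedged sketch: the TD3 hyperparameters quoted from Table 5, grouped into a
# plain config object. Class and field names are illustrative, not the authors' code.
@dataclass
class TD3Config:
    critic_hidden_layer_size: List[int] = field(default_factory=lambda: [256, 256])
    policy_learning_rate: float = 1e-3
    critic_learning_rate: float = 3e-4
    replay_buffer_size: int = 1_000_000  # 1e6
    training_batch_size: int = 32
    policy_training_steps: int = 100
    critic_training_steps: int = 300
    reward_scaling: float = 1.0
    discount: float = 0.99
    policy_noise: float = 0.2
    policy_clip: float = 0.5
```

Keeping the quoted values in one dataclass makes them easy to log alongside the QD metrics (QD-Score, Coverage, Max Fitness) reported in the table above.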