Quality-Diversity with Limited Resources

Authors: Ren-Jian Wang, Ke Xue, Cong Guan, Chao Qian

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on tasks ranging from small to large resource consumption demonstrate the strong performance of RefQD: it not only uses significantly fewer resources (e.g., 16% of the GPU memory on QDax and 3.7% on Atari) but also achieves performance comparable to or better than sample-efficient QD algorithms. Experimental results with limited resources demonstrate the effectiveness of RefQD, which uses only 3.7% to 16% of the GPU memory and achieves comparable or even superior QD metrics, including QD-Score, Coverage, and Max Fitness, with nearly the same wall-clock time and number of samples. (A minimal sketch of how these QD metrics are computed appears after the table.)
Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China.
Pseudocode | Yes | Algorithm 1 (Resource-efficient QD) and Algorithm 2 (DDA Selection). (A rough sketch of a resource-efficient QD loop appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/lamda-bbo/RefQD.
Open Datasets | Yes | To examine the performance of RefQD, we conduct experiments on the QDax suite and Atari environments. QDax is a popular framework and benchmark for QD algorithms (Chalumeau et al., 2023b). To further investigate the versatility of RefQD, we also conduct experiments on the Atari video games (Bellemare et al., 2013), a widely used benchmark in RL. (An environment-setup snippet appears after the table.)
Dataset Splits | No | The paper mentions training models and parameters (e.g., 'Training batch size 32', 'Policy training steps 100'), but it does not specify explicit dataset splits for training, validation, or testing. Evaluation is based on performance in the environments rather than on a fixed validation dataset.
Hardware Specification | Yes | All the experiments are conducted on an NVIDIA RTX 3090 GPU (24 GB) with an AMD Ryzen 9 3950X CPU (16 cores).
Software Dependencies | No | The paper mentions specific frameworks and algorithms (e.g., the QDax suite, PGA-ME, DQN-ME, TD3) but does not provide version numbers for software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | We employ the policy gradient method TD3, whose hyperparameters are presented in Table 5: critic hidden layer size [256, 256], policy learning rate 1e-3, critic learning rate 3e-4, replay buffer size 1e6, training batch size 32, policy training steps 100, critic training steps 300, reward scaling 1.0, discount 0.99, policy noise 0.2, policy clip 0.5. The proposed RefQD method is general and can be implemented with the operators of different advanced sample-efficient QD algorithms. In the experiments, we provide an instantiation of RefQD using uniform parent selection and the variation operators of the well-known PGA-ME (Nilsson & Cully, 2021; Flageat et al., 2023). (A hypothetical configuration sketch collecting these values appears after the table.)
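
The QD metrics cited in the Research Type row (QD-Score, Coverage, Max Fitness) are standard archive statistics. The following is a minimal sketch, not the authors' code, of how they are typically computed from a MAP-Elites-style archive, assuming empty cells are marked with NaN and fitness values are non-negative.

```python
import numpy as np

def qd_metrics(archive_fitness: np.ndarray) -> dict:
    """Standard QD metrics from a flat array with one entry per archive cell.

    Empty cells are marked with np.nan. Sketch only: the exact bookkeeping
    in RefQD/QDax may differ (e.g., fitness offsets for negative rewards).
    """
    filled = ~np.isnan(archive_fitness)
    return {
        # QD-Score: sum of the fitness of all solutions stored in the archive.
        "qd_score": float(np.nansum(archive_fitness)),
        # Coverage: fraction of archive cells that contain a solution.
        "coverage": float(filled.mean()),
        # Max Fitness: best fitness found so far.
        "max_fitness": float(np.nanmax(archive_fitness)) if filled.any() else float("-inf"),
    }

# Example: 4 of 5 cells filled -> qd_score 7.0, coverage 0.8, max_fitness 3.0.
print(qd_metrics(np.array([1.0, 2.5, np.nan, 0.5, 3.0])))
```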
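
For the Pseudocode row, a rough illustration of a resource-efficient QD loop is sketched below. It is a generic MAP-Elites-style loop under our own simplifying assumptions (one shared representation reused by all archived decision parts, uniform parent selection, the standard replacement rule); it is not a transcription of the paper's Algorithm 1 or its DDA Selection, and the helpers `evaluate` and `vary` are placeholders.

```python
import numpy as np

def resource_efficient_qd(shared_repr, init_decision_parts, evaluate, vary,
                          n_cells: int, iterations: int, batch_size: int):
    """MAP-Elites-style loop in which every solution reuses one shared
    representation and only small decision parts are stored in the archive.

    `evaluate(shared_repr, decision_part)` must return (fitness, cell_index);
    `vary(decision_part)` must return a mutated copy. Both are placeholders.
    """
    fitness = np.full(n_cells, np.nan)   # per-cell fitness (NaN = empty)
    archive = [None] * n_cells           # per-cell decision parts only

    for _ in range(iterations):
        # Uniform parent selection from the non-empty archive (or the init set).
        pool = [d for d in archive if d is not None] or list(init_decision_parts)
        batch = [vary(pool[np.random.randint(len(pool))]) for _ in range(batch_size)]

        for child in batch:
            f, cell = evaluate(shared_repr, child)
            # Standard MAP-Elites replacement: keep the better solution per cell.
            if np.isnan(fitness[cell]) or f > fitness[cell]:
                fitness[cell], archive[cell] = f, child

    return archive, fitness
```

The memory saving in this sketch comes from keeping a single shared representation while the archive stores only the lightweight decision parts, which matches the resource-usage argument quoted in the table; how the shared representation itself is trained and refreshed in RefQD is not reproduced here.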
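
For the Open Datasets row, the two benchmark suites can be instantiated roughly as follows. This assumes the `qdax.environments.create` helper and the Gymnasium ALE interface; exact module paths, environment IDs, and installation steps vary across library versions.

```python
# QDax control task; the create helper and environment names (e.g.,
# "walker2d_uni") follow older QDax releases and may differ in newer ones.
from qdax import environments
qd_env = environments.create("walker2d_uni", episode_length=1000)

# Atari via Gymnasium + ALE (assumes `gymnasium[atari]` and the ROMs are
# installed; registration details also vary across versions).
import gymnasium as gym
atari_env = gym.make("ALE/Pong-v5")
```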
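
For the Experiment Setup row, the Table 5 hyperparameters could be collected in a single configuration object as sketched below. The dataclass and its field names are our own illustration, not the authors' code; only the values are taken from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TD3Config:
    """TD3 hyperparameters as reported in Table 5 (field names are ours)."""
    critic_hidden_layer_size: tuple = (256, 256)
    policy_learning_rate: float = 1e-3
    critic_learning_rate: float = 3e-4
    replay_buffer_size: int = 1_000_000
    training_batch_size: int = 32
    policy_training_steps: int = 100
    critic_training_steps: int = 300
    reward_scaling: float = 1.0
    discount: float = 0.99
    policy_noise: float = 0.2
    policy_clip: float = 0.5

config = TD3Config()  # e.g., pass to a TD3-based variation operator
```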