Warm Starting CMA-ES for Hyperparameter Optimization
Authors: Masahiro Nomura, Shuhei Watanabe, Youhei Akimoto, Yoshihiko Ozaki, Masaki Onishi
AAAI 2021, pp. 9188-9196
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed warm starting CMA-ES, called WS-CMA-ES, is applied to different HPO tasks where some prior knowledge is available, showing its superior performance over the original CMA-ES as well as BO approaches with or without using the prior knowledge. In this study, we performed experiments on synthetic and HPO problems for several warm starting settings. |
| Researcher Affiliation | Collaboration | Masahiro Nomura* 1,2, Shuhei Watanabe* 3, Youhei Akimoto4,5, Yoshihiko Ozaki2,6, Masaki Onishi2 1CyberAgent, Inc. 2Artificial Intelligence Research Center, AIST. 3University of Freiburg. 4University of Tsukuba. 5RIKEN Center for Advanced Intelligence Project. 6GREE, Inc. |
| Pseudocode | No | The paper describes its algorithms and methods using prose and mathematical equations but does not include any distinct pseudocode blocks or sections labeled “Algorithm” or “Pseudocode.” |
| Open Source Code | No | The paper does not provide any explicit statements about releasing the source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We used the Toxic Comment Classification Challenge data (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) as a dataset. We used the MNIST handwritten digits dataset (LeCun et al. 1998) and the Fashion-MNIST clothing articles dataset (Xiao, Rasul, and Vollgraf 2017). The CNNs were trained on the CIFAR-100 dataset (Krizhevsky 2009). CNNs initially learned the Street View House Numbers (SVHN) dataset (Netzer et al. 2011). |
| Dataset Splits | No | The paper mentions “validation error” as a general objective in HPO but does not provide specific details on how the data was split for validation, or what portion of the dataset constituted the validation set for their experiments. |
| Hardware Specification | No | The paper states: “Computational resource of AI Bridging Cloud Infrastructure (ABCI) provided by National Institute of Advanced Industrial Science and Technology (AIST) was used.” This mentions a computing resource by name but does not provide specific hardware details such as GPU/CPU models, memory, or processor types. |
| Software Dependencies | No | The paper mentions software like “LightGBM” but does not provide specific version numbers for any software components, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | In all the experiments, α and γ in WS-CMA-ES were each set to 0.1. In every experiment, 100 hyperparameter settings evaluated by random search (RS) served as the prior knowledge, so that every method transferred the same data fairly. Each optimization was run 12 times. Details of the experimental settings are given in the Appendix: six hyperparameters were optimized in one experiment, eight in another, and the CNNs trained on the CIFAR-100 dataset (Krizhevsky 2009) have ten types of hyperparameters. A sketch of this warm-start configuration follows the table. |
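Although the paper itself releases no code, the warm-starting setup described in the Experiment Setup row is easy to sketch. The snippet below is a minimal illustration, assuming the open-source `cmaes` Python package and its `get_warm_start_mgd` helper (whose `gamma` and `alpha` defaults match the 0.1 values reported in the paper); the two sphere functions and the 100 random-search evaluations are toy stand-ins for the source and target HPO tasks, not the paper's actual benchmarks.

```python
import numpy as np
from cmaes import CMA, get_warm_start_mgd

def source_task(x: np.ndarray) -> float:
    # Toy source task: sphere function centered at 1.0, standing in for
    # the "similar" task whose evaluations provide the prior knowledge.
    return float(np.sum((x - 1.0) ** 2))

def target_task(x: np.ndarray) -> float:
    # Toy target task with a slightly shifted optimum, mimicking a
    # related-but-different HPO problem.
    return float(np.sum((x - 1.2) ** 2))

rng = np.random.default_rng(seed=0)

# Step 1: collect prior knowledge on the source task. The paper uses
# 100 random-search evaluations; we do the same here.
source_solutions = []
for _ in range(100):
    x = rng.uniform(-2.0, 2.0, size=2)
    source_solutions.append((x, source_task(x)))

# Step 2: estimate a warm-start distribution from the promising source
# solutions. gamma and alpha are the two WS-CMA-ES hyperparameters,
# both set to 0.1 as in the paper's experiments.
ws_mean, ws_sigma, ws_cov = get_warm_start_mgd(
    source_solutions, gamma=0.1, alpha=0.1
)

# Step 3: run CMA-ES on the target task, initialized from the
# warm-start mean, step size, and covariance.
optimizer = CMA(mean=ws_mean, sigma=ws_sigma, cov=ws_cov)
for generation in range(20):
    solutions = []
    for _ in range(optimizer.population_size):
        x = optimizer.ask()
        solutions.append((x, target_task(x)))
    optimizer.tell(solutions)
```

Roughly, `gamma` sets the quantile of source evaluations treated as promising, and `alpha` controls how concentrated the resulting warm-start distribution is; both correspond to the α and γ quoted in the Experiment Setup row above.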