Efficient Hyper-parameter Optimization with Cubic Regularization
Authors: Zhenqian Shen, Hansi Yang, Yong Li, James Kwok, Quanming Yao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-world data demonstrate the effectiveness of our proposed method. |
| Researcher Affiliation | Academia | (1) Department of Electronic Engineering, Tsinghua University, Beijing, China; (2) Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China |
| Pseudocode | Yes | Algorithm 1 Hyper-parameter optimization with cubic regularization. |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use CIFAR-10 dataset for experiments with 50k, 5k, 5k image samples as training, validation, test data, respectively. Two well-known knowledge graph datasets, FB15k-237 [38] and WN18RR [39], are used in experiments and their statistics are in Appendix E.2. |
| Dataset Splits | Yes | We use CIFAR-10 dataset for experiments with 50k, 5k, 5k image samples as training, validation, test data, respectively. |
| Hardware Specification | Yes | Experiments are conducted on a 24GB NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | For the mask of the i-th dimension $z_i$, we use the sigmoid function to represent the probability to mask that dimension, i.e., $p_{\theta_i}(z_i = 1) = 1/(1+\exp(-\theta_i))$ and $p_{\theta_i}(z_i = 0) = 1 - p_{\theta_i}(z_i = 1)$. As all hyper-parameters considered in this application are discrete, we choose softmax-like distributions to represent the probability to select a specific value for each hyper-parameter. The hyper-parameter $z$ is divided into two parts: $z \equiv (\alpha, \{\beta_i\})$, and $R_z(t)$ is parameterized as follows: $R_z(t) \equiv \sum_{i=1}^{I} \alpha_i r_i(t; \beta_i)$. |
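
The two distribution families quoted in the Experiment Setup row (a sigmoid for each binary mask entry, and a softmax-like distribution for each discrete hyper-parameter) can be illustrated with a minimal sketch. This is not the authors' code, which the table notes is not released. The function names, the logit name `theta`, and the candidate values are assumptions made purely for illustration.

```python
import math
import random


def mask_probability(theta_i):
    """p(z_i = 1) = 1 / (1 + exp(-theta_i)), computed in a numerically stable way."""
    if theta_i >= 0:
        return 1.0 / (1.0 + math.exp(-theta_i))
    e = math.exp(theta_i)
    return e / (1.0 + e)


def sample_mask(thetas, rng):
    """Draw each mask entry z_i independently as Bernoulli(p(z_i = 1))."""
    return [1 if rng.random() < mask_probability(t) else 0 for t in thetas]


def softmax(logits):
    """Softmax probabilities over the candidate values of one discrete hyper-parameter."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_choice(candidates, logits, rng):
    """Pick one candidate value according to the softmax of its logits."""
    return rng.choices(candidates, weights=softmax(logits), k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits for a three-dimensional mask.
    print(sample_mask([-2.0, 0.0, 2.0], rng))
    # Hypothetical candidate values for one discrete hyper-parameter.
    print(sample_choice([64, 128, 256], [0.0, 0.5, 1.0], rng))
```

In this sketch the logits are the continuous parameters that an optimizer would update, while the sampled mask or choice is the discrete hyper-parameter passed to model training.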