Global Optimization with Parametric Function Approximation
Authors: Chong Liu, Yu-Xiang Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Synthetic and real-world experiments illustrate GO-UCB works better than popular Bayesian optimization approaches, even if the model is misspecified. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. |
| Pseudocode | Yes | Algorithm 1 GO-UCB |
| Open Source Code | No | All implementations are based on BoTorch framework (Balandat et al., 2020) and sklearn package (Head et al., 2021) with default parameter settings. |
| Open Datasets | Yes | Three UCI datasets (Dua & Graff, 2017) are Breast-cancer, Australian, and Diabetes |
| Dataset Splits | Yes | To reduce the effect of randomness, we divide each dataset into 5 folds and every time use 4 folds for training and the remaining fold for testing. (A k-fold split sketch is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | All implementations are based on BoTorch framework (Balandat et al., 2020) and sklearn package (Head et al., 2021) with default parameter settings. |
| Experiment Setup | Yes | To run GO-UCB, we choose our parametric function model f̂ to be a two-layer neural network with a sigmoid activation between two linear layers: f̂(x) = linear2(sigmoid(linear1(x))), where w1, b1 denote the weight and bias of the linear1 layer and w2, b2 denote those of the linear2 layer. Specifically, we set w1 ∈ ℝ^(25×d_x), b1 ∈ ℝ^25, w2 ∈ ℝ^25, b2 ∈ ℝ, meaning the hidden (activation) dimension is 25. ... Noise parameter σ = 0.01. The regression oracle in GO-UCB is approximated by stochastic gradient descent on the two-layer network with mean squared error loss, 2000 iterations, and a 10⁻¹¹ learning rate. ... we use an iterative gradient ascent algorithm over x and w with 2000 iterations and a 10⁻⁴ learning rate. ... We set n = 5, T = 25 for f1 and n = 8, T = 64 for f2, f3. (See the training/acquisition sketch after the table.) |
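
The block below is a minimal sketch of the experiment setup quoted in the last table row: the two-linear-layer model with hidden width 25, a gradient-descent approximation of the regression oracle with MSE loss, and an iterative gradient-ascent step over the input x using the quoted iteration counts and learning rates. It assumes a PyTorch-style implementation; the class and function names (`ParametricModel`, `fit_regression_oracle`, `ascend_input`) are illustrative, and the acquisition step ascends plain f̂ rather than the paper's full UCB objective, so this is not the authors' released code.

```python
# Hypothetical sketch of the GO-UCB experiment setup described above.
# Hyperparameters mirror the quoted text; everything else is assumed.
import torch
import torch.nn as nn


class ParametricModel(nn.Module):
    """f̂(x) = linear2(sigmoid(linear1(x))) with hidden width 25."""

    def __init__(self, d_x: int, hidden: int = 25):
        super().__init__()
        self.linear1 = nn.Linear(d_x, hidden)  # w1 ∈ R^(25×d_x), b1 ∈ R^25
        self.linear2 = nn.Linear(hidden, 1)    # w2 ∈ R^25, b2 ∈ R

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear2(torch.sigmoid(self.linear1(x))).squeeze(-1)


def fit_regression_oracle(model, X, y, iters=2000, lr=1e-11):
    """Approximate the regression oracle with (full-batch) gradient steps on MSE loss."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(iters):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model


def ascend_input(model, x0, iters=2000, lr=1e-4):
    """Iterative gradient ascent over x (descent on -f̂); the paper's step also
    optimizes over w and adds an uncertainty bonus, omitted here for brevity."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        (-model(x.unsqueeze(0)).squeeze()).backward()
        opt.step()
    return x.detach()


# Example usage on synthetic data (purely illustrative):
d_x = 2
model = ParametricModel(d_x)
X = torch.rand(40, d_x)
y = torch.sin(3 * X).sum(dim=1) + 0.01 * torch.randn(40)  # σ = 0.01 noise
fit_regression_oracle(model, X, y)
x_next = ascend_input(model, torch.rand(d_x))
```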
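
The next sketch corresponds to the "Dataset Splits" row: splitting each UCI dataset into 5 folds and repeatedly training on 4 folds while testing on the remaining one. It assumes scikit-learn's `KFold`; the data loading and the downstream model are left abstract, and the placeholder arrays are not the paper's data.

```python
# Hypothetical sketch of the 5-fold protocol quoted above: split each UCI
# dataset into 5 folds and, in turn, train on 4 folds and test on the 5th.
import numpy as np
from sklearn.model_selection import KFold


def five_fold_splits(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Yield (train, test) data pairs for the 5-fold evaluation."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        yield (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


# Example with placeholder arrays; real runs would load Breast-cancer,
# Australian, and Diabetes from the UCI repository:
X, y = np.random.rand(100, 8), np.random.randint(0, 2, size=100)
for (X_tr, y_tr), (X_te, y_te) in five_fold_splits(X, y):
    pass  # fit the downstream model on (X_tr, y_tr), evaluate on (X_te, y_te)
```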