Bias-Robust Bayesian Optimization via Dueling Bandits
Authors: Johannes Kirschner, Andreas Krause
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method with the one-point reduction (IDS-one) and the two-point reduction (IDS-two) in two numerical experiments with confounded observations. To allow a fair comparison with the two-sample scheme, we account for the regret of both evaluations and scale the x-axis appropriately (a sketch of this regret accounting follows the table). |
| Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich. |
| Pseudocode | Yes | Algorithm 1: Approx. IDS for Dueling Feedback |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | The paper describes synthetic environments like a 'linear reward function' and the 'camelback function' from which data is generated or sampled. It does not refer to or provide access information for a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits. It mentions discretizing an input space for the camelback function ('discretize the input space using 30 points per dimension') but not a formal data split strategy. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments. |
| Experiment Setup | Yes | We set the required concentration coefficient β^{DR}_{t,δ} = √(d log(1 + t/d) + 2 log t), where we drop (conservative) constants required for the theoretical results in favor of better empirical performance. We compute the sampling distribution by solving the saddle point problem stated in (Krishnamurthy et al., 2018, Appendix D) using exponentiated gradient descent. In all experiments we set the confidence level δ = 0.05. In the first experiment, we use a linear reward function f(x) = ⟨x, θ⟩. For each repetition we sample k = 20 actions uniformly on the d = 4 dimensional unit sphere. We add Gaussian observation noise with variance σ² = 1, that is ε_t ~ N(0, 1) in (1). Our second experiment is in the non-linear, kernelized setting with observation noise variance σ² = 0.1. We discretize the input space using 30 points per dimension. For both algorithms, we use an RBF kernel with lengthscale 0.2 and regularizer λ = 1, and set β_{n,δ} = 1 in favor of better empirical performance. Hedged sketches of the saddle-point step and the kernelized setup follow the table. |
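
The comparison protocol quoted in the Research Type row (counting the regret of both points queried by the two-point reduction and rescaling the x-axis) can be made concrete with a short sketch. This is an illustration of the accounting convention only, not the authors' code; the helper name `regret_curve` and the per-round gap representation are assumptions.

```python
import numpy as np

def regret_curve(gaps_per_round):
    """Cumulative regret indexed by the total number of evaluations.

    gaps_per_round[i] holds the suboptimality gap of every point
    queried in round i: one entry for the one-point reduction,
    two entries for the two-point reduction. Counting all entries
    and advancing the x-axis by the number of queries puts both
    schemes on a comparable evaluation budget.
    """
    evals, regret = [], []
    total_evals, total_regret = 0, 0.0
    for gaps in gaps_per_round:
        total_regret += sum(gaps)   # both evaluations contribute regret
        total_evals += len(gaps)    # a two-point round consumes two queries
        evals.append(total_evals)
        regret.append(total_regret)
    return np.array(evals), np.array(regret)

# one-point scheme: one gap per round; two-point scheme: two gaps per round
x1, r1 = regret_curve([[0.9], [0.5], [0.2]])
x2, r2 = regret_curve([[0.9, 0.8], [0.3, 0.2]])
print(x1, r1)  # [1 2 3] [0.9 1.4 1.6]
print(x2, r2)  # [2 4] [1.7 2.2]
```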
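
The Experiment Setup row quotes two computational ingredients: the concentration coefficient β^{DR}_{t,δ} and an exponentiated-gradient solver for the saddle-point problem of Krishnamurthy et al. (2018, Appendix D). The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the payoff matrix `G` is a hypothetical stand-in for the actual doubly-robust saddle-point objective, and the step size and iteration count are assumptions.

```python
import numpy as np

def beta_dr(t, d):
    """Concentration coefficient sqrt(d*log(1 + t/d) + 2*log t),
    with the conservative theoretical constants dropped (t >= 1)."""
    return np.sqrt(d * np.log(1.0 + t / d) + 2.0 * np.log(t))

def egd_sampling_distribution(G, steps=500, eta=0.1):
    """Exponentiated gradient descent on the simplex for min_p max_q p^T G q.

    G is a hypothetical k x k payoff matrix standing in for the actual
    saddle-point objective; the sampling distribution p takes
    multiplicative-weights steps against the adversary's current best
    response q, and the averaged iterate is returned.
    """
    k = G.shape[0]
    p = np.full(k, 1.0 / k)
    p_avg = np.zeros(k)
    for _ in range(steps):
        q = np.zeros(k)
        q[np.argmax(p @ G)] = 1.0    # adversary best response to p
        grad = G @ q                 # d/dp of the bilinear objective p^T G q
        p = p * np.exp(-eta * grad)  # exponentiated (mirror descent) update
        p /= p.sum()                 # project back onto the simplex
        p_avg += p
    return p_avg / steps

rng = np.random.default_rng(0)
p = egd_sampling_distribution(rng.normal(size=(20, 20)))  # k = 20 actions
print(beta_dr(t=100, d=4), p.round(3))
```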
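
For the kernelized experiment, a minimal sketch of the surrogate model is given below: an RBF kernel with lengthscale 0.2, regularizer λ = 1, and a grid of 30 points per dimension, as quoted above. The camelback domain ([-2, 2] × [-1, 1]), the sign convention (negating the test function to obtain a reward), and the number of seed observations are assumptions not stated in the quoted text; the dueling-feedback algorithm itself is not implemented here.

```python
import numpy as np
from itertools import product

def rbf(A, B, lengthscale=0.2):
    """RBF kernel with the lengthscale from the experiment setup."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def camelback(x):
    """Six-hump camelback test function (assumed [-2, 2] x [-1, 1] box)."""
    x1, x2 = x[..., 0], x[..., 1]
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

# discretize the input space with 30 points per dimension
grid = np.array(list(product(np.linspace(-2, 2, 30), np.linspace(-1, 1, 30))))

# kernel ridge / GP posterior mean with lambda = 1 on a few noisy
# observations (sigma^2 = 0.1), standing in for the full algorithm
rng = np.random.default_rng(0)
X = grid[rng.choice(len(grid), size=25, replace=False)]
y = -camelback(X) + rng.normal(scale=np.sqrt(0.1), size=len(X))
lam = 1.0
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)
mu = rbf(grid, X) @ alpha          # posterior mean on the grid
print(grid[np.argmax(mu)])         # grid point the surrogate currently prefers
```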