Bias-Robust Bayesian Optimization via Dueling Bandits

Authors: Johannes Kirschner, Andreas Krause

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method with the one-point reduction (IDS-one) and the two-point reduction (IDS-two) in two numerical experiments with confounded observations. To allow a fair comparison with the two-sample scheme, we account for the regret of both evaluations and scale the x-axis appropriately.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich.
Pseudocode | Yes | Algorithm 1: Approx. IDS for Dueling Feedback
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets | No | The paper describes synthetic environments like a 'linear reward function' and the 'camelback function' from which data is generated or sampled. It does not refer to or provide access information for a pre-existing publicly available dataset.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions discretizing an input space for the camelback function ('discretize the input space using 30 points per dimension') but not a formal data split strategy.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | We set the required concentration coefficient β^{DR}_{t,δ} to β^{DR}_{t,δ} = √(d log(1 + t/d) + 2 log t), where we drop (conservative) constants required for the theoretical results in favor of better empirical performance. We compute the sampling distribution by solving the saddle point problem stated in (Krishnamurthy et al., 2018, Appendix D) using exponentiated gradient descent. In all experiments we set confidence level δ = 0.05. In the first experiment, we use a linear reward function f(x) = ⟨x, θ⟩. For each repetition we sample k = 20 actions uniformly on the d = 4 dimensional unit sphere. We add Gaussian observation noise with variance σ² = 1, that is ϵ_t ∼ N(0, 1) in (1). Our second experiment is in the non-linear, kernelized setting with observation noise variance σ² = 0.1. We discretize the input space using 30 points per dimension. For both algorithms, we use an RBF kernel with lengthscale 0.2 and regularizer λ = 1, and set β_{n,δ} = 1 in favor of better empirical performance.
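
For concreteness, the first (linear) experiment quoted above can be sketched in a few lines of NumPy. This is only an illustration of the stated configuration: the random seed, the draw of the unknown parameter θ, and the helper names are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed is an arbitrary choice
d, k, sigma = 4, 20, 1.0                # dimension, number of actions, noise std

def sample_unit_sphere(n, d, rng):
    """Sample n points uniformly on the d-dimensional unit sphere."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

actions = sample_unit_sphere(k, d, rng)     # k = 20 candidate actions
theta = sample_unit_sphere(1, d, rng)[0]    # hypothetical true parameter (not specified in the quote)
rewards = actions @ theta                   # linear reward f(x) = <x, theta>

def beta_dr(t, d):
    """Concentration coefficient beta^DR_{t,delta} = sqrt(d log(1 + t/d) + 2 log t),
    with the paper's conservative constants dropped as in the quoted setup."""
    return np.sqrt(d * np.log1p(t / d) + 2.0 * np.log(t))

def observe(idx):
    """Noisy evaluation of action idx: f(x) + eps, eps ~ N(0, sigma^2)."""
    return rewards[idx] + rng.normal(0.0, sigma)

print(beta_dr(t=100, d=d), observe(3))
```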
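The quoted setup computes the sampling distribution by running exponentiated gradient descent on the saddle point problem from (Krishnamurthy et al., 2018, Appendix D). That objective is not reproduced here, so the sketch below only shows the generic exponentiated-gradient update on the probability simplex, with a placeholder gradient oracle standing in for the inner maximization.

```python
import numpy as np

def exponentiated_gradient(grad_fn, k, steps=200, eta=0.1):
    """Exponentiated gradient descent over distributions on k actions.

    grad_fn(p) must return the gradient of the outer objective at the current
    distribution p; for the saddle-point problem referenced in the setup this
    gradient would come from an inner maximization, which is omitted here.
    """
    p = np.full(k, 1.0 / k)                  # start from the uniform distribution
    for _ in range(steps):
        g = grad_fn(p)
        p = p * np.exp(-eta * g)             # multiplicative (mirror-descent) update
        p /= p.sum()                         # re-normalize onto the simplex
    return p

# Toy usage with a hypothetical linear objective <c, p>: the mass
# concentrates on the coordinate with the smallest cost c_i.
c = np.array([0.3, 0.1, 0.5, 0.2])
print(exponentiated_gradient(lambda p: c, k=4))
```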
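The second, kernelized experiment only specifies the kernel, the noise level, the regularizer, and the discretization. Below is a minimal sketch of a regularized kernel (GP-style) posterior on a 30-points-per-dimension grid under those settings; the domain bounds and the placeholder observations are assumptions, and the dueling acquisition step itself is not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """RBF kernel matrix between the rows of A and B (lengthscale 0.2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

# 30 points per dimension; the unit square stands in for the camelback
# domain, whose bounds are not given in the quoted setup.
grid_1d = np.linspace(0.0, 1.0, 30)
X = np.array(np.meshgrid(grid_1d, grid_1d)).reshape(2, -1).T   # 900 candidates

lam = 1.0        # regularizer lambda
sigma2 = 0.1     # observation-noise variance
beta = 1.0       # beta_{n, delta} = 1, as in the quoted setup

def posterior(X_obs, y_obs, X_query):
    """Regularized kernel least-squares posterior mean and standard deviation."""
    K = rbf_kernel(X_obs, X_obs) + lam * np.eye(len(X_obs))
    k_star = rbf_kernel(X_query, X_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    prior_var = np.ones(len(X_query))          # k(x, x) = 1 for the RBF kernel
    var = prior_var - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Placeholder observations, just to exercise the posterior; confidence bounds
# mean +/- sqrt(beta) * std would then drive the acquisition step.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=5, replace=False)
mu, sd = posterior(X[idx], rng.normal(0.0, np.sqrt(sigma2), size=5), X)
print(mu.shape, sd.max())
```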