Bias-Robust Bayesian Optimization via Dueling Bandits

Authors: Johannes Kirschner, Andreas Krause

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method with the one-point reduction (IDS-one) and the two-point reduction (IDS-two) in two numerical experiments with confounded observations. To allow a fair comparison with the two-sample scheme, we account for the regret of both evaluations and scale the x-axis appropriately.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich.
Pseudocode | Yes | Algorithm 1: Approx. IDS for Dueling Feedback
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets | No | The paper describes synthetic environments like a 'linear reward function' and the 'camelback function' from which data is generated or sampled. It does not refer to or provide access information for a pre-existing publicly available dataset.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions discretizing an input space for the camelback function ('discretize the input space using 30 points per dimension') but not a formal data split strategy.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | We set the required concentration coefficient β^{DR}_{t,δ} to β^{DR}_{t,δ} = √(d log(1 + t/d) + 2 log t), where we drop (conservative) constants required for the theoretical results in favor of better empirical performance. We compute the sampling distribution by solving the saddle point problem stated in (Krishnamurthy et al., 2018, Appendix D) using exponentiated gradient descent. In all experiments we set confidence level δ = 0.05. In the first experiment, we use a linear reward function f(x) = ⟨x, θ⟩. For each repetition we sample k = 20 actions uniformly on the d = 4 dimensional unit sphere. We add Gaussian observation noise with variance σ² = 1, that is ϵ_t ∼ N(0, 1) in (1). Our second experiment is in the non-linear, kernelized setting with observation noise variance σ² = 0.1. We discretize the input space using 30 points per dimension. For both algorithms, we use an RBF kernel with lengthscale 0.2 and regularizer λ = 1, and set β_{n,δ} = 1 in favor of better empirical performance.
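
For concreteness, the first (linear) experiment quoted above can be sketched in a few lines of NumPy. This is only an illustration of the stated configuration: the random seed, the draw of the unknown parameter θ, and the helper names are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed is an arbitrary choice
d, k, sigma = 4, 20, 1.0                # dimension, number of actions, noise std

def sample_unit_sphere(n, d, rng):
    """Sample n points uniformly on the d-dimensional unit sphere."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

actions = sample_unit_sphere(k, d, rng)     # k = 20 candidate actions
theta = sample_unit_sphere(1, d, rng)[0]    # hypothetical true parameter (not specified in the quote)
rewards = actions @ theta                   # linear reward f(x) = <x, theta>

def beta_dr(t, d):
    """Concentration coefficient beta^DR_{t,delta} = sqrt(d log(1 + t/d) + 2 log t),
    with the paper's conservative constants dropped as in the quoted setup."""
    return np.sqrt(d * np.log1p(t / d) + 2.0 * np.log(t))

def observe(idx):
    """Noisy evaluation of action idx: f(x) + eps, eps ~ N(0, sigma^2)."""
    return rewards[idx] + rng.normal(0.0, sigma)

print(beta_dr(t=100, d=d), observe(3))
```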
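The quoted setup computes the sampling distribution by running exponentiated gradient descent on the saddle point problem from (Krishnamurthy et al., 2018, Appendix D). That objective is not reproduced here, so the sketch below only shows the generic exponentiated-gradient update on the probability simplex, with a placeholder gradient oracle standing in for the inner maximization.

```python
import numpy as np

def exponentiated_gradient(grad_fn, k, steps=200, eta=0.1):
    """Exponentiated gradient descent over distributions on k actions.

    grad_fn(p) must return the gradient of the outer objective at the current
    distribution p; for the saddle-point problem referenced in the setup this
    gradient would come from an inner maximization, which is omitted here.
    """
    p = np.full(k, 1.0 / k)                  # start from the uniform distribution
    for _ in range(steps):
        g = grad_fn(p)
        p = p * np.exp(-eta * g)             # multiplicative (mirror-descent) update
        p /= p.sum()                         # re-normalize onto the simplex
    return p

# Toy usage with a hypothetical linear objective <c, p>: the mass
# concentrates on the coordinate with the smallest cost c_i.
c = np.array([0.3, 0.1, 0.5, 0.2])
print(exponentiated_gradient(lambda p: c, k=4))
```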
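The second, kernelized experiment only specifies the kernel, the noise level, the regularizer, and the discretization. Below is a minimal sketch of a regularized kernel (GP-style) posterior on a 30-points-per-dimension grid under those settings; the domain bounds and the placeholder observations are assumptions, and the dueling acquisition step itself is not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """RBF kernel matrix between the rows of A and B (lengthscale 0.2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

# 30 points per dimension; the unit square stands in for the camelback
# domain, whose bounds are not given in the quoted setup.
grid_1d = np.linspace(0.0, 1.0, 30)
X = np.array(np.meshgrid(grid_1d, grid_1d)).reshape(2, -1).T   # 900 candidates

lam = 1.0        # regularizer lambda
sigma2 = 0.1     # observation-noise variance
beta = 1.0       # beta_{n, delta} = 1, as in the quoted setup

def posterior(X_obs, y_obs, X_query):
    """Regularized kernel least-squares posterior mean and standard deviation."""
    K = rbf_kernel(X_obs, X_obs) + lam * np.eye(len(X_obs))
    k_star = rbf_kernel(X_query, X_obs)
    mean = k_star @ np.linalg.solve(K, y_obs)
    prior_var = np.ones(len(X_query))          # k(x, x) = 1 for the RBF kernel
    var = prior_var - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

# Placeholder observations, just to exercise the posterior; confidence bounds
# mean +/- sqrt(beta) * std would then drive the acquisition step.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=5, replace=False)
mu, sd = posterior(X[idx], rng.normal(0.0, np.sqrt(sigma2), size=5), X)
print(mu.shape, sd.max())
```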