reproducibilityindex.ai

Semiparametric Contextual Bandits

Authors: Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also compare our algorithm to approaches from both parametric and agnostic families in an empirical study (we use a linear policy class for agnostic approaches). In Section 5, we evaluate several algorithms on synthetic problems where the reward is (a) linear and (b) linear with confounding. In the linear case, our approach learns, but is slightly worse than the baselines. On the other hand, when there is confounding, our algorithm significantly outperforms both parametric and agnostic approaches. As such, these experiments demonstrate that our algorithm represents a favorable trade-off between statistical efficiency and robustness.
Researcher Affiliation	Industry	(1) Microsoft Research, New York, New York; (2) Microsoft Research, Cambridge, Massachusetts.
Pseudocode	Yes	Algorithm 1: BOSE (Bandit orthogonalized semiparametric estimation)
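Algorithm 1 itself is not reproduced on this page. The sketch below only illustrates the orthogonalization idea suggested by its name, under stated assumptions: the learner knows its sampling distribution π_t, actions are drawn uniformly with K = 2, and θ is estimated by ridge regression on chosen features centered at μ_t = Σ_a π_t(a) z_{t,a}. Because the centered feature has mean zero under π_t, the action-independent term f_t(x_t) cancels in expectation. The confounder, noise level, regularizer lam = 1.0, and every function name here are hypothetical; the action elimination and choice of π_t that a full bandit algorithm needs are omitted.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def ridge_estimate(rounds, lam, center=True):
    """Estimate theta from logged rounds (feats, probs, action, reward).

    center=True: regress on z_{t,a_t} - mu_t, where mu_t = sum_a pi_t(a) z_{t,a}
    (hypothetical orthogonalized variant). center=False: plain ridge regression.
    """
    d = len(rounds[0][0][0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for feats, probs, a, r in rounds:
        if center:
            mu = [sum(p * z[i] for p, z in zip(probs, feats)) for i in range(d)]
        else:
            mu = [0.0] * d
        c = [feats[a][i] - mu[i] for i in range(d)]  # mean zero under pi_t if centered
        for i in range(d):
            rhs[i] += c[i] * r
            for j in range(d):
                gram[i][j] += c[i] * c[j]
    return solve(gram, rhs)


def unit_vector(d, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def simulate(theta, n, k, rng, noise=0.1):
    """Confounded rounds: r = f_t + <theta, z_{t,a}> + noise, actions drawn uniformly."""
    rounds = []
    for _ in range(n):
        feats = [unit_vector(len(theta), rng) for _ in range(k)]
        f = max(dot(theta, z) for z in feats)  # hypothetical action-independent confounder
        a = rng.randrange(k)  # uniform exploration: pi_t(a) = 1/k
        r = f + dot(theta, feats[a]) + rng.gauss(0.0, noise)
        rounds.append((feats, [1.0 / k] * k, a, r))
    return rounds


rng = random.Random(0)
theta = [1.0, -0.5]
rounds = simulate(theta, 20000, 2, rng)
errors = {}
for center in (True, False):
    est = ridge_estimate(rounds, lam=1.0, center=center)
    diff = [e - t for e, t in zip(est, theta)]
    errors[center] = dot(diff, diff) ** 0.5
    print(f"center={center}: error={errors[center]:.3f}")
```

In this toy run the centered estimate should land close to θ, while plain ridge regression absorbs part of f_t into the linear coefficients; for this particular confounder and feature distribution it drifts toward roughly 1.5θ.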
Open Source Code	Yes	Our code is publicly available at http://github.com/akshaykr/oracle_cb/.
Open Datasets	No	We simulate three different environments that follow the semiparametric contextual bandits model with d = 10, K = 2. In the first setting the reward is linear and the action features are drawn uniformly from the unit sphere. In the latter two settings, we set $f_t(x_t) = \max_a \langle \theta, z_{t,a} \rangle$, which is related to the construction in the proof of Proposition 3. One of these semiparametric settings has action features sampled from the unit sphere, while for the other, we sample from the intersection of the unit sphere and the positive orthant. The paper uses simulated data and does not provide access information (link, DOI, citation) to a publicly available dataset.
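A hedged sketch of how the three simulated environments described above could be generated. The row fixes d = 10, K = 2, the unit-sphere and positive-orthant feature distributions, and the max-based form of f_t; how θ is drawn, the Gaussian reward noise (standard deviation 0.1), and all names such as SETTINGS and draw_round are assumptions, not details taken from the paper or its repository.

```python
import math
import random

# Hypothetical labels: (confounded, positive_orthant) for the three settings.
SETTINGS = {
    "linear": (False, False),
    "confounded_sphere": (True, False),
    "confounded_orthant": (True, True),
}


def random_unit_vector(d, rng, positive=False):
    """Uniform point on the unit sphere; with positive=True, uniform on the part
    of the sphere in the positive orthant (sign flips of a standard Gaussian
    vector leave its distribution unchanged, so folding then normalizing works)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    if positive:
        v = [abs(x) for x in v]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def draw_round(setting, theta, k, rng, noise=0.1):
    """Return action features, mean rewards, and noisy rewards for one round."""
    confounded, positive = SETTINGS[setting]
    feats = [random_unit_vector(len(theta), rng, positive) for _ in range(k)]
    # Action-independent term f_t(x_t): max_a <theta, z_{t,a}>, or 0 in the linear setting.
    f = max(dot(theta, z) for z in feats) if confounded else 0.0
    means = [f + dot(theta, z) for z in feats]
    rewards = [m + rng.gauss(0.0, noise) for m in means]  # assumed Gaussian noise
    return feats, means, rewards


rng = random.Random(1)
d, k = 10, 2
theta = random_unit_vector(d, rng)  # hypothetical: theta drawn from the unit sphere
for name in SETTINGS:
    feats, means, rewards = draw_round(name, theta, k, rng)
    print(name, [round(m, 3) for m in means])
```

Because f_t is shared by both actions it never changes which action is optimal, but it shifts every reward by an amount that depends on the features, which is what can mislead a purely linear fit.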
Dataset Splits	No	The paper describes generating synthetic data for simulations but does not specify dataset splits (e.g., training, validation, test percentages or counts) as would be typical for experiments on fixed datasets.
Hardware Specification	No	The paper mentions running experiments but does not provide specific details about the hardware used (e.g., CPU/GPU models, memory, or cloud instances).
Software Dependencies	No	The paper does not provide specific details about ancillary software dependencies, such as library names with version numbers.
Experiment Setup	Yes	Set $\lambda \leftarrow 4d\log(9T) + 8\log(4T/\delta)$ and $\gamma(T) \leftarrow 27d\log(1 + 2T/d) + 54\log(4T/\delta)$.
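A small helper that evaluates the two constants from the row above, to show their scale. It assumes natural logarithms; the row does not say how λ and γ(T) are used, and reading λ as a ridge regularizer and γ(T) as a confidence-width scale (as is common in linear-bandit analyses) is an assumption. The function name bose_parameters is hypothetical.

```python
import math


def bose_parameters(d, T, delta):
    """Evaluate lambda and gamma(T) as written in the Experiment Setup row.

    Natural logarithms are assumed; the function name is hypothetical.
    """
    lam = 4 * d * math.log(9 * T) + 8 * math.log(4 * T / delta)
    gamma_T = 27 * d * math.log(1 + 2 * T / d) + 54 * math.log(4 * T / delta)
    return lam, gamma_T


lam, gamma_T = bose_parameters(d=10, T=1000, delta=0.1)
print(f"lambda = {lam:.2f}, gamma(T) = {gamma_T:.2f}")
```

For d = 10, T = 1000, and δ = 0.1 this gives λ ≈ 449 and γ(T) ≈ 2004. Both constants grow only logarithmically in T and 1/δ, and roughly linearly in d.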