Semiparametric Contextual Bandits

Authors: Akshay Krishnamurthy, Zhiwei Steven Wu, Vasilis Syrgkanis

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also compare our algorithm to approaches from both parametric and agnostic families in an empirical study (we use a linear policy class for agnostic approaches). In Section 5, we evaluate several algorithms on synthetic problems where the reward is (a) linear, and (b) linear with confounding. In the linear case, our approach learns, but is slightly worse than the baselines. On the other hand, when there is confounding, our algorithm significantly outperforms both parametric and agnostic approaches. As such, these experiments demonstrate that our algorithm represents a favorable trade off between statistical efficiency and robustness.
Researcher Affiliation | Industry | (1) Microsoft Research, New York, New York; (2) Microsoft Research, Cambridge, Massachusetts.
Pseudocode | Yes | Algorithm 1: BOSE (Bandit orthogonalized semiparametric estimation)
Open Source Code | Yes | Our code is publicly available at http://github.com/akshaykr/oracle_cb/.
Open Datasets | No | We simulate three different environments that follow the semiparametric contextual bandits model with d = 10, K = 2. In the first setting the reward is linear and the action features are drawn uniformly from the unit sphere. In the latter two settings, we set f_t(x_t) = max_a ⟨θ, z_{t,a}⟩, which is related to the construction in the proof of Proposition 3. One of these semiparametric settings has action features sampled from the unit sphere, while for the other, we sample from the intersection of the unit sphere and the positive orthant. The paper uses simulated data and does not provide access information (link, DOI, citation) to a publicly available dataset.
Dataset Splits | No | The paper generates synthetic data for its simulations and does not specify dataset splits (e.g., training, validation, or test percentages or counts), as would be typical for experiments on fixed datasets.
Hardware Specification | No | The paper mentions running experiments but gives no details about the hardware used (e.g., CPU/GPU models, memory, or cloud instances).
Software Dependencies | No | The paper does not list ancillary software dependencies, such as library names with version numbers.
Experiment Setup | Yes | Set λ = 4d log(9T) + 8 log(4T/δ) and γ(T) = 27d log(1 + 2T/d) + 54 log(4T/δ).
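
To get a feel for the magnitude of the two quantities in the Experiment Setup row, they can be evaluated for concrete values of d, T, and δ. The sketch below is illustrative only and is not taken from the authors' repository. It assumes that each expression is an equality as printed in this summary, that log is the natural logarithm, and that no symbols were lost from the expressions when the page was extracted. The function name `bose_parameters` is hypothetical.

```python
import math
from typing import Tuple


def bose_parameters(d: int, T: int, delta: float) -> Tuple[float, float]:
    """Evaluate the two expressions from the Experiment Setup row as printed.

    Assumptions (not confirmed by the summary): natural logarithm, and the
    expressions are complete as shown.
    """
    if d <= 0 or T <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need d > 0, T > 0, and 0 < delta < 1")
    # lambda: 4 d log(9T) + 8 log(4T / delta)
    lam = 4 * d * math.log(9 * T) + 8 * math.log(4 * T / delta)
    # gamma(T): 27 d log(1 + 2T / d) + 54 log(4T / delta)
    gamma = 27 * d * math.log(1 + 2 * T / d) + 54 * math.log(4 * T / delta)
    return lam, gamma


if __name__ == "__main__":
    # d = 10 matches the simulated environments; T and delta are arbitrary here.
    lam, gamma = bose_parameters(d=10, T=1000, delta=0.05)
    print(f"lambda = {lam:.2f}, gamma(T) = {gamma:.2f}")
```

Both quantities grow only logarithmically in T and in 1/δ, so changing the horizon by an order of magnitude shifts them modestly.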
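
The simulated environments described in the Open Datasets row can be approximated with a few lines of code. The following is a minimal sketch and not the authors' implementation (see their repository for that). It assumes the reward for an action is the linear term ⟨θ, z_{t,a}⟩ plus an action-independent term f_t(x_t) plus independent Gaussian noise, with f_t taken exactly as printed in this summary. The noise model, the `noise_std` value, and the function names are all assumptions made for illustration. Normalizing a Gaussian vector gives a uniform draw from the unit sphere, and taking coordinate-wise absolute values is one simple way to restrict that draw to the positive orthant.

```python
import math
import random
from typing import List, Tuple


def sample_unit_sphere(rng: random.Random, d: int,
                       positive_orthant: bool = False) -> List[float]:
    """Draw a point uniformly from the unit sphere in R^d.

    With positive_orthant=True the point is reflected into the positive
    orthant by taking absolute values of its coordinates.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    z = [x / norm for x in g]
    return [abs(x) for x in z] if positive_orthant else z


def simulate_round(rng: random.Random, theta: List[float], K: int = 2,
                   confounded: bool = True, positive_orthant: bool = False,
                   noise_std: float = 0.1) -> Tuple[List[List[float]], List[float]]:
    """Simulate one round: K action feature vectors and their noisy rewards."""
    d = len(theta)
    features = [sample_unit_sphere(rng, d, positive_orthant) for _ in range(K)]
    # Linear part of the reward for each action.
    linear = [sum(t * x for t, x in zip(theta, z)) for z in features]
    # Action-independent term, as printed in the summary: max_a <theta, z_{t,a}>.
    f = max(linear) if confounded else 0.0
    # Assumed noise model: independent Gaussian noise per action.
    rewards = [m + f + rng.gauss(0.0, noise_std) for m in linear]
    return features, rewards


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0 / math.sqrt(10)] * 10  # d = 10, as in the summary
    feats, rewards = simulate_round(rng, theta, K=2, positive_orthant=True)
    print(rewards)
```

Setting `confounded=False` gives the purely linear setting. The other two settings differ only in the `positive_orthant` flag.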