Practical Contextual Bandits with Regression Oracles

Authors: Dylan Foster, Alekh Agarwal, Miroslav Dudik, Haipeng Luo, Robert Schapire

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In an extensive empirical evaluation, we find that our approach typically matches or outperforms both realizability-based and agnostic baselines.
Researcher Affiliation | Collaboration | 1. Cornell University (work performed while the author was an intern at Microsoft Research); 2. Microsoft Research; 3. University of Southern California.
Pseudocode | Yes | Algorithm 1 REGCB.ELIMINATION ... Algorithm 2 REGCB.OPTIMISTIC ... Algorithm 3 BINSEARCH
Open Source Code | No | The paper references an implementation for the baselines ("We use an implementation available at https://github.com/akshaykr/oracle_cb"), but does not state that the code for its own proposed methods (RegCB) is open-source or available.
Open Datasets | Yes | We use two large-scale learning-to-rank datasets, Microsoft MSLR-WEB30k (mslr) (Qin & Liu, 2010) and Yahoo! Learning to Rank Challenge V2.0 (yahoo) (Chapelle & Chang, 2011)... We also use eight classification datasets from the UCI repository (Lichman, 2013).
Dataset Splits | Yes | Each dataset is split into training data, for which the algorithm receives one example at a time and must predict online, and a holdout validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper mentions software used for the baselines (e.g., scikit-learn is cited only in general terms), but does not list specific software dependencies or version numbers needed to replicate the experiments for its own methods.
Experiment Setup | Yes | Parameter Tuning: For ε-Greedy we tune the constant ε, and for ILTCB we tune a certain smoothing parameter (see Appendix B). For Algorithms 1 and 2 we set β_m = β for all m and tune β. For Algorithm 2 we use a warm start of 0. We tune a confidence parameter similar to β for Bootstrap-TS.
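
The Dataset Splits and Experiment Setup rows describe an online evaluation protocol: the learner receives one example at a time, commits to a single action, and observes only that action's reward, with exploration and confidence parameters (ε for ε-Greedy, β for RegCB) tuned per run. The sketch below is a minimal illustration of such a protocol for a classification dataset treated as a bandit problem, assuming an ε-greedy policy over per-action ridge-regression oracles; the function name, the ridge oracle, and the 0/1 reward encoding are illustrative assumptions, not the paper's RegCB implementation.

```python
# Hedged sketch: an online epsilon-greedy bandit loop over a classification
# dataset, with a per-action ridge-regression reward oracle. All names and
# modeling choices here are illustrative assumptions.
import numpy as np

def run_epsilon_greedy(X, y, num_actions, epsilon=0.05, reg=1.0, seed=0):
    """Process examples one at a time; return the progressive average reward."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Per-action ridge statistics: A_a = reg*I + sum(x x^T), b_a = sum(r x).
    A = [reg * np.eye(d) for _ in range(num_actions)]
    b = [np.zeros(d) for _ in range(num_actions)]
    total_reward = 0.0
    for x, label in zip(X, y):
        # Greedy action according to the current regression estimates.
        scores = [x @ np.linalg.solve(A[a], b[a]) for a in range(num_actions)]
        action = int(np.argmax(scores))
        if rng.random() < epsilon:            # explore uniformly w.p. epsilon
            action = int(rng.integers(num_actions))
        reward = float(action == label)       # bandit feedback: only the chosen arm's reward is seen
        total_reward += reward
        A[action] += np.outer(x, x)           # update only the chosen arm's oracle
        b[action] += reward * x
    return total_reward / len(y)

# Example usage with synthetic data (10 classes as 10 actions, 20 features):
# X = np.random.randn(5000, 20); y = np.random.randint(10, size=5000)
# print(run_epsilon_greedy(X, y, num_actions=10, epsilon=0.05))
```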