Practical Contextual Bandits with Regression Oracles
Authors: Dylan Foster, Alekh Agarwal, Miroslav Dudik, Haipeng Luo, Robert Schapire
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an extensive empirical evaluation, we find that our approach typically matches or outperforms both realizability-based and agnostic baselines. |
| Researcher Affiliation | Collaboration | ¹Cornell University (work performed while the author was an intern at Microsoft Research), ²Microsoft Research, ³University of Southern California. |
| Pseudocode | Yes | Algorithm 1 REGCB.ELIMINATION ... Algorithm 2 REGCB.OPTIMISTIC ... Algorithm 3 BINSEARCH |
| Open Source Code | No | The paper references an implementation for baselines ("We use an implementation available at https://github.com/akshaykr/oracle_cb"), but does not state that the code for its own proposed methods (RegCB) is open-source or available. |
| Open Datasets | Yes | We use two large-scale learning-to-rank datasets, Microsoft MSLR-WEB30k (mslr) (Qin & Liu, 2010) and Yahoo! Learning to Rank Challenge V2.0 (yahoo) (Chapelle & Chang, 2011)... We also use eight classification datasets from the UCI repository (Lichman, 2013). |
| Dataset Splits | Yes | Each dataset is split into training data, for which the algorithm receives one example at a time and must predict online, and a holdout validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions software used for its baselines, but does not list specific software dependencies with version numbers needed to replicate the experiments for its own methods. |
| Experiment Setup | Yes | Parameter Tuning: For ε-Greedy we tune the constant, and for ILTCB we tune a certain smoothing parameter (see Appendix B). For Algorithms 1 and 2 we set β_m = β for all m and tune β. For Algorithm 2 we use a warm start of 0. We tune a confidence parameter similar to β for Bootstrap-TS. |