Efficient Contextual Bandits with Continuous Actions
Authors: Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that it works in a general sense and verify the new functionality with large-scale experiments. We implement our algorithms in Vowpal Wabbit (vowpalwabbit.org), and compare with baselines on real datasets. Experiments demonstrate the efficacy and efficiency of our approach (Section 5). |
| Researcher Affiliation | Collaboration | Maryam Majzoubi (New York University); Chicheng Zhang (University of Arizona); Rajan Chari (Microsoft Research); Akshay Krishnamurthy (Microsoft Research); John Langford (Microsoft Research); Aleksandrs Slivkins (Microsoft Research) |
| Pseudocode | Yes | Algorithm 1 CATS: continuous action tree with smoothing; Algorithm 2 Tree training: Train tree; Algorithm 3 CATS Off (a minimal sketch of the smoothing step follows the table). |
| Open Source Code | No | The paper implements its algorithms in Vowpal Wabbit (vowpalwabbit.org), an existing open-source platform, but it does not provide a direct link to the authors' own implementation of the specific methods described in the paper. |
| Open Datasets | Yes | We evaluate our approach on six large-scale regression datasets... five are selected from OpenML with the criterion of having millions of samples with unique regression values (See Appendix F for more details). The OpenML datasets include: Microsoft (ID: 235127), Yandex (ID: 235128), Epsilon (ID: 235129), Helena (ID: 235130), Higgs (ID: 235131); a fetch-and-split sketch follows the table. |
| Dataset Splits | No | The paper states: 'We create an 80-20% split of training and test sets.' and mentions 'progressive validation [15] for online evaluation', but it does not specify an explicit train/validation/test split or a distinct validation-set percentage/count for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states 'We implement our algorithms in Vowpal Wabbit (vowpalwabbit.org)', but it does not provide specific version numbers for Vowpal Wabbit or any other software dependency needed to replicate the experiments (an illustrative VW invocation follows the table). |
| Experiment Setup | Yes | All algorithms use ϵ = 0.05; see Appendix F for additional experimental details. With the training set, we first collect interaction log tuples of (x_t, a_t, P_t(a_t | x_t), ℓ_t(a_t)) using CATS with initial discretization and smoothing parameters (K_init, h_init) = (4, 1/4) and greedy parameter ϵ = 0.05. We then run CATS Off over the logged data using J, defined in (1), as the set of parameters; an off-policy evaluation sketch follows the table. |
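
To make the Pseudocode row concrete: below is a minimal Python sketch of the action-selection step of Algorithm 1 (CATS), i.e., an ϵ-greedy choice over K discretized bins followed by uniform smoothing of half-width h around the chosen bin center, which also yields the logged propensity P_t(a_t | x_t). The `tree_predict` helper is a hypothetical stand-in for the paper's learned tree policy, the action range is assumed to be [0, 1], and boundary handling is omitted; this is an illustration, not the authors' implementation.

```python
import random

def tree_predict(x, K):
    """Hypothetical stand-in for the learned tree routing context x to a bin."""
    return hash(tuple(x)) % K  # placeholder: any deterministic context-to-bin map

def cats_act(x, K=4, h=0.25, eps=0.05):
    """eps-greedy over K bins, then uniform smoothing of half-width h (CATS-style)."""
    greedy = tree_predict(x, K)
    k = random.randrange(K) if random.random() < eps else greedy
    center = (k + 0.5) / K  # bin centers of a uniform K-way discretization of [0, 1]
    a = random.uniform(center - h, center + h)  # boundary clipping omitted for brevity
    # Logged propensity: density of the eps-greedy smoothed policy at action a,
    # summing each bin's uniform density weighted by its selection probability.
    density = sum(
        ((1 - eps) * (j == greedy) + eps / K) / (2 * h)
        for j in range(K)
        if abs(a - (j + 0.5) / K) <= h
    )
    return a, density

action, propensity = cats_act([0.3, 1.2], K=4, h=0.25, eps=0.05)
```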
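
For the Open Datasets and Dataset Splits rows, here is a hedged sketch of fetching one of the listed datasets and making the paper's 80-20% train/test split. It assumes the IDs in the table are OpenML dataset IDs that scikit-learn's `fetch_openml` can retrieve (if they are task or run IDs, the call would need adjusting), and the `random_state` is arbitrary since the paper pins no seed.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Assumption: 235127 ("Microsoft" in the table above) is an OpenML dataset ID.
data = fetch_openml(data_id=235127, as_frame=True)

# The paper's 80-20% split of training and test sets (no validation set is specified).
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)
```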
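
Because the Software Dependencies row notes that no Vowpal Wabbit version is pinned, a reproduction has to guess at the exact invocation. The sketch below shells out to the `vw` binary with the continuous-action (CATS) flags documented for recent VW releases; the flag names, the action range [0, 1], and the file name `train.dat` are assumptions, while the parameter values mirror the (K_init, h_init) = (4, 1/4) and ϵ = 0.05 quoted above.

```python
import subprocess

# Illustrative only: flag names follow VW's documented CATS options and may
# differ across versions; "train.dat" is a hypothetical logged-interaction file.
cmd = [
    "vw",
    "--cats", "4",          # number of discretized actions (K_init = 4)
    "--bandwidth", "0.25",  # smoothing half-width (h_init = 1/4)
    "--min_value", "0.0",   # assumed action range [0, 1]
    "--max_value", "1.0",
    "--epsilon", "0.05",    # greedy parameter from the table above
    "--data", "train.dat",
]
subprocess.run(cmd, check=True)
```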
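
Finally, the Experiment Setup row describes logging tuples (x_t, a_t, P_t(a_t | x_t), ℓ_t(a_t)) and then running CATS Off over them with the parameter set J. Below is a generic importance-weighted off-policy estimate in that spirit, not the authors' exact estimator; `candidate_density` and `make_density` are hypothetical helpers giving the density a candidate (K, h) policy places on the logged action.

```python
def off_policy_value(logged, candidate_density):
    """Importance-weighted average loss of a candidate policy over logged data.

    `logged` holds tuples (x, a, p, loss) with p the logged density P_t(a | x);
    `candidate_density(x, a)` is the candidate policy's density at (x, a).
    """
    return sum(
        loss * candidate_density(x, a) / p for x, a, p, loss in logged
    ) / len(logged)

# Selecting the best (K, h) from a parameter set J by minimizing estimated loss;
# `make_density` would build the candidate policy's density function.
# best_K, best_h = min(J, key=lambda kh: off_policy_value(logged, make_density(*kh)))
```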