Action Centered Contextual Bandits
Authors: Kristjan Greenewald, Ambuj Tewari, Susan Murphy, Predrag Klasnja
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is supported by experiments on data gathered in a recently concluded mobile health study. We show, both theoretically and empirically, that the performance of an appropriately designed action-centered contextual bandit algorithm is agnostic to the high model complexity of the baseline reward. Instead, we get the same level of performance as expected in a stationary, linear model setting. Finally, we use data gathered in the recently conducted HeartSteps study to validate our model and theory. |
| Researcher Affiliation | Academia | Kristjan Greenewald, Department of Statistics, Harvard University, kgreenewald@fas.harvard.edu; Ambuj Tewari, Department of Statistics, University of Michigan, tewaria@umich.edu; Predrag Klasnja, School of Information, University of Michigan, klasnja@umich.edu; Susan Murphy, Departments of Statistics and Computer Science, Harvard University, samurphy@fas.harvard.edu |
| Pseudocode | Yes | Algorithm 1 Action-Centered Thompson Sampling |
| Open Source Code | No | No explicit statement about the release of source code or a link to a code repository for the methodology described in this paper was found. |
| Open Datasets | No | No concrete access information (link, DOI, repository, or formal citation with authors/year for the dataset) for a publicly available or open dataset was provided. The paper refers to data from a 'recently concluded mobile health study' and 'HeartSteps study data' but without public access details. |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing was found in the paper. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments were found. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment were found. |
| Experiment Setup | Yes | We set $\pi_{\min} = 0.2$, $\pi_{\max} = 0.8$. In each experiment, we choose a true reward generative model $r_t(s, a)$ inspired by data from the HeartSteps study (for details see Section 1.1 in the supplement), and generate two length-$T$ sequences of state vectors $s_{t,a} \in \mathbb{R}^{NK}$ and $s_t \in \mathbb{R}^{L}$, where the $s_t$ are iid Gaussian and $s_{t,a}$ is formed by stacking columns $I(a = i)[1; s_t]$ for $i = 1, \dots, N$. We consider both nonlinear and nonstationary baselines, while keeping the treatment effect models the same. The reward for that message was defined to be $\log(0.5 + x)$ where $x$ is the step count of the participant in the 30 minutes following the suggestion. As above we set $\pi_{\min} = 0.2$, $\pi_{\max} = 0.8$. |
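
The quantities quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code (none was released): the function names are hypothetical, the block length K = L + 1 is inferred from the stacked block [1; s_t], the standard-normal draw is an assumption (the excerpt says only "iid Gaussian"), and any action outside 1..N (for example a = 0 standing for "no message") is assumed to map to the zero vector because no indicator fires.

```python
import math
import random

PI_MIN, PI_MAX = 0.2, 0.8  # values quoted in the Experiment Setup row


def stack_features(s_t, a, n_actions):
    """Build s_{t,a} by stacking I(a == i) * [1; s_t] for i = 1..N.

    Assumption: an action outside 1..N (e.g. a = 0) matches no block,
    so it yields the all-zero vector.
    """
    block = [1.0] + list(s_t)  # [1; s_t], length K = L + 1 (inferred)
    features = []
    for i in range(1, n_actions + 1):
        indicator = 1.0 if a == i else 0.0
        features.extend(indicator * v for v in block)
    return features  # length N * K


def clip_probability(p):
    """Constrain a randomization probability to [PI_MIN, PI_MAX]."""
    return min(max(p, PI_MIN), PI_MAX)


def log_step_reward(step_count):
    """Reward log(0.5 + x) for a 30-minute step count x."""
    return math.log(0.5 + step_count)


def sample_states(T, L, seed=0):
    """Draw T iid Gaussian state vectors (standard normal is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(T)]
```

For instance, with L = 2 and N = 3, `stack_features([2.0, 3.0], 2, 3)` places the block `[1.0, 2.0, 3.0]` in the second of three slots and zeros elsewhere, while `clip_probability(0.95)` returns `0.8`.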