Action Centered Contextual Bandits

Authors: Kristjan Greenewald, Ambuj Tewari, Susan Murphy, Predrag Klasnja

NeurIPS 2017

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  "Our theory is supported by experiments on data gathered in a recently concluded mobile health study. We show, both theoretically and empirically, that the performance of an appropriately designed action-centered contextual bandit algorithm is agnostic to the high model complexity of the baseline reward. Instead, we get the same level of performance as expected in a stationary, linear model setting. Finally, we use data gathered in the recently conducted Heart Steps study to validate our model and theory."
Researcher Affiliation: Academia
  Kristjan Greenewald, Department of Statistics, Harvard University (kgreenewald@fas.harvard.edu); Ambuj Tewari, Department of Statistics, University of Michigan (tewaria@umich.edu); Predrag Klasnja, School of Information, University of Michigan (klasnja@umich.edu); Susan Murphy, Departments of Statistics and Computer Science, Harvard University (samurphy@fas.harvard.edu)
Pseudocode: Yes
  Algorithm 1: Action-Centered Thompson Sampling
Open Source Code: No
  No explicit statement about releasing source code, and no link to a code repository for the described methodology, was found.
Open Datasets: No
  No concrete access information (link, DOI, repository, or formal citation with authors/year) for a publicly available dataset is provided. The paper refers to data from a "recently concluded mobile health study" and to "Heart Steps study data," but gives no public access details.
Dataset Splits: No
  No train/validation/test split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) was found.
Hardware Specification: No
  No hardware details (GPU/CPU models, processor speeds, memory amounts, or other machine specifications) used to run the experiments were found.
Software Dependencies: No
  No ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments were found.
Experiment Setup: Yes
  "We set π_min = 0.2, π_max = 0.8. In each experiment, we choose a true reward generative model r_t(s, a) inspired by data from the Heart Steps study (for details see Section 1.1 in the supplement), and generate two length-T sequences of state vectors s_{t,a} ∈ R^{NK} and s_t ∈ R^L, where the s_t are i.i.d. Gaussian and s_{t,a} is formed by stacking the columns I(a = i)[1; s_t] for i = 1, ..., N. We consider both nonlinear and nonstationary baselines, while keeping the treatment effect models the same. The reward for that message was defined to be log(0.5 + x), where x is the step count of the participant in the 30 minutes following the suggestion. As above, we set π_min = 0.2, π_max = 0.8."
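Since no source code was released, the quoted setup and the Algorithm 1 reference can be illustrated with a minimal sketch. This is an assumed reading, not the authors' implementation: it restricts to a single treatment action versus a "do nothing" baseline, uses hypothetical function names (`make_contexts`, `action_centered_ts`, `step_reward`), and replaces the paper's exact posterior-probability computation with a Monte Carlo estimate over posterior draws. It does show the three quoted ingredients: i.i.d. Gaussian states, probability clipping to [π_min, π_max] = [0.2, 0.8], and the log(0.5 + x) step-count reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_contexts(T, L):
    # s_t drawn i.i.d. N(0, I_L), as in the quoted setup
    return rng.standard_normal((T, L))

def action_centered_ts(contexts, reward_fn, pi_min=0.2, pi_max=0.8,
                       sigma2=1.0, n_draws=2000):
    """Sketch of action-centered Thompson sampling (one treatment vs. baseline).

    At each step: estimate the posterior probability that the treatment
    effect is positive, clip it to [pi_min, pi_max], randomize the action,
    then update a least-squares model of the action-centered regressor
    (a_t - pi_t) * s_t, which removes the (possibly complex) baseline reward.
    """
    T, L = contexts.shape
    d = L + 1                        # treatment-effect features [1; s_t]
    B = np.eye(d)                    # posterior precision (N(0, I) prior)
    f = np.zeros(d)                  # accumulated weighted responses
    rewards = []
    for t in range(T):
        s = np.concatenate(([1.0], contexts[t]))
        theta_hat = np.linalg.solve(B, f)
        cov = sigma2 * np.linalg.inv(B)
        # Monte Carlo estimate of P(treatment effect > 0) under the posterior
        draws = rng.multivariate_normal(theta_hat, cov, size=n_draws)
        pi = np.clip(np.mean(draws @ s > 0), pi_min, pi_max)
        a = rng.binomial(1, pi)      # randomized treatment decision
        r = reward_fn(contexts[t], a)
        x = (a - pi) * s             # action-centering step
        B += np.outer(x, x)
        f += x * r
        rewards.append(r)
    return np.array(rewards), np.linalg.solve(B, f)

def step_reward(steps_30min):
    # reward quoted from the paper: log(0.5 + x), x = 30-minute step count
    return np.log(0.5 + steps_30min)
```

The clipping to [0.2, 0.8] guarantees every action retains nonzero probability, which both preserves exploration and keeps the inverse-probability-style centered update well behaved.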