Incentivized Exploration for Multi-Armed Bandits under Reward Drift
Authors: Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen
AAAI 2020, pp. 4981-4988 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical examples are provided to complement the theoretical analysis: "In this section, we carry out numerical experiments using synthetic data to complement the previous analysis of the incentivized MAB algorithms under reward drift, including UCB, ε-Greedy and Thompson Sampling." |
| Researcher Affiliation | Academia | Zhiyuan Liu, Department of Computer Science, University of Colorado, Boulder (zhiyuan.liu@colorado.edu); Huazheng Wang, Department of Computer Science, University of Virginia (hw7ww@virginia.edu); Fan Shen, Technology, Cybersecurity and Policy, University of Colorado, Boulder (fan.shen@colorado.edu); Kai Liu, Computer Science Division, Clemson University (kail@clemson.edu); Lijun Chen, Department of Computer Science, University of Colorado, Boulder (lijun.chen@colorado.edu) |
| Pseudocode | Yes | Algorithm 1: Incentivized MAB under Reward Drift; Algorithm 2: Incentivized UCB under Reward Drift; Algorithm 3: Incentivized ε-Greedy under Reward Drift; Algorithm 4: Incentivized Thompson Sampling under Reward Drift |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The numerical experiments use 'synthetic data' generated by the authors, and no information about a publicly available dataset with concrete access details (link, DOI, citation) is provided. |
| Dataset Splits | No | The paper uses synthetic data and performs numerical experiments over 'trials' but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or predefined splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used for the experiments. |
| Experiment Setup | Yes | We generate a pool of K = 9 arms with mean reward vector μ = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]. In each iteration, after the player pulls an arm I_t, the reward r_t is set to the arm's mean reward plus a random term drawn from N(0, 1), i.e., r_t = μ_{I_t} + N(0, 1). For the reward drift under compensation, we consider a linear drift function b_t = l·x_t, where x_t is the compensation offered by the principal and the coefficient l >= 0. We show the performance of the incentivized MAB algorithms under drifted reward with drift coefficient l = 1.1. (A hedged sketch of this setup follows the table.) |
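
Below is a minimal sketch of the synthetic setup quoted in the Experiment Setup row, assuming the reported values K = 9, μ = [0.9, ..., 0.1], N(0, 1) reward noise, and drift coefficient l = 1.1. The horizon T, the random seed, the UCB index, and the placeholder compensation rule for x_t are illustrative assumptions; the paper's Algorithms 2-4 define the actual incentivized UCB, ε-Greedy, and Thompson Sampling variants, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an assumption, not from the paper

K = 9
mu = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
l = 1.1          # drift coefficient reported in the setup
T = 10_000       # horizon not specified in this summary; illustrative value

counts = np.zeros(K)   # number of pulls per arm
means = np.zeros(K)    # empirical means of the (possibly drifted) observed rewards

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1   # pull each arm once to initialize
    else:
        # Standard UCB index; the paper's incentivized UCB handles
        # compensation differently, so this choice is only illustrative.
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    # Hypothetical compensation rule: pay the gap that makes the recommended
    # arm look at least as attractive as the empirically best arm.
    # (The actual compensation scheme is defined in the paper, not here.)
    x_t = max(0.0, means.max() - means[arm]) if t > K else 0.0

    true_reward = mu[arm] + rng.normal(0.0, 1.0)   # r_t = mu_{I_t} + N(0, 1)
    drift = l * x_t                                # b_t = l * x_t
    observed = true_reward + drift                 # player reports the drifted reward

    counts[arm] += 1
    means[arm] += (observed - means[arm]) / counts[arm]

print("pulls per arm:", counts.astype(int))
```

The key point the sketch illustrates is that the learner only ever sees the drifted reward r_t + b_t, so any compensation the principal offers feeds back into the empirical means used for arm selection.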