Incentivized Exploration for Multi-Armed Bandits under Reward Drift
Authors: Zhiyuan Liu, Huazheng Wang, Fan Shen, Kai Liu, Lijun Chen
AAAI 2020, pp. 4981-4988 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical examples are provided to complement the theoretical analysis: "In this section, we carry out numerical experiments using synthetic data to complement the previous analysis of the incentivized MAB algorithms under reward drift, including UCB, ε-Greedy and Thompson Sampling." |
| Researcher Affiliation | Academia | Zhiyuan Liu, Department of Computer Science, University of Colorado, Boulder (zhiyuan.liu@colorado.edu); Huazheng Wang, Department of Computer Science, University of Virginia (hw7ww@virginia.edu); Fan Shen, Technology, Cybersecurity and Policy, University of Colorado, Boulder (fan.shen@colorado.edu); Kai Liu, Computer Science Division, Clemson University (kail@clemson.edu); Lijun Chen, Department of Computer Science, University of Colorado, Boulder (lijun.chen@colorado.edu) |
| Pseudocode | Yes | Algorithm 1: Incentivized MAB under Reward Drift; Algorithm 2: Incentivized UCB under Reward Drift; Algorithm 3: Incentivized ε-Greedy under Reward Drift; Algorithm 4: Incentivized Thompson Sampling under Reward Drift |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The numerical experiments use 'synthetic data' generated by the authors, and no information about a publicly available dataset with concrete access details (link, DOI, citation) is provided. |
| Dataset Splits | No | The paper uses synthetic data and performs numerical experiments over 'trials' but does not specify explicit training, validation, or test dataset splits in terms of percentages, sample counts, or predefined splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used for the experiments. |
| Experiment Setup | Yes | We generate a pool of K = 9 arms with mean reward vector μ = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]. In each iteration, after the player pulls an arm I_t, the reward r_t is set to the arm's mean reward plus a random term drawn from N(0, 1), i.e., r_t = μ_{I_t} + N(0, 1). For the reward drift under compensation, we consider a linear drift function b_t = l·x_t, where x_t is the compensation offered by the principal and the coefficient l >= 0. We show the performance of the incentivized MAB algorithms under drifted reward with drift coefficient l = 1.1. (A hedged sketch of this setup follows the table.) |
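
Below is a minimal sketch of the synthetic setup quoted in the Experiment Setup row, assuming the reported values K = 9, μ = [0.9, ..., 0.1], N(0, 1) reward noise, and drift coefficient l = 1.1. The horizon T, the random seed, the UCB index, and the placeholder compensation rule for x_t are illustrative assumptions; the paper's Algorithms 2-4 define the actual incentivized UCB, ε-Greedy, and Thompson Sampling variants, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an assumption, not from the paper

K = 9
mu = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
l = 1.1          # drift coefficient reported in the setup
T = 10_000       # horizon not specified in this summary; illustrative value

counts = np.zeros(K)   # number of pulls per arm
means = np.zeros(K)    # empirical means of the (possibly drifted) observed rewards

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1   # pull each arm once to initialize
    else:
        # Standard UCB index; the paper's incentivized UCB handles
        # compensation differently, so this choice is only illustrative.
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    # Hypothetical compensation rule: pay the gap that makes the recommended
    # arm look at least as attractive as the empirically best arm.
    # (The actual compensation scheme is defined in the paper, not here.)
    x_t = max(0.0, means.max() - means[arm]) if t > K else 0.0

    true_reward = mu[arm] + rng.normal(0.0, 1.0)   # r_t = mu_{I_t} + N(0, 1)
    drift = l * x_t                                # b_t = l * x_t
    observed = true_reward + drift                 # player reports the drifted reward

    counts[arm] += 1
    means[arm] += (observed - means[arm]) / counts[arm]

print("pulls per arm:", counts.astype(int))
```

The key point the sketch illustrates is that the learner only ever sees the drifted reward r_t + b_t, so any compensation the principal offers feeds back into the empirical means used for arm selection.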