Doubly Robust Thompson Sampling with Linear Payoffs

Authors: Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical studies show the advantage of the proposed algorithm over LinTS. In this section, we compare the performances of three algorithms: (i) LinTS [Agrawal and Goyal, 2013], (ii) BLTS [Dimakopoulou et al., 2019], and (iii) the proposed DRTS. We use simulated data described as follows. Figure 1 shows the average cumulative regret and the estimation error ‖β̂_t − β‖_2 of the three algorithms based on 10 replications.
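For context on the baseline being compared against, here is a minimal sketch of one round of LinTS [Agrawal and Goyal, 2013], assuming the standard ridge posterior N(B⁻¹y, v²B⁻¹); the variable names (`B`, `y`, `v`) are illustrative, not the paper's notation.

```python
import numpy as np

def lin_ts_round(B, y, contexts, v, rng):
    """One LinTS round (sketch): sample beta_tilde from the Gaussian
    posterior N(B^{-1} y, v^2 B^{-1}) and pull the arm maximizing the
    sampled expected reward x_i^T beta_tilde."""
    B_inv = np.linalg.inv(B)
    beta_hat = B_inv @ y              # ridge estimate
    beta_tilde = rng.multivariate_normal(beta_hat, v**2 * B_inv)
    return int(np.argmax(contexts @ beta_tilde))

# usage: N arms with d-dimensional contexts, flat prior B = I, y = 0
rng = np.random.default_rng(0)
d, N = 5, 10
B = np.eye(d)                         # prior precision (lambda * I)
y = np.zeros(d)                       # running sum of context * reward
contexts = rng.normal(size=(N, d))
arm = lin_ts_round(B, y, contexts, v=0.1, rng=rng)
```

After observing the reward `r` for `arm`, the standard update is `B += np.outer(x, x)` and `y += r * x` for the chosen context `x`.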
Researcher Affiliation Collaboration Wonyoung Kim Department of Statistics Seoul National University eraser347@snu.ac.kr Gi-Soo Kim Department of Industrial Engineering & Artificial Intelligence Graduate School UNIST gisookim@unist.ac.kr Myunghee Cho Paik Department of Statistics Seoul National University Shepherd23 Inc. myungheechopaik@snu.ac.kr
Pseudocode Yes Algorithm 1 Doubly Robust Thompson Sampling for Linear Contextual Bandits (DRTS)
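A minimal sketch of the doubly robust imputation idea that gives Algorithm 1 its name, written in the standard DR form; `pi_chosen` (the selection probability of the pulled arm) and `beta_check` (an imputation estimator) are illustrative stand-ins, not the paper's exact quantities.

```python
import numpy as np

def dr_pseudo_rewards(contexts, chosen, reward, pi_chosen, beta_check):
    """Doubly robust imputation (sketch): assign every arm a pseudo-reward
    so that all N contexts, not only the chosen one, can enter the ridge
    regression each round. The observed arm gets an inverse-probability
    correction toward its realized reward."""
    imputed = contexts @ beta_check   # model-based prediction for each arm
    pseudo = imputed.copy()
    pseudo[chosen] += (reward - imputed[chosen]) / pi_chosen
    return pseudo
```

If the imputation model is exact, the correction term vanishes in expectation; if `pi_chosen` is exact, the estimator stays unbiased even under a misspecified model, which is the "doubly robust" property.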
Open Source Code No The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets No We use simulated data described as follows. For each element of the contexts j = 1, …, d, we generate [X_{1j}(t), …, X_{Nj}(t)] from a normal distribution N(µ_N, V_N)... To generate the stochastic rewards, we sample η_i(t) independently from N(0, 1). The data is generated via simulation, not from a publicly available dataset.
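The data-generating process quoted above can be sketched as follows; the values of `mu`, `V`, and `beta` here are placeholders (the paper's exact choices of µ_N, V_N, and β are in its supplementary materials).

```python
import numpy as np

# Hedged sketch of the paper's simulated-data generator.
rng = np.random.default_rng(0)
T, N, d = 20000, 10, 5                # T rounds, N arms, d context dims
mu = np.zeros(N)                      # stands in for mu_N
V = np.eye(N)                         # stands in for V_N
beta = rng.normal(size=d)             # true linear parameter

def draw_round(rng):
    # each context dimension j holds one N(mu_N, V_N) draw across the N arms
    X = np.column_stack([rng.multivariate_normal(mu, V) for _ in range(d)])
    eta = rng.normal(size=N)          # i.i.d. N(0, 1) reward noise
    rewards = X @ beta + eta          # linear payoff per arm
    return X, rewards

X, rewards = draw_round(rng)
```

Correlating the N arms' contexts through V_N (rather than drawing them independently) is what makes the rows of X dependent across arms within a round.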
Dataset Splits No The paper does not specify explicit training, validation, or test dataset splits. It describes generating simulated data over T = 20000 rounds of online learning.
Hardware Specification No The paper mentions 'Simulation studies' but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct these simulations.
Software Dependencies No The paper states 'Other implementation details are in supplementary materials' but does not specify any software names with version numbers in the main text.
Experiment Setup Yes We consider v ∈ {0.001, 0.01, 0.1, 1} in all three algorithms, γ ∈ {0.01, 0.05, 0.1} for BLTS, and set γ = 1/(N + 1) in DRTS. Then we report the minimum regrets among all combinations. The regularization parameter is λ_t = t in DRTS and λ_t = 1 in both LinTS and BLTS.
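The "minimum regret among all combinations" protocol amounts to a small grid sweep; a sketch, where `run_bandit` is a hypothetical driver that runs one algorithm with the given hyperparameters and returns its final cumulative regret.

```python
import itertools

# Hyperparameter grids from the experiment setup above.
V_GRID = [0.001, 0.01, 0.1, 1]
GAMMA_GRID = [0.01, 0.05, 0.1]

def best_regret(run_bandit, grids):
    """Run the algorithm on every grid point and keep the smallest
    final cumulative regret, as the reporting protocol describes."""
    return min(run_bandit(*combo) for combo in itertools.product(*grids))

# e.g. BLTS sweeps both v and gamma; LinTS and DRTS sweep v only,
# since DRTS fixes gamma = 1/(N + 1):
# blts_best = best_regret(run_blts, [V_GRID, GAMMA_GRID])
# drts_best = best_regret(run_drts, [V_GRID])
```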