Doubly Robust Thompson Sampling with Linear Payoffs

Authors: Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical studies show the advantage of the proposed algorithm over LinTS. In this section, we compare the performances of three algorithms: (i) LinTS [Agrawal and Goyal, 2013], (ii) BLTS [Dimakopoulou et al., 2019], and (iii) the proposed DRTS. We use simulated data described as follows. Figure 1 shows the average cumulative regret and the estimation error ‖β̂_t − β‖_2 of the three algorithms based on 10 replications.
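For context on the baseline being compared against, here is a minimal sketch of one round of LinTS [Agrawal and Goyal, 2013], assuming the standard ridge posterior N(B⁻¹y, v²B⁻¹); the variable names (`B`, `y`, `v`) are illustrative, not the paper's notation.

```python
import numpy as np

def lin_ts_round(B, y, contexts, v, rng):
    """One LinTS round (sketch): sample beta_tilde from the Gaussian
    posterior N(B^{-1} y, v^2 B^{-1}) and pull the arm maximizing the
    sampled expected reward x_i^T beta_tilde."""
    B_inv = np.linalg.inv(B)
    beta_hat = B_inv @ y              # ridge estimate
    beta_tilde = rng.multivariate_normal(beta_hat, v**2 * B_inv)
    return int(np.argmax(contexts @ beta_tilde))

# usage: N arms with d-dimensional contexts, flat prior B = I, y = 0
rng = np.random.default_rng(0)
d, N = 5, 10
B = np.eye(d)                         # prior precision (lambda * I)
y = np.zeros(d)                       # running sum of context * reward
contexts = rng.normal(size=(N, d))
arm = lin_ts_round(B, y, contexts, v=0.1, rng=rng)
```

After observing the reward `r` for `arm`, the standard update is `B += np.outer(x, x)` and `y += r * x` for the chosen context `x`.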
Researcher Affiliation Collaboration Wonyoung Kim Department of Statistics Seoul National University eraser347@snu.ac.kr Gi-Soo Kim Department of Industrial Engineering & Artificial Intelligence Graduate School UNIST gisookim@unist.ac.kr Myunghee Cho Paik Department of Statistics Seoul National University Shepherd23 Inc. myungheechopaik@snu.ac.kr
Pseudocode Yes Algorithm 1 Doubly Robust Thompson Sampling for Linear Contextual Bandits (DRTS)
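A minimal sketch of the doubly robust imputation idea that gives Algorithm 1 its name, written in the standard DR form; `pi_chosen` (the selection probability of the pulled arm) and `beta_check` (an imputation estimator) are illustrative stand-ins, not the paper's exact quantities.

```python
import numpy as np

def dr_pseudo_rewards(contexts, chosen, reward, pi_chosen, beta_check):
    """Doubly robust imputation (sketch): assign every arm a pseudo-reward
    so that all N contexts, not only the chosen one, can enter the ridge
    regression each round. The observed arm gets an inverse-probability
    correction toward its realized reward."""
    imputed = contexts @ beta_check   # model-based prediction for each arm
    pseudo = imputed.copy()
    pseudo[chosen] += (reward - imputed[chosen]) / pi_chosen
    return pseudo
```

If the imputation model is exact, the correction term vanishes in expectation; if `pi_chosen` is exact, the estimator stays unbiased even under a misspecified model, which is the "doubly robust" property.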
Open Source Code No The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets No We use simulated data described as follows. For each element of the contexts j = 1, …, d, we generate [X_{1j}(t), …, X_{Nj}(t)] from a normal distribution N(µ_N, V_N)... To generate the stochastic rewards, we sample η_i(t) independently from N(0, 1). The data is generated via simulation, not from a publicly available dataset.
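The data-generating process quoted above can be sketched as follows; the values of `mu`, `V`, and `beta` here are placeholders (the paper's exact choices of µ_N, V_N, and β are in its supplementary materials).

```python
import numpy as np

# Hedged sketch of the paper's simulated-data generator.
rng = np.random.default_rng(0)
T, N, d = 20000, 10, 5                # T rounds, N arms, d context dims
mu = np.zeros(N)                      # stands in for mu_N
V = np.eye(N)                         # stands in for V_N
beta = rng.normal(size=d)             # true linear parameter

def draw_round(rng):
    # each context dimension j holds one N(mu_N, V_N) draw across the N arms
    X = np.column_stack([rng.multivariate_normal(mu, V) for _ in range(d)])
    eta = rng.normal(size=N)          # i.i.d. N(0, 1) reward noise
    rewards = X @ beta + eta          # linear payoff per arm
    return X, rewards

X, rewards = draw_round(rng)
```

Correlating the N arms' contexts through V_N (rather than drawing them independently) is what makes the rows of X dependent across arms within a round.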
Dataset Splits No The paper does not specify explicit training, validation, or test dataset splits. It describes generating simulated data over T = 20000 rounds of online learning.
Hardware Specification No The paper mentions 'Simulation studies' but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct these simulations.
Software Dependencies No The paper states 'Other implementation details are in supplementary materials' but does not specify any software names with version numbers in the main text.
Experiment Setup Yes We consider v ∈ {0.001, 0.01, 0.1, 1} in all three algorithms, γ ∈ {0.01, 0.05, 0.1} for BLTS, and set γ = 1/(N + 1) in DRTS. Then we report the minimum regrets among all combinations. The regularization parameter is λ_t = t in DRTS and λ_t = 1 in both LinTS and BLTS.
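The "minimum regret among all combinations" protocol amounts to a small grid sweep; a sketch, where `run_bandit` is a hypothetical driver that runs one algorithm with the given hyperparameters and returns its final cumulative regret.

```python
import itertools

# Hyperparameter grids from the experiment setup above.
V_GRID = [0.001, 0.01, 0.1, 1]
GAMMA_GRID = [0.01, 0.05, 0.1]

def best_regret(run_bandit, grids):
    """Run the algorithm on every grid point and keep the smallest
    final cumulative regret, as the reporting protocol describes."""
    return min(run_bandit(*combo) for combo in itertools.product(*grids))

# e.g. BLTS sweeps both v and gamma; LinTS and DRTS sweep v only,
# since DRTS fixes gamma = 1/(N + 1):
# blts_best = best_regret(run_blts, [V_GRID, GAMMA_GRID])
# drts_best = best_regret(run_drts, [V_GRID])
```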