Doubly Robust Thompson Sampling with Linear Payoffs
Authors: Wonyoung Kim, Gi-Soo Kim, Myunghee Cho Paik
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies show the advantage of the proposed algorithm over LinTS. In this section, we compare the performance of three algorithms: (i) LinTS [Agrawal and Goyal, 2013], (ii) BLTS [Dimakopoulou et al., 2019], and (iii) the proposed DRTS. We use simulated data described as follows. Figure 1 shows the average cumulative regret and the estimation error ‖β̂_t − β‖₂ of the three algorithms based on 10 replications. |
| Researcher Affiliation | Collaboration | Wonyoung Kim, Department of Statistics, Seoul National University (eraser347@snu.ac.kr); Gi-Soo Kim, Department of Industrial Engineering & Artificial Intelligence Graduate School, UNIST (gisookim@unist.ac.kr); Myunghee Cho Paik, Department of Statistics, Seoul National University, and Shepherd23 Inc. (myungheechopaik@snu.ac.kr) |
| Pseudocode | Yes | Algorithm 1 Doubly Robust Thompson Sampling for Linear Contextual Bandits (DRTS) |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | We use simulated data described as follows. For each element of the contexts j = 1, …, d, we generate [X_{1j}(t), …, X_{Nj}(t)] from a normal distribution N(µ_N, V_N)... To generate the stochastic rewards, we sample η_i(t) independently from N(0, 1). The data is generated via simulation, not from a publicly available dataset. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits. It describes generating simulated data for T=20000 rounds for online learning. |
| Hardware Specification | No | The paper mentions 'Simulation studies' but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to conduct these simulations. |
| Software Dependencies | No | The paper states 'Other implementation details are in supplementary materials' but does not specify any software names with version numbers in the main text. |
| Experiment Setup | Yes | We consider v ∈ {0.001, 0.01, 0.1, 1} in all three algorithms, γ ∈ {0.01, 0.05, 0.1} for BLTS, and set γ = 1/(N + 1) in DRTS. Then we report the minimum regrets among all combinations. The regularization parameter is λ_t = t in DRTS and λ_t = 1 in both LinTS and BLTS. |
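The simulated-data setup quoted above (each context coordinate j drawn as an N-vector from N(µ_N, V_N), rewards with standard-normal noise) can be sketched as follows. The excerpt does not state the values of µ_N and V_N, so the zero mean and equicorrelated covariance with correlation `rho` below are illustrative assumptions, not the paper's actual choices:

```python
import numpy as np

def simulate_round(N, d, beta, rho=0.5, rng=None):
    """One round of a simulated linear contextual bandit.

    For each context coordinate j, the N-vector [X_1j, ..., X_Nj] is drawn
    from N(mu_N, V_N). Here mu_N = 0 and V_N is an equicorrelated matrix
    with unit variance and pairwise correlation rho -- assumed values,
    since the excerpt does not specify them.
    Rewards follow the linear model: r_i = X_i @ beta + eta_i, eta_i ~ N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    mu_N = np.zeros(N)                                    # assumed mean
    V_N = rho * np.ones((N, N)) + (1.0 - rho) * np.eye(N)  # assumed covariance
    # Draw all d coordinates at once: each of the d samples is one
    # N(mu_N, V_N) vector across the N arms; transpose to (N, d).
    X = rng.multivariate_normal(mu_N, V_N, size=d).T
    rewards = X @ beta + rng.standard_normal(N)
    return X, rewards
```

Running this per round for t = 1, …, T (the paper uses T = 20000) and feeding `X` and the reward of the chosen arm to each algorithm reproduces the online-learning loop the report describes.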