Graphical Models Meet Bandits: A Variational Thompson Sampling Approach

Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines. 7. Experiments In this section, we show that our algorithms can be applied beyond traditional models and learn more general models. Specifically, we compare our approaches to traditional bandit algorithms for the cascade model, position-based model, and rank-1 bandit. The performance of the algorithms is measured by their average cumulative reward.
Researcher Affiliation Collaboration 1Carnegie Mellon University 2Google Research 3Deep Mind 4Duke University 5Norwegian University of Science and Technology.
Pseudocode Yes Algorithm 1 id TSvi: Influence diagram TS with variational inference. 1: Input: > 0 2: Randomly initialize q 3: for t = 1, . . . , n do 4: Sample t proportionally to q( t) 5: Take action at = arg maxa2AK r(a, t) 6: Observes xt and receive reward r(xt, zt) 7: Randomly initialize q 8: Calculate L(q) using (3) and set L0(q) = 1 9: while L(q) L0(q) do 10: Set L0(q) = L(q) 11: for = 1, . . . , t do 12: Update q (z ) using (4), for all z 13: end for 14: Update q( ) using (5) 15: Update L(q) using (3) 16: end while 17: end for
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets No The paper describes problem setups and parameters (e.g., 'The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20.') but does not refer to or provide access information for a publicly available or open dataset.
Dataset Splits No The paper does not provide specific dataset split information (percentages, counts, or predefined splits).
Hardware Specification Yes The run time is measured on a computer with one 2.9 GHz Intel Core i7 processor and 16 GB memory.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers.
Experiment Setup Yes The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20. We consider two variants of the problem. In position-based model 1, the examination probabilities of both positions are 0.7. In position-based model 2, the examination probabilities of the two positions are 0.8 and 0.2. ... To speed up id TSvi and id TSinc in our experiments, we do not run the while loop in line 9 of Algorithm 1 until convergence. In id TSvi, we run it only once. In id TSinc, which is less stable in estimating qt(zt) but more computationally efficient, we run it up to 30 times.