Graphical Models Meet Bandits: A Variational Thompson Sampling Approach
Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines. 7. Experiments In this section, we show that our algorithms can be applied beyond traditional models and learn more general models. Specifically, we compare our approaches to traditional bandit algorithms for the cascade model, position-based model, and rank-1 bandit. The performance of the algorithms is measured by their average cumulative reward. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Google Research 3Deep Mind 4Duke University 5Norwegian University of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 id TSvi: Influence diagram TS with variational inference. 1: Input: > 0 2: Randomly initialize q 3: for t = 1, . . . , n do 4: Sample t proportionally to q( t) 5: Take action at = arg maxa2AK r(a, t) 6: Observes xt and receive reward r(xt, zt) 7: Randomly initialize q 8: Calculate L(q) using (3) and set L0(q) = 1 9: while L(q) L0(q) do 10: Set L0(q) = L(q) 11: for = 1, . . . , t do 12: Update q (z ) using (4), for all z 13: end for 14: Update q( ) using (5) 15: Update L(q) using (3) 16: end while 17: end for |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes problem setups and parameters (e.g., 'The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20.') but does not refer to or provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, counts, or predefined splits). |
| Hardware Specification | Yes | The run time is measured on a computer with one 2.9 GHz Intel Core i7 processor and 16 GB memory. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20. We consider two variants of the problem. In position-based model 1, the examination probabilities of both positions are 0.7. In position-based model 2, the examination probabilities of the two positions are 0.8 and 0.2. ... To speed up id TSvi and id TSinc in our experiments, we do not run the while loop in line 9 of Algorithm 1 until convergence. In id TSvi, we run it only once. In id TSinc, which is less stable in estimating qt(zt) but more computationally efficient, we run it up to 30 times. |