reproducibilityindex.ai

Graphical Models Meet Bandits: A Variational Thompson Sampling Approach

Authors: Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-speciﬁc state-of-the-art baselines. 7. Experiments In this section, we show that our algorithms can be applied beyond traditional models and learn more general models. Speciﬁcally, we compare our approaches to traditional bandit algorithms for the cascade model, position-based model, and rank-1 bandit. The performance of the algorithms is measured by their average cumulative reward.
Researcher Affiliation	Collaboration	1Carnegie Mellon University 2Google Research 3Deep Mind 4Duke University 5Norwegian University of Science and Technology.
Pseudocode	Yes	Algorithm 1 id TSvi: Inﬂuence diagram TS with variational inference. 1: Input: > 0 2: Randomly initialize q 3: for t = 1, . . . , n do 4: Sample t proportionally to q( t) 5: Take action at = arg maxa2AK r(a, t) 6: Observes xt and receive reward r(xt, zt) 7: Randomly initialize q 8: Calculate L(q) using (3) and set L0(q) = 1 9: while L(q) L0(q) do 10: Set L0(q) = L(q) 11: for = 1, . . . , t do 12: Update q (z ) using (4), for all z 13: end for 14: Update q( ) using (5) 15: Update L(q) using (3) 16: end while 17: end for
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	No	The paper describes problem setups and parameters (e.g., 'The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20.') but does not refer to or provide access information for a publicly available or open dataset.
Dataset Splits	No	The paper does not provide specific dataset split information (percentages, counts, or predefined splits).
Hardware Specification	Yes	The run time is measured on a computer with one 2.9 GHz Intel Core i7 processor and 16 GB memory.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers.
Experiment Setup	Yes	The number of items is L = 20 and the length of the list is K = 2. The attraction probability of item i is i = i/20. We consider two variants of the problem. In position-based model 1, the examination probabilities of both positions are 0.7. In position-based model 2, the examination probabilities of the two positions are 0.8 and 0.2. ... To speed up id TSvi and id TSinc in our experiments, we do not run the while loop in line 9 of Algorithm 1 until convergence. In id TSvi, we run it only once. In id TSinc, which is less stable in estimating qt(zt) but more computationally efﬁcient, we run it up to 30 times.