Latent Bandits Revisited

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "A comprehensive empirical study showcases the advantages of our approach." and "Finally, in Section 5, we demonstrate their effectiveness in synthetic simulations and on a large-scale real-world dataset."
Researcher Affiliation | Industry | All six authors are affiliated with Google Research: Joey Hong (jxihong@google.com), Branislav Kveton (bkveton@google.com), Manzil Zaheer (manzilzaheer@google.com), Yinlam Chow (yinlamchow@google.com), Amr Ahmed (amra@google.com), Craig Boutilier (cboutilier@google.com).
Pseudocode | Yes | Algorithm 1 (mUCB), Algorithm 2 (mTS), Algorithm 3 (mmTS).
Open Source Code | No | The paper neither states that the authors are releasing the code for this work nor provides a link to a source-code repository.
Open Datasets | Yes | "We also assess the performance of our algorithms on the MovieLens 1M dataset [17]", with citation [17]: F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 2015.
Dataset Splits | No | The paper states "We randomly select 50% of all ratings as our training set and use the remaining 50% as the test set" but does not mention a validation split.
Hardware Specification | No | The paper does not report the hardware used for its experiments (no GPU/CPU models, processor speeds, or memory amounts), only general statements such as "synthetic simulations" and "large-scale real-world dataset".
Software Dependencies | No | The paper does not name ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments.
Experiment Setup | Yes | "We evaluate each algorithm on 500 independent runs, with a uniformly sampled latent state in each run, and report the average reward over time." "The rewards are drawn i.i.d. from P(· | a, s) = N(µ(a, s), σ²) with σ = 0.5." "We randomly select 50% of all ratings as our training set and use the remaining 50% as the test set; resulting in sparse rating matrices Mtrain and Mtest. We complete each matrix using least-squares matrix completion [29] with rank 20. This rank is high enough to yield a low prediction error, and yet small enough to avoid overfitting." "Using k-means clustering on the rows of U, we cluster users into 5 clusters, where 5 is the largest value that does not yield empty clusters."
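The quoted synthetic protocol (500 independent runs, a uniformly sampled latent state per run, rewards drawn i.i.d. from N(µ(a, s), σ²) with σ = 0.5, average reward reported over time) can be sketched as a small evaluation harness. This is a minimal illustration, not the authors' code: the `policy` callable, its history-based signature, and all function names here are assumptions for the sketch.

```python
import numpy as np

def run_synthetic_eval(mu, policy, n_runs=500, horizon=200, sigma=0.5, seed=0):
    """Average reward over independent runs, as in the quoted setup.

    mu:     (n_actions, n_states) array of conditional means mu(a, s).
    policy: callable taking the list of (action, reward) pairs observed so
            far in the current run and returning an action index.
            (Hypothetical interface; the paper does not specify one.)
    """
    rng = np.random.default_rng(seed)
    n_actions, n_states = mu.shape
    avg_reward = np.zeros(horizon)
    for _ in range(n_runs):
        s = rng.integers(n_states)  # latent state sampled uniformly per run
        history = []
        for t in range(horizon):
            a = policy(history)
            # reward drawn i.i.d. from N(mu(a, s), sigma^2), sigma = 0.5
            r = rng.normal(mu[a, s], sigma)
            history.append((a, r))
            avg_reward[t] += r
    return avg_reward / n_runs  # average reward at each time step

# Usage: a trivial fixed-action policy on a 2-action, 2-state instance.
mu = np.array([[1.0, 0.0],
               [0.0, 1.0]])
curve = run_synthetic_eval(mu, policy=lambda history: 0,
                           n_runs=200, horizon=20, seed=1)
```

With the latent state uniform over the two states, always playing action 0 yields an expected reward of about 0.5 at every step, so `curve` should hover near that value.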