Minimax Regret for Cascading Bandits

Authors: Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work provides problem-independent (i.e., gap-free) regret bounds for cascading bandits that strictly improve the state-of-the-art. In the case of unstructured rewards, our results provide the first minimax-optimal regret bounds (up to log terms). Our key insight is that, compared to the standard bandit problem, the reward variance plays an outsized role in the gap-free analysis. (...) Before closing, we conduct experiments on both synthetic and real data. Some details regarding experimental setup are deferred to Appendix C. Code is available in the supplementary material.
Researcher Affiliation | Collaboration | Daniel Vial (UT Austin & UIUC, dvial@utexas.edu), Sujay Sanghavi (UT Austin & Amazon, sanghavi@mail.utexas.edu), Sanjay Shakkottai (UT Austin, sanjay.shakkottai@utexas.edu), R. Srikant (UIUC, rsrikant@illinois.edu)
Pseudocode | Yes | Algorithm 1: General UCB algorithm for tabular cascading bandits (...) Algorithm 2: CascadeWOFUL for linear cascading bandits (a hedged sketch of such a UCB loop appears after this table)
Open Source Code | Yes | Code is available in the supplementary material. (...) Complete code to recreate all plots is available in the supplementary material.
Open Datasets | Yes | Real data. We replicate the first experiment from Zong et al. [2016] on the MovieLens-1M dataset (grouplens.org/datasets/movielens/1m/), which contains user ratings for L ≈ 4000 movies.
Dataset Splits | No | The paper states, "we divide the ratings into train and test sets," but it does not explicitly mention or detail a validation set or its split percentages/counts.
Hardware Specification | Yes | The experiments were run over several hours on a laptop, so no significant resources were used.
Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python version, PyTorch version, etc.).
Experiment Setup | Yes | Some details regarding experimental setup are deferred to Appendix C. Code is available in the supplementary material. (...) First, we use their default choices d = 20 and K = 4. Next, we divide the ratings into train and test sets based on the user who provided the rating. From the training data and a rank-d SVD approximation, we learn a feature mapping φ from movies to the probability that a uniformly random training user rated the movie more than three stars.
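
The Pseudocode row above refers to Algorithm 1, a general UCB algorithm for tabular cascading bandits. The following is a minimal Python sketch of such a loop under a simulated Bernoulli click model; the function name `cascade_ucb`, the exploration constant, and the simulation details are illustrative assumptions, not the authors' Algorithm 1 or the supplementary code.

```python
import numpy as np

def cascade_ucb(attraction_probs, K, T, seed=None):
    """Sketch of a UCB loop for a tabular cascading bandit (not the authors' code).

    attraction_probs: length-L array of true click probabilities (used only to simulate feedback).
    K: list length shown per round; T: horizon. Returns cumulative expected regret per round.
    """
    rng = np.random.default_rng(seed)
    attraction_probs = np.asarray(attraction_probs, dtype=float)
    L = len(attraction_probs)
    counts = np.ones(L)                                   # observations per item (init 1)
    means = rng.binomial(1, attraction_probs).astype(float)  # one initial sample per item
    opt_reward = 1 - np.prod(1 - np.sort(attraction_probs)[::-1][:K])
    regret = np.zeros(T)

    for t in range(T):
        ucb = means + np.sqrt(1.5 * np.log(t + 2) / counts)   # UCB index per item
        action = np.argsort(ucb)[::-1][:K]                    # show the K highest-index items
        # Cascade feedback: the user scans the list and clicks the first attractive item.
        clicks = rng.random(K) < attraction_probs[action]
        stop = int(np.argmax(clicks)) if clicks.any() else K - 1
        examined = action[:stop + 1]                          # only examined items are observed
        feedback = clicks[:stop + 1].astype(float)
        means[examined] = (means[examined] * counts[examined] + feedback) / (counts[examined] + 1)
        counts[examined] += 1
        regret[t] = opt_reward - (1 - np.prod(1 - attraction_probs[action]))
    return np.cumsum(regret)

# Example: 20 items, recommend K = 4, horizon 10,000 rounds.
cum_regret = cascade_ucb(np.random.default_rng(0).uniform(0.05, 0.4, 20), K=4, T=10_000)
```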
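
The Experiment Setup row describes learning a feature mapping φ from a rank-d SVD of the training ratings. The sketch below shows one way such a mapping might be built, assuming a binary user-by-movie matrix of "rated above three stars" indicators; `learn_feature_map` and its signature are hypothetical and are not taken from the paper or its supplementary code.

```python
import numpy as np

def learn_feature_map(train_ratings, n_users, n_movies, d=20):
    """Sketch: rank-d feature map phi from training ratings (illustrative only).

    train_ratings: iterable of (user_id, movie_id, stars) from the training split.
    Returns phi (n_movies x d) and theta (d,) such that phi @ theta approximates
    the probability a random training user rated the movie above three stars.
    """
    # Binary matrix: 1 if the user rated the movie more than three stars.
    liked = np.zeros((n_users, n_movies))
    seen = np.zeros((n_users, n_movies))
    for u, m, s in train_ratings:
        seen[u, m] = 1.0
        liked[u, m] = 1.0 if s > 3 else 0.0

    # Empirical per-movie probability the linear model should recover.
    p = liked.sum(axis=0) / np.maximum(seen.sum(axis=0), 1.0)

    # Rank-d SVD of the binary matrix; right singular vectors give movie embeddings.
    U, S, Vt = np.linalg.svd(liked, full_matrices=False)
    phi = Vt[:d].T * S[:d]                     # shape (n_movies, d)

    # Least-squares fit so that phi @ theta approximates p.
    theta, *_ = np.linalg.lstsq(phi, p, rcond=None)
    return phi, theta
```

With the paper's default choices d = 20 and K = 4, a feature map of this kind would feed a linear cascading bandit such as CascadeWOFUL (Algorithm 2).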