Minimax Regret for Cascading Bandits
Authors: Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work provides problem-independent (i.e., gap-free) regret bounds for cascading bandits that strictly improve the state-of-the-art. In the case of unstructured rewards, our results provide the first minimax-optimal regret bounds (up to log terms). Our key insight is that, compared to the standard bandit problem, the reward variance plays an outsized role in the gap-free analysis. (...) Before closing, we conduct experiments on both synthetic and real data. Some details regarding experimental setup are deferred to Appendix C. Code is available in the supplementary material. |
| Researcher Affiliation | Collaboration | Daniel Vial, UT Austin & UIUC, dvial@utexas.edu; Sujay Sanghavi, UT Austin & Amazon, sanghavi@mail.utexas.edu; Sanjay Shakkottai, UT Austin, sanjay.shakkottai@utexas.edu; R. Srikant, UIUC, rsrikant@illinois.edu |
| Pseudocode | Yes | Algorithm 1: General UCB algorithm for tabular cascading bandits (...) Algorithm 2: Cascade WOFUL for linear cascading bandits |
| Open Source Code | Yes | Code is available in the supplementary material. (...) Complete code to recreate all plots is available in the supplementary material. |
| Open Datasets | Yes | Real data. We replicate the first experiment from Zong et al. [2016] on the MovieLens-1M dataset (grouplens.org/datasets/movielens/1m/), which contains user ratings for L ≈ 4000 movies. |
| Dataset Splits | No | The paper states, "we divide the ratings into train and test sets," but it does not explicitly mention or detail a validation set or its split percentages/counts. |
| Hardware Specification | Yes | The experiments were run over several hours on a laptop, so no significant resources were used. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python version, PyTorch version, etc.). |
| Experiment Setup | Yes | Some details regarding experimental setup are deferred to Appendix C. Code is available in the supplementary material. (...) First, we use their default choices d = 20 and K = 4. Next, we divide the ratings into train and test sets based on the user who provided the rating. From the training data and a rank-d SVD approximation, we learn a feature mapping φ from movies to the probability that a uniformly random training user rated the movie more than three stars. |
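
To make the Pseudocode row more concrete, here is a minimal sketch of a UCB-style tabular cascading bandit in the spirit of the paper's Algorithm 1. This is not the authors' code: the simulator, the function names `cascade_ucb` and `simulate_cascade_click`, and the classic CascadeUCB1 confidence radius `sqrt(1.5 log t / counts)` are illustrative assumptions standing in for the paper's (variance-aware) confidence interval.

```python
import numpy as np


def simulate_cascade_click(attract, ranking, rng):
    """Cascade feedback model: the user scans `ranking` top-down and clicks
    the first attractive item. Returns the click position, or None."""
    for pos, item in enumerate(ranking):
        if rng.random() < attract[item]:
            return pos
    return None


def cascade_ucb(attract, K, T, seed=0):
    """UCB-style tabular cascading bandit (illustrative sketch).

    `attract[e]` is the unknown attraction probability of item e; the learner
    only observes cascade feedback. The confidence radius below is the classic
    CascadeUCB1 choice, used here as a placeholder for the paper's
    variance-aware interval.
    """
    rng = np.random.default_rng(seed)
    L = len(attract)
    counts = np.zeros(L)   # times each item's feedback has been observed
    means = np.zeros(L)    # empirical attraction estimates
    clicks = []

    for t in range(1, T + 1):
        # Optimism: unobserved items get an infinite index.
        with np.errstate(divide="ignore", invalid="ignore"):
            radius = np.sqrt(1.5 * np.log(t) / counts)
        ucb = np.where(counts > 0, means + radius, np.inf)

        # Recommend the K items with the largest indices.
        ranking = np.argsort(-ucb)[:K]

        # Observe the cascade and update every examined item.
        click = simulate_cascade_click(attract, ranking, rng)
        examined = ranking if click is None else ranking[: click + 1]
        for pos, item in enumerate(examined):
            reward = 1.0 if (click is not None and pos == click) else 0.0
            counts[item] += 1
            means[item] += (reward - means[item]) / counts[item]

        clicks.append(0.0 if click is None else 1.0)

    return np.array(clicks)
```

For example, `cascade_ucb(np.random.default_rng(1).uniform(0.1, 0.4, size=50), K=4, T=10_000)` runs the sketch on 50 synthetic items. Note that only items up to and including the first click yield feedback; this partial observability is the defining feature of the cascading bandit setting the paper analyzes.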
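
The Experiment Setup row describes a MovieLens-1M pipeline (a user-based train/test split, a rank-d SVD, a feature mapping φ, and "rated more than three stars" as the binarization rule). The sketch below is one plausible reading of that description, assuming a 50/50 user split and dense matrices for simplicity; the function name `movielens_features`, the file path argument, and the split fraction are assumptions, and the authors' exact preprocessing is given in Appendix C and their supplementary code.

```python
import numpy as np
import pandas as pd


def movielens_features(ratings_dat, d=20, train_frac=0.5, seed=0):
    """One plausible reconstruction of the MovieLens-1M preprocessing:
    split users into train/test, binarize ratings as 1{rating > 3}, take a
    rank-d SVD of the training user-movie matrix, and read off movie
    features and per-movie attraction probabilities."""
    # MovieLens-1M ratings.dat is '::'-separated: UserID::MovieID::Rating::Timestamp.
    df = pd.read_csv(ratings_dat, sep="::", engine="python",
                     names=["user", "movie", "rating", "ts"])

    # Split the ratings by the user who provided them (train_frac is an assumption).
    rng = np.random.default_rng(seed)
    users = df["user"].unique()
    train_users = set(rng.choice(users, size=int(train_frac * len(users)),
                                 replace=False))
    train = df[df["user"].isin(train_users)]

    # Dense binary user-movie matrix: 1 if the user rated the movie > 3 stars.
    movie_ids = np.sort(df["movie"].unique())
    movie_idx = {m: i for i, m in enumerate(movie_ids)}
    user_ids = np.sort(train["user"].unique())
    user_idx = {u: i for i, u in enumerate(user_ids)}
    M = np.zeros((len(user_ids), len(movie_ids)))
    for u, m, r in train[["user", "movie", "rating"]].itertuples(index=False):
        M[user_idx[u], movie_idx[m]] = 1.0 if r > 3 else 0.0

    # Rank-d SVD: use the top-d right singular vectors as movie features.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    phi = Vt[:d].T          # shape (num_movies, d)

    # Probability that a uniformly random training user rated the movie > 3 stars.
    attract = M.mean(axis=0)

    return movie_ids, phi, attract
```

With the paper's default choices d = 20 and K = 4, the returned `phi` would serve as the feature map for the linear cascading bandit and `attract` as the target attraction probabilities; whether the authors compute these quantities in exactly this way should be checked against their supplementary code.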