Last Switch Dependent Bandits with Monotone Payoff Functions
Authors: Ayoub Foussoul, Vineet Goyal, Orestis Papadigenopoulos, Assaf Zeevi
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delaydependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart. |
| Researcher Affiliation | Academia | 1Industrial Engineering and Operations Research, Columbia University, New York, USA 2Data Science Institute, Columbia University, New York, USA 3Graduate School of Business, Columbia University, New York, USA. Correspondence to: Ayoub Foussoul <af3209@columbia.edu>, Orestis Papadigenopoulos <opapadig@columbia.edu>. |
| Pseudocode | Yes | Algorithm 1: Planning k-MLSD bandits. |
| Open Source Code | No | The paper does not provide any statements about the availability of open-source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention or use any specific public or open datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for computations or experiments. |
| Software Dependencies | No | The paper does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | No | The paper describes theoretical algorithm design and analysis, but it does not include details about an empirical experimental setup such as hyperparameters or training configurations. |