Context-lumpable stochastic bandits

Authors: Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We consider a contextual bandit problem with S contexts and K actions. ... we give an algorithm that outputs an ε-optimal policy after using at most Õ(r(S + K)/ε²) samples with high probability and provide a matching Ω(r(S + K)/ε²) lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time T is bounded by Õ(√(r³(S + K)T)). ... The pseudocode of our algorithm is provided in Algorithm 1, which consists of three main modules. ... Now we present the theoretical guarantee for Algorithm 1."
Researcher Affiliation | Collaboration | Chung-Wei Lee (University of Southern California) leechung@usc.edu; Qinghua Liu (Princeton University) qinghual@princeton.edu; Yasin Abbasi-Yadkori (Google DeepMind) yadkori@google.com; Chi Jin (Princeton University) chij@princeton.edu; Tor Lattimore (Google DeepMind) lattimore@google.com; Csaba Szepesvári (Google DeepMind and University of Alberta) szepesva@ualberta.ca
Pseudocode | Yes | "The pseudocode of our algorithm is provided in Algorithm 1, which consists of three main modules." Algorithms listed: Algorithm 1 (Algorithm for Almost-uniform Context Distribution); Algorithm 2 (Data Collection); Algorithm 3 (Algorithm for General Context Distribution); Algorithm 4 (Algorithm for Uniform Block Distribution); Algorithm 5 (Split Contexts Into Multiple Blocks); Algorithm 6 (Algorithm for Non-uniform Block Distribution)
Open Source Code | No | The paper is a theoretical work focused on algorithm design and analysis; it contains no statement about releasing source code and no links to code repositories.
Open Datasets | No | The paper defines abstract objects such as a "context distribution" and a "block distribution" for theoretical analysis. It neither uses nor names any publicly available dataset, and it provides no access information for one.
Dataset Splits | No | As a theoretical work, the paper reports no empirical experiments, so no training/validation/test split percentages or split methodologies are described.
Hardware Specification | No | The paper is theoretical, focusing on algorithm design and analysis; it describes no computational experiments that would require hardware details.
Software Dependencies | No | The paper is theoretical and describes no computational experiments that would require specific software dependencies or version numbers.
Experiment Setup | No | The paper focuses on algorithms and their theoretical guarantees; it reports no empirical experiments and therefore gives no hyperparameters, optimization settings, or training configurations.
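The sample-complexity bound quoted in the Research Type row can be illustrated with a small calculation. This is a sketch under stated assumptions, not part of the paper: it keeps only the leading term of Õ(r(S + K)/ε²), dropping the constants and logarithmic factors hidden by the Õ notation, and the instance values below (S = 100, K = 20, r = 5, ε = 0.5) are hypothetical. The comparison baseline, ~SK/ε² samples for estimating every (context, action) pair separately, is the standard tabular scaling and is likewise an assumption, not a quote from the paper.

```python
def lumpable_sample_bound(r: int, S: int, K: int, eps: float) -> float:
    """Leading-order term of the paper's upper bound Õ(r(S + K)/eps^2).

    Constants and log factors are deliberately omitted; this is only
    meant to show how the bound scales with S, K, r, and eps.
    """
    return r * (S + K) / eps**2


# Hypothetical instance: 100 contexts, 20 actions, rank 5, accuracy 0.5.
S, K, r, eps = 100, 20, 5, 0.5

n_lumpable = lumpable_sample_bound(r, S, K, eps)  # grows like r(S + K)
n_tabular = S * K / eps**2  # naive per-(context, action) scaling, ~SK/eps^2

print(f"lumpable bound ~{n_lumpable:.0f} samples vs tabular ~{n_tabular:.0f}")
```

For small rank r, the r(S + K) term is much smaller than SK, which is the sense in which lumpability helps: the bound scales with the sum of contexts and actions rather than their product.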