Best of Both Worlds Model Selection

Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta-algorithm plays each base learner according to a schedule that keeps the base learners' candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL [3] algorithm for adversarial environments, but with the additional benefit of achieving high-probability regret bounds. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best-of-both-worlds (stochastic and adversarial) guarantees while performing model selection in contextual bandit scenarios.
Researcher Affiliation | Industry | Aldo Pacchiano (Microsoft Research, NYC, apacchiano@microsoft.com); Christoph Dann (Google, NYC, cdann@cdann.net); Claudio Gentile (Google, NYC, cgentile@google.com)
Pseudocode | Yes | Algorithm 1: Arbe(δ, s = 1, t0 = 0), "Adversarial Regret Balancing and Elimination"
Open Source Code | No | Checklist item 3(a): "If you ran experiments... Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?" [N/A]
Open Datasets | No | The paper is theoretical and does not involve experimental training on a dataset; the checklist explicitly states "N/A" for experiment-related questions.
Dataset Splits | No | The paper is theoretical and does not involve experimental validation on a dataset; the checklist explicitly states "N/A" for experiment-related questions.
Hardware Specification | No | The paper is theoretical and does not include hardware specifications, as indicated by "N/A" in the checklist for experiment-related questions.
Software Dependencies | No | The paper is theoretical and does not list software dependencies with version numbers, as indicated by "N/A" in the checklist for experiment-related questions.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings, as indicated by "N/A" in the checklist for experiment-related questions.
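The balancing-and-elimination idea behind Algorithm 1 (Arbe) can be sketched in a few lines. The following is an illustrative simplification, not the paper's exact procedure: the learner interfaces, the candidate bound functions, and the Hoeffding-style elimination test are assumptions chosen for the example. Each base learner carries a candidate regret bound that may or may not hold; the meta-algorithm plays the learner whose bound is currently smallest (keeping bounds balanced) and eliminates any learner whose empirical reward, even after crediting its claimed regret, falls below the best learner's lower confidence estimate.

```python
import math
import random

def regret_balancing(learners, bounds, reward_fn, T, delta=0.05):
    """Simplified regret balancing with elimination (illustrative sketch).

    learners:  callables returning an arm index when asked to play
    bounds:    candidate cumulative regret bounds R_i(n), possibly invalid
    reward_fn: maps an arm to a reward in [0, 1]
    """
    k = len(learners)
    active = set(range(k))
    n = [0] * k      # number of plays per base learner
    s = [0.0] * k    # cumulative reward per base learner

    def conf(j):
        # Hoeffding-style confidence width (an assumption of this sketch)
        return math.sqrt(2 * math.log(k * T / delta) / max(n[j], 1))

    for _ in range(T):
        # Balancing: play the active learner with the smallest candidate bound
        i = min(active, key=lambda j: bounds[j](n[j] + 1))
        s[i] += reward_fn(learners[i]())
        n[i] += 1
        # Mis-specification test: a learner survives only if its average
        # reward plus its claimed per-round regret (plus slack) still reaches
        # the best available lower confidence bound
        best_lcb = max(s[j] / max(n[j], 1) - conf(j) for j in active)
        active = {j for j in active
                  if s[j] / max(n[j], 1)
                     + bounds[j](max(n[j], 1)) / max(n[j], 1)
                     + conf(j) >= best_lcb}
    return active, n
```

In a toy run with two fixed-arm "learners" (one pulling a high-mean arm, one a low-mean arm) and identical sqrt-style candidate bounds, the balancing schedule initially alternates between them, and the low-mean learner is eliminated once its empirical average plus claimed regret can no longer match the other learner's lower confidence bound, after which all remaining rounds go to the survivor.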