Meta-Learning Adversarial Bandit Algorithms
Authors: Misha Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina F. Balcan, Kfir Y. Levy, Ron Meir, Steven Z. Wu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD. (An illustrative sketch of the MAB case follows the table.) |
| Researcher Affiliation | Academia | Mikhail Khodak, Carnegie Mellon University (khodak@cmu.edu); Ilya Osadchiy, Technion – Israel Institute of Technology (osadchiy.ilya@gmail.com); Keegan Harris, CMU; Maria-Florina Balcan, CMU; Kfir Y. Levy, Technion; Ron Meir, Technion; Zhiwei Steven Wu, CMU |
| Pseudocode | Yes | Algorithm 1 tunes OMD$_{\eta,\theta}$ with regularizer $\psi_\theta : K \to \mathbb{R}$ and step size $\eta > 0$, which, when run over the loss estimators $\hat\ell_{t,1}, \dots, \hat\ell_{t,m}$, yields the task optimum $\hat x_t = \arg\min_{x \in K} \sum_{i=1}^{m} \langle \hat\ell_{t,i}, x \rangle$. (A schematic of the OMD update and its regret bound follows the table.) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper describes theoretical algorithms and their regret guarantees for adversarial bandit settings; no datasets are used. |
| Dataset Splits | No | The paper does not discuss dataset splits for training, validation, or testing, as it is a theoretical work. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications used for experiments, as it is a theoretical work. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameter values or training configurations. |
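
An illustrative sketch of the MAB case from the Research Type row: an Exp3-style inner learner is warm-started on each task by a follow-the-leader outer learner that tracks the estimated optima-in-hindsight. This is a minimal sketch under simplifying assumptions, not the authors' algorithm: it uses the plain entropic regularizer rather than the paper's Tsallis-entropy generalization, omits the paper's online tuning of the step size and other hyperparameters, and the function names and toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_exp3_task(losses, init, eta):
    """Inner learner for one task: Exp3 anchored at a tunable initialization `init`.
    `losses` is an (m, k) array of per-round adversarial losses in [0, 1]."""
    m, k = losses.shape
    cum_est = np.zeros(k)                        # cumulative importance-weighted loss estimates
    x = init.copy()
    for i in range(m):
        a = rng.choice(k, p=x)                   # play an arm from the current distribution
        est = np.zeros(k)
        est[a] = losses[i, a] / x[a]             # importance-weighted loss estimator
        cum_est += est
        # Mirror-descent step with the entropic regularizer, anchored at `init`:
        logits = np.log(init) - eta * cum_est
        x = np.exp(logits - logits.max())
        x /= x.sum()
    return cum_est

def meta_learn(tasks, eta=0.1, mix=0.1):
    """Outer learner sketch: follow-the-leader on the estimated per-task
    optima-in-hindsight sets the next task's initialization, mixed with the
    uniform distribution to keep it in the simplex interior."""
    k = tasks[0].shape[1]
    opt_counts = np.zeros(k)
    init = np.full(k, 1.0 / k)
    for losses in tasks:
        cum_est = run_exp3_task(losses, init, eta)
        opt_counts[np.argmin(cum_est)] += 1      # estimated optimum-in-hindsight for this task
        ftl = opt_counts / opt_counts.sum()
        init = (1 - mix) * ftl + mix / k         # smoothed follow-the-leader initialization
    return init

# Toy usage: arm 0 is best on every task, so the learned initialization should
# concentrate on it -- the low-entropy regime in which the paper's task-averaged
# regret guarantee improves.
tasks = []
for _ in range(20):
    L = np.clip(0.5 + 0.3 * rng.standard_normal((500, 5)), 0.0, 1.0)
    L[:, 0] *= 0.5                               # arm 0 has lower loss on every task
    tasks.append(L)
print(meta_learn(tasks))
```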
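
For the Pseudocode row, the generic OMD update over loss estimators and the standard Bregman-divergence regret bound it satisfies can be written schematically as below. This is the textbook form, not the paper's theorem: the paper works with self-concordant barrier regularizers and proves affine-in-Bregman-divergence bounds with its own constants, and $c_\theta$ here is only a placeholder for an action-space- and regularizer-dependent term.

```latex
% OMD step within task t, anchored at the (meta-learned) initialization x_{t,1}:
x_{t,i+1} \;=\; \operatorname*{arg\,min}_{x \in K} \;
    \eta \,\langle \hat\ell_{t,i},\, x \rangle \;+\; B_{\psi_\theta}\!\big(x \,\|\, x_{t,i}\big),
\qquad
B_{\psi_\theta}(x \,\|\, y) \;=\; \psi_\theta(x) - \psi_\theta(y) - \langle \nabla \psi_\theta(y),\, x - y \rangle .

% Schematic regret bound against the task optimum \hat{x}_t: affine in the Bregman
% divergence from the initialization, which is the quantity the outer learner drives down.
\sum_{i=1}^{m} \big\langle \hat\ell_{t,i},\, x_{t,i} - \hat{x}_t \big\rangle
\;\lesssim\; \frac{B_{\psi_\theta}\!\big(\hat{x}_t \,\|\, x_{t,1}\big)}{\eta} \;+\; \eta \, c_\theta \, m .
```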