PRODuctive bandits: Importance Weighting No More

Authors: Julian Zimmert, Teodor Vanislavov Marinov

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | By leveraging the interpretation of Prod as a first-order OMD approximation, we present the following surprising results: 1. Variants of Prod can obtain optimal regret for adversarial multi-armed bandits. 2. There exists a simple and (arguably) importance-weighting free variant with optimal rate. 3. One can even achieve best-both-worlds guarantees with logarithmic regret in the stochastic regime.
Researcher Affiliation | Industry | Julian Zimmert, Google Research, zimmert@google.com; Teodor V. Marinov, Google Research, tvmarinov@google.com
Pseudocode | No | The paper describes algorithm update rules within the text (e.g., 'WSU-UX uses importance-weighted updates... $\pi_{t+1,i} = \pi_{t,i}(1 - \eta(\hat{\ell}_{t,i} - \lambda_t))$'), but it does not contain clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: No experiments requiring code.
Open Datasets | No | The paper is purely theoretical and does not conduct experiments with datasets.
Dataset Splits | No | The paper is purely theoretical and does not conduct experiments, so it does not specify training, validation, or test dataset splits.
Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: No experiments.
Software Dependencies | No | The paper is purely theoretical and does not involve experiments requiring specific software dependencies with version numbers.
Experiment Setup | No | The paper is purely theoretical and does not involve experiments, so no experimental setup details such as hyperparameters or system-level training settings are provided.
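
The update rule quoted in the Pseudocode row, in which each arm's probability is multiplied by one minus the step size times a centered loss estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm as published: the function name `prod_step`, the standard importance-weighted estimator (observed loss divided by the played arm's probability), and the choice of the centering term as the probability-weighted mean of the estimates are all assumptions made here for illustration. The sketch also omits any exploration mixing, so a large step size can push a weight below zero.

```python
def prod_step(pi, arm, loss, eta):
    """One Prod-style multiplicative step on a probability vector (illustrative)."""
    # Assumed estimator: importance-weighted loss, nonzero only for the played arm.
    lhat = [loss / pi[i] if i == arm else 0.0 for i in range(len(pi))]
    # Assumed centering term: pi-weighted mean of lhat, which keeps sum(pi) == 1.
    lam = sum(p * l for p, l in zip(pi, lhat))
    # Multiplicative update: pi_i <- pi_i * (1 - eta * (lhat_i - lam)).
    return [p * (1.0 - eta * (l - lam)) for p, l in zip(pi, lhat)]


# Hypothetical example: arm 2 is played and incurs loss 1.0.
new_pi = prod_step([0.25, 0.25, 0.5], arm=2, loss=1.0, eta=0.1)
```

With this centering choice the weights still sum to one after the step, the played arm loses probability mass when its loss is positive, and the unplayed arms gain it proportionally.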