Optimal cross-learning for contextual bandits with unknown context distributions
Authors: Jon Schneider, Julian Zimmert
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider the problem of designing contextual bandit algorithms in the cross-learning setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution. In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of Õ(√(TK)), independent of the number of contexts. As a consequence, we obtain the first nearly tight regret bounds for the problems of learning to bid in first-price auctions (under unknown value distributions) and sleeping bandits with a stochastic action set. At the core of our algorithm is a novel technique for coordinating the execution of a learning algorithm over multiple epochs in such a way as to remove correlations between estimation of the unknown distribution and the actions played by the algorithm. This technique may be of independent interest for other learning problems involving estimation of an unknown context distribution. |
| Researcher Affiliation | Industry | Jon Schneider Google Research jschnei@google.com Julian Zimmert Google Research zimmert@google.com |
| Pseudocode | Yes | Algorithm 1 Contextual cross-learning algorithm for the unknown distribution setting. |
| Open Source Code | No | The paper does not provide any links to open-source code or state that code will be made available. |
| Open Datasets | No | The paper discusses problem settings like 'contextual bandits' and 'first-price auctions' but does not refer to specific datasets or state that any data used is publicly available. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with dataset splits. It does not mention training, validation, or test splits for any data. |
| Hardware Specification | No | The paper discusses 'Computational efficiency' and 'total per-step runtime and memory complexity of O(K)' but does not specify any particular hardware (e.g., GPU/CPU models, memory amounts) used for any computations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, solvers). |
| Experiment Setup | No | The paper discusses parameters of its theoretical algorithm (η, γ, L, ι) used in the mathematical analysis, but these are not parameters of an empirical experimental setup (e.g., learning rates, batch sizes, or training epochs), and no such setup is described. |