Confidence-Budget Matching for Sequential Budgeted Learning

Authors: Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor

Venue: ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We start by analyzing the performance of greedy algorithms that query a reward whenever they can. We show that in fully stochastic settings, doing so performs surprisingly well, but in the presence of any adversity, this might lead to linear regret. To overcome this issue, we propose the Confidence-Budget Matching (CBM) principle, which queries rewards when the confidence intervals are wider than the inverse square root of the available budget. We analyze the performance of CBM-based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets. (A minimal code sketch of the greedy and CBM query rules follows this table.)
Researcher Affiliation | Collaboration | Microsoft Research, New York; Technion, Israel; Nvidia Research, Israel.
Pseudocode | Yes | Algorithm 1: Greedy Reduction; Algorithm 2: Confidence-Budget Matching (CBM) Scheme.
Open Source Code | No | The paper does not provide any concrete access information for its source code (e.g., repository links or an explicit statement of code release). It is theoretical in nature, focusing on algorithms and regret bounds.
Open Datasets | No | The paper is theoretical and does not conduct empirical studies using datasets, so no information on publicly available training datasets is provided.
Dataset Splits | No | The paper is theoretical and does not conduct empirical studies involving dataset splits. Therefore, no information on training, validation, or test splits is provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments or their computational setup. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not conduct empirical experiments, so no specific software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments or their setup. Therefore, no details on hyperparameters, training configurations, or system-level settings are provided.
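
The abstract quoted above contrasts two query rules: the greedy baseline, which queries a reward whenever budget remains, and the CBM principle, which queries only while the confidence interval is wider than the inverse square root of the remaining budget. Below is a minimal sketch of both rules inside a budgeted multi-armed bandit loop, assuming Bernoulli arms and a Hoeffding-style confidence bonus. The function name `budgeted_ucb`, its parameters, and the bonus form are illustrative assumptions; this is not the paper's Algorithm 1 or Algorithm 2 pseudocode.

```python
import numpy as np

def budgeted_ucb(arm_means, horizon, budget, rule="cbm", seed=0):
    """Budgeted UCB loop with a pluggable reward-query rule.

    rule="greedy": query the reward whenever budget remains
                   (the greedy baseline discussed in the abstract).
    rule="cbm":    query only while the chosen arm's confidence
                   interval is wider than 1/sqrt(remaining budget),
                   following the Confidence-Budget Matching principle.

    All names here are hypothetical; arms are assumed Bernoulli.
    """
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    counts = np.zeros(k)   # number of observed rewards per arm
    means = np.zeros(k)    # empirical means over observed rewards
    remaining = budget

    for t in range(1, horizon + 1):
        # Optimistic index: empirical mean plus a Hoeffding-style bonus.
        bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1.0))
        bonus[counts == 0] = np.inf          # try every arm at least once
        arm = int(np.argmax(means + bonus))

        reward = float(rng.random() < arm_means[arm])  # Bernoulli pull

        if rule == "greedy":
            query = remaining > 0
        else:  # "cbm": match the confidence width to the remaining budget
            width = 2.0 * bonus[arm]
            query = remaining > 0 and width > 1.0 / np.sqrt(remaining)

        if query:  # spend one unit of budget to observe the reward
            remaining -= 1
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]

    return means, counts, remaining

# Example: the CBM rule holds budget in reserve while estimates are
# already tight, whereas the greedy rule spends it as fast as possible.
print(budgeted_ucb([0.3, 0.5, 0.7], horizon=10_000, budget=500))
```

The design point this sketch illustrates is that CBM ties the decision to query to the ratio between statistical uncertainty and remaining budget, rather than to the mere availability of budget, which is what makes it robust when budgets or contexts are chosen adversarially.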