Confidence-Budget Matching for Sequential Budgeted Learning
Authors: Yonathan Efroni, Nadav Merlis, Aadirupa Saha, Shie Mannor
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We start by analyzing the performance of greedy algorithms that query a reward whenever they can. We show that in fully stochastic settings, doing so performs surprisingly well, but in the presence of any adversity, this might lead to linear regret. To overcome this issue, we propose the Confidence-Budget Matching (CBM) principle that queries rewards when the confidence intervals are wider than the inverse square root of the available budget. We analyze the performance of CBM-based algorithms in different settings and show that they perform well in the presence of adversity in the contexts, initial states, and budgets. (A hypothetical sketch of this query rule is given after the table.) |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, New York; ²Technion, Israel; ³Nvidia Research, Israel. |
| Pseudocode | Yes | Algorithm 1 Greedy Reduction; Algorithm 2 Confidence-Budget Matching (CBM) Scheme |
| Open Source Code | No | The paper does not provide any concrete access information for its source code (e.g., a repository link or an explicit statement of code release). The paper is theoretical in nature, focusing on algorithms and regret bounds. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical studies using datasets; therefore, no information on publicly available training datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical studies involving dataset splits. Therefore, no information on training, validation, or test splits is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments or their computational setup. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not conduct empirical experiments; therefore, no software dependencies with specific version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments or their setup. Therefore, no details on hyperparameters, training configurations, or system-level settings are provided. |
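
The CBM principle quoted in the Research Type row (query a reward only while the confidence interval exceeds the inverse square root of the remaining budget) can be illustrated with a short sketch. The following is a hypothetical, minimal instantiation for a K-armed bandit with Hoeffding-style confidence intervals; the function name `cbm_bandit`, the `pull_arm` callback, and the constant `delta` are illustrative assumptions, not the authors' Algorithm 2.

```python
import numpy as np

def cbm_bandit(n_arms, horizon, budget, pull_arm, delta=0.05):
    """Hypothetical CBM-style bandit loop: UCB arm selection plus a
    budget-matched rule for deciding when to observe (query) the reward."""
    counts = np.zeros(n_arms)   # number of observed rewards per arm
    sums = np.zeros(n_arms)     # sum of observed rewards per arm
    remaining = budget          # remaining reward-query budget

    for t in range(1, horizon + 1):
        # Hoeffding-style confidence radius; infinite for unobserved arms.
        radius = np.where(
            counts > 0,
            np.sqrt(2.0 * np.log(t / delta) / np.maximum(counts, 1)),
            np.inf,
        )
        means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

        # Optimistic (UCB-style) arm selection.
        arm = int(np.argmax(means + radius))
        reward = pull_arm(arm)

        # CBM principle: spend a query only if the chosen arm's confidence
        # interval is wider than 1 / sqrt(remaining budget).
        if remaining > 0 and radius[arm] > 1.0 / np.sqrt(remaining):
            counts[arm] += 1
            sums[arm] += reward
            remaining -= 1

    return np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
```

Since the threshold 1 / sqrt(remaining) grows as the budget is consumed, the rule becomes progressively more selective, reserving queries for arms whose estimates are still genuinely uncertain; this is the budget-matching behavior the paper's regret analysis relies on.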