Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Regret Bounds for Bandits with Expert Advice
Authors: Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order √(KT ln(N/K)) for the worst-case regret... For the standard feedback model, we prove a new instance-based upper bound... Road map. We formalize the problem setting in the next section. In Section 3, as a preliminary building block, we present Algorithm 1, an instance of the follow-the-regularized-leader (FTRL) algorithm... We then show in Section 4 that combining this algorithm with a doubling trick allows us to achieve the improved instance-based bound mentioned above. The lower bound for the restricted feedback setting is presented in Section 5. |
| Researcher Affiliation | Academia | NICOLÒ CESA-BIANCHI, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; KHALED ELDOWA, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; EMMANUEL ESPOSITO, Università degli Studi di Milano, Italy; JULIA OLKHOVSKAYA, TU Delft, Netherlands |
| Pseudocode | Yes | Algorithm 1 (q-FTRL for bandits with expert advice) — input: q ∈ (0, 1), η > 0; initialization: p_1(i) ← 1/N for all i ∈ [N]; for t = 1, ..., T do: receive expert advice (ξ_t^i)_{i ∈ [N]}; draw expert I_t ~ p_t and action A_t ~ ξ_t^{I_t}; construct ŷ_t ∈ ℝ^N where ŷ_t(i) := ξ_t^i(A_t) ℓ_t(A_t) / (Σ_{j ∈ [N]} p_t(j) ξ_t^j(A_t)) for all i ∈ [N]; let p_{t+1} ← argmin_{p ∈ Δ_{N−1}} ⟨Σ_{s=1}^t ŷ_s, p⟩ + ψ_η(p); end for. Algorithm 2 (q-FTRL with the doubling trick for bandits with expert advice) — 1: input: β ∈ (0, T]; 2: initialization: J_1 ← ⌈log₂ β⌉, τ_1 ← 1, p_1(i) ← 1/N for all i ∈ [N]; 3: define, for each integer r ≤ log₂ T, an epoch learning rate η_r as a function of ln(N/2^r) (full expression in the paper); 4–7: for t = 1, ..., T do: receive expert advice (ξ_t^i)_{i ∈ [N]}; draw expert I_t ~ p_t and action A_t ~ ξ_t^{I_t}; construct ŷ_t as in Algorithm 1; 8–10: if Σ_{s=τ_t}^t ℓ_s(A_s) > 2^{J_t + 1} then restart: p_{t+1}(i) ← 1/N for all i ∈ [N], J_{t+1} ← ⌈log₂ Σ_{s=τ_t}^t ℓ_s(A_s)⌉, τ_{t+1} ← t + 1; 11–13: else p_{t+1} ← argmin_{p ∈ Δ_{N−1}} ⟨Σ_{s=τ_t}^t ŷ_s, p⟩ + ψ_{η_{J_t}}(p), J_{t+1} ← J_t, τ_{t+1} ← τ_t; 14–15: end if; end for. |
| Open Source Code | No | The paper does not contain any statement about making source code available, nor does it provide links to any code repositories or supplementary material containing code. |
| Open Datasets | No | The paper focuses on theoretical regret bounds for multi-armed bandit problems with expert advice and does not involve empirical experiments using specific datasets. |
| Dataset Splits | No | The paper is theoretical, providing proofs and algorithms related to regret bounds. It does not conduct experiments on datasets, thus no dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental results that would require specific hardware for computation. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper describes algorithms (q-FTRL) and proves theoretical bounds. It does not provide an implementation or mention any specific software packages or libraries with version numbers required to reproduce experiments. |
| Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation. It defines parameters within the algorithms (e.g., 'q', 'η', 'J'), but these are not experimental setup details in the sense of running experiments with hyperparameter tuning. |
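The FTRL step at the core of Algorithm 1 can be sketched numerically. The snippet below is an illustrative implementation, not code from the paper: it solves p_{t+1} = argmin_{p ∈ Δ} ⟨L, p⟩ + (1/η)ψ_q(p), assuming the q-Tsallis regularizer ψ_q(p) = (1 − Σ_i p(i)^q)/(1 − q) commonly associated with q-FTRL. The function name, default parameter values, and the bisection approach are all assumptions made for the sketch.

```python
import numpy as np

def q_ftrl_update(cum_loss, q=0.5, eta=0.1, iters=100):
    """Sketch of one FTRL step over the probability simplex with the
    q-Tsallis regularizer psi_q(p) = (1 - sum_i p_i^q) / (1 - q).

    The first-order conditions give, coordinate-wise,
        p_i = (q / (eta * (1 - q) * (L_i + lam)))**(1 / (1 - q)),
    where lam is a Lagrange multiplier chosen so that sum_i p_i = 1.
    Since each p_i is decreasing in lam, we find lam by bisection.
    """
    L = np.asarray(cum_loss, dtype=float)

    def weights(lam):
        return (q / (eta * (1.0 - q) * (L + lam))) ** (1.0 / (1.0 - q))

    lo = -L.min() + 1e-12           # keep every L_i + lam strictly positive
    hi = lo + 1.0
    while weights(hi).sum() > 1.0:  # grow hi until total mass drops below 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):          # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if weights(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = weights(0.5 * (lo + hi))
    return p / p.sum()              # renormalise away residual bisection error
```

For example, `q_ftrl_update([1.0, 2.0, 3.0])` returns a distribution placing the most mass on the expert with the smallest cumulative estimated loss, with the heaviness of the tail controlled by q.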