Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improved Regret Bounds for Bandits with Expert Advice

Authors: Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

JAIR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order √(KT ln(N/K)) for the worst-case regret... For the standard feedback model, we prove a new instance-based upper bound... Road map. We formalize the problem setting in the next section. In Section 3, as a preliminary building block, we present Algorithm 1, an instance of the follow-the-regularized-leader (FTRL) algorithm... We then show in Section 4 that combining this algorithm with a doubling trick allows us to achieve the improved instance-based bound mentioned above. The lower bound for the restricted feedback setting is presented in Section 5.
Researcher Affiliation | Academia | NICOLÒ CESA-BIANCHI, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; KHALED ELDOWA, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; EMMANUEL ESPOSITO, Università degli Studi di Milano, Italy; JULIA OLKHOVSKAYA, TU Delft, Netherlands
Pseudocode | Yes |
Algorithm 1: q-FTRL for bandits with expert advice
  input: q ∈ (0, 1), η > 0
  initialization: p_1(i) ← 1/N for all i ∈ V
  for t = 1, ..., T do
    receive expert advice (θ_t^i)_{i ∈ V}
    draw expert I_t ∼ p_t and action A_t ∼ θ_t^{I_t}
    construct ŷ_t ∈ R^N where ŷ_t(i) := θ_t^i(A_t) / (Σ_{j ∈ V} p_t(j) θ_t^j(A_t)) · ℓ_t(A_t) for all i ∈ V
    let p_{t+1} ← argmin_{p ∈ Δ_N} ( η ⟨Σ_{s=1}^t ŷ_s, p⟩ + ψ_q(p) )
  end for
Algorithm 2: q-FTRL with the doubling trick for bandits with expert advice
  1: input: J ∈ (0, N]
  2: initialization: r_1 ← ⌈log_2 J⌉ ∨ 1, m_1 ← 1, p_1(i) ← 1/N for all i ∈ V
  3: define: for each integer r ∈ (0, log_2 N],
       q_r ← (1/2)(1 + ln(N/2^r) / (√(ln(N/2^r)^2 + 4) + 2)),
       η_r ← √( q_r (N^{1−q_r} − 1) / (e T (1 − q_r) (2^r)^{q_r}) )
  4: for t = 1, ..., T do
  5:   receive expert advice (θ_t^i)_{i ∈ V}
  6:   draw expert I_t ∼ p_t and action A_t ∼ θ_t^{I_t}
  7:   construct ŷ_t ∈ R^N where ŷ_t(i) := θ_t^i(A_t) / (Σ_{j ∈ V} p_t(j) θ_t^j(A_t)) · ℓ_t(A_t) for all i ∈ V
  8:   if (1/T) Σ_{s=m_t}^t Q_s(p_s) > 2^{r_t + 1} then
  9:     p_{t+1}(i) ← 1/N for all i ∈ V
  10:    r_{t+1} ← ⌈log_2((1/T) Σ_{s=m_t}^t Q_s(p_s))⌉ ∨ 1, m_{t+1} ← t + 1
  11:  else
  12:    p_{t+1} ← argmin_{p ∈ Δ_N} ( η_{r_t} ⟨Σ_{s=m_t}^t ŷ_s, p⟩ + ψ_{q_{r_t}}(p) )
  13:    r_{t+1} ← r_t, m_{t+1} ← m_t
  14:  end if
  15: end for
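The q-FTRL routine quoted above can be sketched in code. The following is a minimal Python sketch, not the authors' implementation: it assumes deterministic expert advice (each expert recommends a single action per round, whereas the paper allows distributions over actions), and the environment layout, function names (`tsallis_ftrl_step`, `q_ftrl_bandit`), and the bisection-based simplex solver are our own illustrative choices.

```python
# Illustrative sketch of Algorithm 1 (q-FTRL with the q-Tsallis regularizer).
# Assumption: thetas[t][i] is the single action expert i recommends at round t
# (the paper allows randomized advice); all names here are hypothetical.
import random

def tsallis_ftrl_step(cum_losses, q, eta):
    """Solve p = argmin_{p in simplex} eta*<L, p> + psi_q(p), where
    psi_q(p) = (1 - sum_i p_i^q) / (1 - q) is the negative q-Tsallis entropy.
    Stationarity gives p_i = ((1 - q) * (eta * L_i + mu) / q)^(1 / (q - 1));
    the Lagrange multiplier mu is found by bisection so that sum_i p_i = 1."""
    c = (1.0 - q) / q
    expo = 1.0 / (q - 1.0)  # negative, so each p_i decreases as mu grows

    def total(mu):
        return sum((c * (eta * L + mu)) ** expo for L in cum_losses)

    lo = -eta * min(cum_losses) + 1e-12  # just above the singularity
    hi = lo + 1.0
    while total(hi) > 1.0:               # grow hi until total mass drops below 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(100):                 # bisect on the normalization constraint
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    w = [(c * (eta * L + 0.5 * (lo + hi))) ** expo for L in cum_losses]
    s = sum(w)
    return [x / s for x in w]            # renormalize to absorb residual error

def q_ftrl_bandit(thetas, losses, q=0.5, eta=0.1, seed=0):
    """Run q-FTRL: losses[t][a] is the loss of action a at round t; only the
    played action's loss is observed, as in the bandit feedback model."""
    rng = random.Random(seed)
    N = len(thetas[0])
    cum = [0.0] * N          # cumulative importance-weighted loss estimates
    p = [1.0 / N] * N
    total_loss = 0.0
    for theta_t, loss_t in zip(thetas, losses):
        I = rng.choices(range(N), weights=p)[0]  # draw expert I_t ~ p_t
        A = theta_t[I]                           # play its recommended action
        total_loss += loss_t[A]
        # importance weighting: credit every expert that recommended A_t
        pA = sum(p[j] for j in range(N) if theta_t[j] == A)
        for i in range(N):
            if theta_t[i] == A:
                cum[i] += loss_t[A] / pA
        p = tsallis_ftrl_step(cum, q, eta)
    return total_loss, p
```

As a sanity check, running this on a toy instance where one expert always recommends a zero-loss action should concentrate the distribution p on that expert over time.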
Open Source Code | No | The paper does not contain any statement about making source code available, nor does it provide links to any code repositories or supplementary material containing code.
Open Datasets | No | The paper focuses on theoretical regret bounds for multi-armed bandit problems with expert advice and does not involve empirical experiments using specific datasets.
Dataset Splits | No | The paper is theoretical, providing proofs and algorithms related to regret bounds. It does not conduct experiments on datasets, thus no dataset splits are mentioned.
Hardware Specification | No | The paper is theoretical and does not report on experimental results that would require specific hardware for computation. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper describes algorithms (q-FTRL) and proves theoretical bounds. It does not provide an implementation or mention any specific software packages or libraries with version numbers required to reproduce experiments.
Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation. It defines parameters within the algorithms (e.g., 'q', 'η', 'J'), but these are not experimental setup details in the sense of running experiments with hyperparameter tuning.