Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improved Regret Bounds for Bandits with Expert Advice

Authors: Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

JAIR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order √(KT ln(N/K)) for the worst-case regret... For the standard feedback model, we prove a new instance-based upper bound... Road map. We formalize the problem setting in the next section. In Section 3, as a preliminary building block, we present Algorithm 1, an instance of the follow-the-regularized-leader (FTRL) algorithm... We then show in Section 4 that combining this algorithm with a doubling trick allows us to achieve the improved instance-based bound mentioned above. The lower bound for the restricted feedback setting is presented in Section 5.
Researcher Affiliation | Academia | NICOLÒ CESA-BIANCHI, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; KHALED ELDOWA, Università degli Studi di Milano, Italy and Politecnico di Milano, Italy; EMMANUEL ESPOSITO, Università degli Studi di Milano, Italy; JULIA OLKHOVSKAYA, TU Delft, Netherlands
Pseudocode | Yes |
Algorithm 1: q-FTRL for bandits with expert advice
  input: q ∈ (0, 1), η > 0
  initialization: p_1(i) ← 1/N for all i ∈ V
  for t = 1, ..., T do
    receive expert advice (θ_t^i)_{i ∈ V}
    draw expert I_t ∼ p_t and action A_t ∼ θ_t^{I_t}
    construct ŷ_t ∈ R^N where ŷ_t(i) := θ_t^i(A_t) / (Σ_{j ∈ V} p_t(j) θ_t^j(A_t)) · ℓ_t(A_t) for all i ∈ V
    let p_{t+1} ← argmin_{p ∈ Δ_N} ( η ⟨Σ_{s=1}^t ŷ_s, p⟩ + ψ_q(p) )
  end for
Algorithm 2: q-FTRL with the doubling trick for bandits with expert advice
  1: input: J ∈ (0, N]
  2: initialization: r_1 ← ⌈log_2 J⌉ ∨ 1, m_1 ← 1, p_1(i) ← 1/N for all i ∈ V
  3: define: for each integer r ∈ (0, log_2 N],
       q_r ← (1/2)(1 + ln(N/2^r) / (√(ln(N/2^r)^2 + 4) + 2)),
       η_r ← √( q_r (N^{1−q_r} − 1) / (e T (1 − q_r) (2^r)^{q_r}) )
  4: for t = 1, ..., T do
  5:   receive expert advice (θ_t^i)_{i ∈ V}
  6:   draw expert I_t ∼ p_t and action A_t ∼ θ_t^{I_t}
  7:   construct ŷ_t ∈ R^N where ŷ_t(i) := θ_t^i(A_t) / (Σ_{j ∈ V} p_t(j) θ_t^j(A_t)) · ℓ_t(A_t) for all i ∈ V
  8:   if (1/T) Σ_{s=m_t}^t Q_s(p_s) > 2^{r_t + 1} then
  9:     p_{t+1}(i) ← 1/N for all i ∈ V
  10:    r_{t+1} ← ⌈log_2((1/T) Σ_{s=m_t}^t Q_s(p_s))⌉ ∨ 1, m_{t+1} ← t + 1
  11:  else
  12:    p_{t+1} ← argmin_{p ∈ Δ_N} ( η_{r_t} ⟨Σ_{s=m_t}^t ŷ_s, p⟩ + ψ_{q_{r_t}}(p) )
  13:    r_{t+1} ← r_t, m_{t+1} ← m_t
  14:  end if
  15: end for
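The q-FTRL routine quoted above can be sketched in code. The following is a minimal Python sketch, not the authors' implementation: it assumes deterministic expert advice (each expert recommends a single action per round, whereas the paper allows distributions over actions), and the environment layout, function names (`tsallis_ftrl_step`, `q_ftrl_bandit`), and the bisection-based simplex solver are our own illustrative choices.

```python
# Illustrative sketch of Algorithm 1 (q-FTRL with the q-Tsallis regularizer).
# Assumption: thetas[t][i] is the single action expert i recommends at round t
# (the paper allows randomized advice); all names here are hypothetical.
import random

def tsallis_ftrl_step(cum_losses, q, eta):
    """Solve p = argmin_{p in simplex} eta*<L, p> + psi_q(p), where
    psi_q(p) = (1 - sum_i p_i^q) / (1 - q) is the negative q-Tsallis entropy.
    Stationarity gives p_i = ((1 - q) * (eta * L_i + mu) / q)^(1 / (q - 1));
    the Lagrange multiplier mu is found by bisection so that sum_i p_i = 1."""
    c = (1.0 - q) / q
    expo = 1.0 / (q - 1.0)  # negative, so each p_i decreases as mu grows

    def total(mu):
        return sum((c * (eta * L + mu)) ** expo for L in cum_losses)

    lo = -eta * min(cum_losses) + 1e-12  # just above the singularity
    hi = lo + 1.0
    while total(hi) > 1.0:               # grow hi until total mass drops below 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(100):                 # bisect on the normalization constraint
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    w = [(c * (eta * L + 0.5 * (lo + hi))) ** expo for L in cum_losses]
    s = sum(w)
    return [x / s for x in w]            # renormalize to absorb residual error

def q_ftrl_bandit(thetas, losses, q=0.5, eta=0.1, seed=0):
    """Run q-FTRL: losses[t][a] is the loss of action a at round t; only the
    played action's loss is observed, as in the bandit feedback model."""
    rng = random.Random(seed)
    N = len(thetas[0])
    cum = [0.0] * N          # cumulative importance-weighted loss estimates
    p = [1.0 / N] * N
    total_loss = 0.0
    for theta_t, loss_t in zip(thetas, losses):
        I = rng.choices(range(N), weights=p)[0]  # draw expert I_t ~ p_t
        A = theta_t[I]                           # play its recommended action
        total_loss += loss_t[A]
        # importance weighting: credit every expert that recommended A_t
        pA = sum(p[j] for j in range(N) if theta_t[j] == A)
        for i in range(N):
            if theta_t[i] == A:
                cum[i] += loss_t[A] / pA
        p = tsallis_ftrl_step(cum, q, eta)
    return total_loss, p
```

As a sanity check, running this on a toy instance where one expert always recommends a zero-loss action should concentrate the distribution p on that expert over time.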
Open Source Code | No | The paper does not contain any statement about making source code available, nor does it provide links to any code repositories or supplementary material containing code.
Open Datasets | No | The paper focuses on theoretical regret bounds for multi-armed bandit problems with expert advice and does not involve empirical experiments using specific datasets.
Dataset Splits | No | The paper is theoretical, providing proofs and algorithms related to regret bounds. It does not conduct experiments on datasets, thus no dataset splits are mentioned.
Hardware Specification | No | The paper is theoretical and does not report on experimental results that would require specific hardware for computation. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper describes algorithms (q-FTRL) and proves theoretical bounds. It does not provide an implementation or mention any specific software packages or libraries with version numbers required to reproduce experiments.
Experiment Setup | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation. It defines parameters within the algorithms (e.g., 'q', 'η', 'J'), but these are not experimental setup details in the sense of running experiments with hyperparameter tuning.