Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalizing the Regret: an Analysis of Lower and Upper Bounds

Authors: Marco Mussi, Alberto Maria Metelli

JAIR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this appendix, we provide numerical examples to empirically validate our findings. We consider the performances of UCB1 with a bandit made of K = 10 arms over 10 runs and comparing the empirical regret (EXP, mean ± std) with the instance-dependent lower (LB) and upper (UB) bounds, for different choices of function g and for different time horizons T ∈ {1·10^5, 5·10^5, 1·10^6}. The results are presented in Table 2. We can observe how the empirical results are consistent with our theoretical findings for all the g and all the time horizons T considered.
Researcher Affiliation Academia Marco Mussi EMAIL Alberto Maria Metelli EMAIL Politecnico di Milano Piazza Leonardo da Vinci 32, Milan, 20133, Italy
Pseudocode Yes Algorithm 1 UCB1 (Auer et al., 2002; Bubeck, 2010). Require: number of arms K, exploration parameter a > 2, subgaussianity parameter Οƒ Ni ← 0, Β΅iΛ† ← 0, U CBi ← ∞, βˆ€i ∈ [K] for t ∈ [T] do Select It ∈ arg max i∈[K] U CBi Play It and observe reward Xt Update Β΅Λ†It ← Β΅Λ†It NIt Xt NIt + 1 , NIt ← NIt + 1 Compute U CBi ← Β΅Λ†i + Οƒ √ a log t Ni , βˆ€i ∈ [K] end for Algorithm 2 MOSS (Audibert and Bubeck, 2009, 2010). Require: number of arms K, learning horizon T, subgaussianity parameter Οƒ Ni ← 0, Β΅iΛ† ← 0, U CBi ← ∞, βˆ€i ∈ [K] for t ∈ [T] do Select It ∈ arg max i∈[K] U CBi Play It and observe reward Xt Update Β΅Λ†It ← Β΅Λ†It NIt Xt NIt + 1 , NIt ← NIt + 1 Compute U CBIt ← Β΅Λ†It + Οƒ √ 4 NIt log T K NIt where log(x) βˆ† log(max{1, x}) end for
Open Source Code No The paper does not contain any explicit statements or links regarding the availability of open-source code for the methodology described.
Open Datasets No The paper uses a 'bandit made of K = 10 arms' for numerical examples but does not provide concrete access information (link, DOI, repository, formal citation) for a publicly available or open dataset.
Dataset Splits No The paper's numerical examples are based on a simulated 'bandit made of K = 10 arms' and do not involve explicit training/test/validation dataset splits of a pre-existing dataset. No specific dataset split information is provided.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its numerical examples or experiments.
Software Dependencies No The paper does not provide specific ancillary software details, such as library names with version numbers, used to implement or run the described algorithms or numerical examples.
Experiment Setup No While the paper mentions general parameters for numerical examples such as 'K = 10 arms' and 'different time horizons T ∈ {1·10^5, 5·10^5, 1·10^6}', it does not specify concrete hyperparameter values (e.g., the 'a' parameter for UCB1 mentioned in Algorithm 1) or other system-level training configurations used in the experiments.