Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Generalizing the Regret: an Analysis of Lower and Upper Bounds
Authors: Marco Mussi, Alberto Maria Metelli
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this appendix, we provide numerical examples to empirically validate our findings. We consider the performance of UCB1 on a bandit with K = 10 arms over 10 runs and compare the empirical regret (EXP, mean ± std) with the instance-dependent lower (LB) and upper (UB) bounds, for different choices of the function g and for different time horizons T ∈ {1·10^5, 5·10^5, 1·10^6}. The results are presented in Table 2. We observe that the empirical results are consistent with our theoretical findings for all choices of g and all time horizons T considered. |
| Researcher Affiliation | Academia | Marco Mussi EMAIL Alberto Maria Metelli EMAIL Politecnico di Milano Piazza Leonardo da Vinci 32, Milan, 20133, Italy |
| Pseudocode | Yes | Algorithm 1 UCB1 (Auer et al., 2002; Bubeck, 2010). Require: number of arms K, exploration parameter a > 2, subgaussianity parameter σ. N_i ← 0, μ̂_i ← 0, UCB_i ← +∞, ∀i ∈ [K]; for t ∈ [T] do: select I_t ∈ arg max_{i ∈ [K]} UCB_i; play I_t and observe reward X_t; update μ̂_{I_t} ← (μ̂_{I_t} N_{I_t} + X_t) / (N_{I_t} + 1), N_{I_t} ← N_{I_t} + 1; compute UCB_i ← μ̂_i + σ √(a log t / N_i), ∀i ∈ [K]; end for. Algorithm 2 MOSS (Audibert and Bubeck, 2009, 2010). Require: number of arms K, learning horizon T, subgaussianity parameter σ. N_i ← 0, μ̂_i ← 0, UCB_i ← +∞, ∀i ∈ [K]; for t ∈ [T] do: select I_t ∈ arg max_{i ∈ [K]} UCB_i; play I_t and observe reward X_t; update μ̂_{I_t} ← (μ̂_{I_t} N_{I_t} + X_t) / (N_{I_t} + 1), N_{I_t} ← N_{I_t} + 1; compute UCB_{I_t} ← μ̂_{I_t} + σ √((4 / N_{I_t}) log⁺(T / (K N_{I_t}))), where log⁺(x) := log(max{1, x}); end for. |
| Open Source Code | No | The paper does not contain any explicit statements or links regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper uses a 'bandit made of K = 10 arms' for numerical examples but does not provide concrete access information (link, DOI, repository, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper's numerical examples are based on a simulated 'bandit made of K = 10 arms' and do not involve explicit training/test/validation dataset splits of a pre-existing dataset. No specific dataset split information is provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its numerical examples or experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to implement or run the described algorithms or numerical examples. |
| Experiment Setup | No | While the paper mentions general parameters for the numerical examples, such as 'K = 10 arms' and 'different time horizons T ∈ {1·10^5, 5·10^5, 1·10^6}', it does not specify concrete hyperparameter values (e.g., the exploration parameter 'a' of UCB1 in Algorithm 1) or other system-level configurations used in the experiments. |
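The UCB1 pseudocode (Algorithm 1) and the protocol quoted under Research Type (K = 10 arms, 10 runs, regret reported as mean ± std) can be sketched in Python as follows. This is a minimal sketch, not the authors' code, which is not released. It assumes Gaussian rewards with standard deviation σ and an exploration parameter a = 3 (an arbitrary value satisfying a > 2, since the value actually used is not reported), and it measures pseudo-regret (the sum of the gaps of the pulled arms), which may differ from the paper's definition of empirical regret. The function names `ucb1_pseudo_regret` and `summarize_runs` and the arm means are hypothetical.

```python
import math
import random
import statistics


def ucb1_pseudo_regret(means, horizon, a=3.0, sigma=1.0, seed=None):
    """Run UCB1 once on a hypothetical Gaussian bandit; return the pseudo-regret.

    Assumptions (not from the paper): rewards are N(means[i], sigma**2) and
    a = 3.0 is an arbitrary value satisfying the a > 2 requirement.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # N_i: number of pulls of arm i
    estimates = [0.0] * k   # mu_hat_i: empirical mean of arm i
    ucb = [math.inf] * k    # UCB_i: +inf until arm i has been pulled once
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # I_t in argmax_i UCB_i; Python's max keeps the lowest index on ties.
        arm = max(range(k), key=lambda i: ucb[i])
        reward = rng.gauss(means[arm], sigma)  # observe X_t
        n = counts[arm]
        estimates[arm] = (estimates[arm] * n + reward) / (n + 1)
        counts[arm] = n + 1
        # Recompute every index; arms with N_i = 0 keep +inf (avoids 1/0).
        for i in range(k):
            if counts[i] > 0:
                ucb[i] = estimates[i] + sigma * math.sqrt(a * math.log(t) / counts[i])
        regret += best - means[arm]  # gap of the pulled arm
    return regret


def summarize_runs(means, horizon, runs=10, **kwargs):
    """Mean and sample standard deviation of the pseudo-regret over seeded runs."""
    regrets = [ucb1_pseudo_regret(means, horizon, seed=r, **kwargs) for r in range(runs)]
    return statistics.mean(regrets), statistics.stdev(regrets)


if __name__ == "__main__":
    # Hypothetical 10-arm instance; the paper's arm means are not reported here.
    arm_means = [0.9 - 0.1 * i for i in range(10)]
    mean, std = summarize_runs(arm_means, horizon=10_000)
    print(f"UCB1 pseudo-regret over 10 runs: {mean:.1f} ± {std:.1f}")
```

Keeping UCB_i = +∞ for unpulled arms forces each arm to be tried once and sidesteps the division by N_i = 0 that a literal reading of the "∀i ∈ [K]" update would hit. This pure-Python loop is slow at the Table 2 horizons (up to 10^6 rounds), so the usage example uses a shorter horizon.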
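Algorithm 2 (MOSS) mainly differs from UCB1 in its index: it takes the learning horizon T as input and uses the truncated logarithm log⁺, so the bonus depends on T rather than on the current round t, and only the played arm's index needs recomputing after each round. The minimal Python sketch below illustrates this under stated assumptions: rewards are Gaussian with standard deviation σ, regret is measured as pseudo-regret, and log⁺(x) := log(max{1, x}) is our reading of the definition attached to Algorithm 2. The names `log_plus` and `moss_pseudo_regret` and the arm means are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def log_plus(x):
    """log+(x) = log(max{1, x}), the truncated logarithm used by MOSS."""
    return math.log(max(1.0, x))


def moss_pseudo_regret(means, horizon, sigma=1.0, seed=None):
    """Run MOSS once on a hypothetical Gaussian bandit; return the pseudo-regret.

    Assumption (not from the paper): rewards are N(means[i], sigma**2).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # N_i
    estimates = [0.0] * k   # mu_hat_i
    ucb = [math.inf] * k    # UCB_i: +inf until arm i has been pulled once
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        arm = max(range(k), key=lambda i: ucb[i])  # lowest index wins ties
        reward = rng.gauss(means[arm], sigma)
        n = counts[arm] + 1
        # Incremental mean, algebraically equivalent to Algorithm 2's update.
        estimates[arm] += (reward - estimates[arm]) / n
        counts[arm] = n
        # The MOSS index uses the horizon T, not t, so only the played arm changes.
        ucb[arm] = estimates[arm] + sigma * math.sqrt(4.0 / n * log_plus(horizon / (k * n)))
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.9 - 0.1 * i for i in range(10)]  # hypothetical instance
    print(f"MOSS pseudo-regret: {moss_pseudo_regret(arm_means, 10_000, seed=0):.1f}")
```

Because an arm's bonus shrinks to zero once N_i ≥ T/K, MOSS stops inflating the index of well-sampled arms, which is the design choice behind its horizon-dependent exploration.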