Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalizing the Regret: an Analysis of Lower and Upper Bounds
Authors: Marco Mussi, Alberto Maria Metelli
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this appendix, we provide numerical examples to empirically validate our findings. We consider the performances of UCB1 with a bandit made of K = 10 arms over 10 runs and comparing the empirical regret (EXP, mean Β± std) with the instance-dependent lower (LB) and upper (UB) bounds, for different choices of function g and for different time horizons T β {1Β·10^5, 5Β·10^5, 1Β·10^6}. The results are presented in Table 2. We can observe how the empirical results are consistent with our theoretical findings for all the g and all the time horizons T considered. |
| Researcher Affiliation | Academia | Marco Mussi EMAIL Alberto Maria Metelli EMAIL Politecnico di Milano Piazza Leonardo da Vinci 32, Milan, 20133, Italy |
| Pseudocode | Yes | Algorithm 1 UCB1 (Auer et al., 2002; Bubeck, 2010). Require: number of arms K, exploration parameter a > 2, subgaussianity parameter Ο Ni β 0, Β΅iΛ β 0, U CBi β β, βi β [K] for t β [T] do Select It β arg max iβ[K] U CBi Play It and observe reward Xt Update Β΅ΛIt β Β΅ΛIt NIt Xt NIt + 1 , NIt β NIt + 1 Compute U CBi β Β΅Λi + Ο β a log t Ni , βi β [K] end for Algorithm 2 MOSS (Audibert and Bubeck, 2009, 2010). Require: number of arms K, learning horizon T, subgaussianity parameter Ο Ni β 0, Β΅iΛ β 0, U CBi β β, βi β [K] for t β [T] do Select It β arg max iβ[K] U CBi Play It and observe reward Xt Update Β΅ΛIt β Β΅ΛIt NIt Xt NIt + 1 , NIt β NIt + 1 Compute U CBIt β Β΅ΛIt + Ο β 4 NIt log T K NIt where log(x) β log(max{1, x}) end for |
| Open Source Code | No | The paper does not contain any explicit statements or links regarding the availability of open-source code for the methodology described. |
| Open Datasets | No | The paper uses a 'bandit made of K = 10 arms' for numerical examples but does not provide concrete access information (link, DOI, repository, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper's numerical examples are based on a simulated 'bandit made of K = 10 arms' and do not involve explicit training/test/validation dataset splits of a pre-existing dataset. No specific dataset split information is provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its numerical examples or experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, used to implement or run the described algorithms or numerical examples. |
| Experiment Setup | No | While the paper mentions general parameters for numerical examples such as 'K = 10 arms' and 'different time horizons T β {1Β·10^5, 5Β·10^5, 1Β·10^6}', it does not specify concrete hyperparameter values (e.g., the 'a' parameter for UCB1 mentioned in Algorithm 1) or other system-level training configurations used in the experiments. |