Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Tight Policy Regret Bounds for Improving and Decaying Bandits

Authors: Hoda Heidari, Michael Kearns, Aaron Roth

IJCAI 2016 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section we empirically investigate the performance of A1 on several illustrative reward curves, and observe that the regret of our algorithm is typically significantly sublinear in T. Throughout, for simplicity we set n to 2 and consider two arms 1, 2, where the asymptote of f1 is 1 and that of f2 is 0.5. In Figure 2, we report the value of regret versus T = 500, ..., 30000 for three different sets of examples." |
| Researcher Affiliation | Academia | Hoda Heidari (University of Pennsylvania), Michael Kearns (University of Pennsylvania), Aaron Roth (University of Pennsylvania) |
| Pseudocode | Yes | Algorithm 1: "The online algorithm for concave and increasing reward functions (A1)"; Algorithm 2: "A no policy regret algorithm for decreasing reward functions (A2)" |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper conducts simulations using mathematically defined reward functions, such as f1(t) = 1 - t^-0.5 and f2(t) = 0.5 - 0.5t^-beta. It does not refer to or provide access information for any publicly available or open dataset. |
| Dataset Splits | No | The experiments are simulations based on defined mathematical functions; the paper does not provide dataset split information (e.g., train/validation/test percentages or counts) needed to reproduce data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its simulations. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments. |
| Experiment Setup | No | The paper describes the reward functions used for simulations and the range of T values, but does not provide specific hyperparameter values or system-level settings as would typically be found in an experimental setup description for machine learning models. |
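The reward curves quoted in the table above can be sketched directly. This is a minimal illustration, not the paper's code (none is released): f1 and f2 are taken verbatim from the quoted evidence, while the value beta = 0.5 is a hypothetical choice, since the excerpt leaves beta unspecified.

```python
# Illustrative sketch of the two improving reward curves quoted from the
# paper's simulations: f1(t) = 1 - t^-0.5 (asymptote 1) and
# f2(t) = 0.5 - 0.5 * t^-beta (asymptote 0.5).
# beta = 0.5 is an assumed value for illustration only.

def f1(t: float) -> float:
    """Reward of arm 1 after t pulls; increases toward asymptote 1."""
    return 1.0 - t ** -0.5

def f2(t: float, beta: float = 0.5) -> float:
    """Reward of arm 2 after t pulls; increases toward asymptote 0.5."""
    return 0.5 - 0.5 * t ** -beta

# Both curves are increasing and stay below their asymptotes over the
# horizon range reported in the paper (T = 500, ..., 30000).
for t in (500, 30000):
    assert 0.0 < f1(t) < 1.0
    assert 0.0 < f2(t) < 0.5
assert f1(30000) > f1(500) and f2(30000) > f2(500)
```

Because arm 1's asymptote (1) dominates arm 2's (0.5), the best fixed policy eventually concentrates pulls on arm 1, which is the regime the quoted experiments explore.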