Tight Policy Regret Bounds for Improving and Decaying Bandits

Authors: Hoda Heidari, Michael Kearns, Aaron Roth

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we empirically investigate the performance of A1 on several illustrative reward curves, and observe that (T) (and the regret of our algorithm) are typically significantly sublinear in T. Throughout, for simplicity we set n to 2 and consider two arms 1, 2 where the asymptote of f1 is 1 and that of f2 is 0.5. In Figure 2, we report the value of regret and versus T = 500, ..., 30000 for three different sets of examples.
Researcher Affiliation | Academia | Hoda Heidari (University of Pennsylvania, hoda@seas.upenn.edu); Michael Kearns (University of Pennsylvania, mkearns@cis.upenn.edu); Aaron Roth (University of Pennsylvania, aaroth@cis.upenn.edu)
Pseudocode | Yes | Algorithm 1: "The online algorithm for concave and increasing reward functions" (A1), and Algorithm 2: "A no policy regret algorithm for decreasing reward functions" (A2)
Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper conducts simulations using mathematically defined reward functions, such as 'f1(t) = 1 - t^-0.5 and f2(t) = 0.5 - 0.5t^-beta'. It does not refer to, or provide access information for, any publicly available dataset. (A small, hypothetical reproduction sketch of these curves follows the table.)
Dataset Splits | No | The paper does not report dataset splits (e.g., train/validation/test percentages or counts); the experiments are simulations driven by the mathematically defined reward functions, so no data partitioning is involved.
Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models, processor types, or memory amounts) used to run its simulations.
Software Dependencies | No | The paper does not list ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper describes the reward functions used in the simulations and the range of T values, but it does not report hyperparameter values or other system-level settings of the kind typically given in an experimental-setup section for machine learning models.
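
The reward curves quoted in the "Open Datasets" row are simple closed-form functions, so the simulation setup is easy to sketch. The Python snippet below is a hypothetical reproduction aid, not the authors' code: it defines f1 and f2 as quoted above (beta = 0.5 is an assumed value, left as a parameter), and computes the total reward of the best fixed allocation of T pulls between the two arms. For improving (increasing) reward curves, a policy's total reward depends only on how many pulls each arm receives, so this offline benchmark can be found by brute force over the split point.

# Hypothetical reproduction sketch (not the paper's code).
# Assumptions: beta = 0.5 by default, and the offline benchmark is the
# best split of T pulls between the two arms.

def f1(t):
    """Reward of arm 1 on its t-th pull; asymptote 1."""
    return 1.0 - t ** -0.5

def f2(t, beta=0.5):
    """Reward of arm 2 on its t-th pull; asymptote 0.5 (beta is a free parameter)."""
    return 0.5 - 0.5 * t ** -beta

def best_fixed_allocation(T, beta=0.5):
    """Best total reward over all splits of T pulls between arms 1 and 2.

    Rewards depend only on each arm's own pull count, so every policy's
    total reward is determined by the split k vs. T - k; prefix sums let
    each candidate split be evaluated in O(1).
    """
    pre1, pre2 = [0.0], [0.0]
    for t in range(1, T + 1):
        pre1.append(pre1[-1] + f1(t))
        pre2.append(pre2[-1] + f2(t, beta))
    return max(pre1[k] + pre2[T - k] for k in range(T + 1))

if __name__ == "__main__":
    for T in (500, 5000, 30000):
        print(T, round(best_fixed_allocation(T), 2))

Comparing this benchmark against the cumulative reward of an online policy over T = 500, ..., 30000 would mirror the regret-versus-T simulations described in the "Research Type" row.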