PAC Identification of a Bandit Arm Relative to a Reward Quantile
Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present sampling algorithms for both the finite- and infinite-armed cases, and validate their efficiency through theoretical and experimental analysis. We evaluate F2 by comparing it with (1) LUCB (Kalyanakrishnan et al. 2012), which can be applied with any setting of the subset size from 1 to m, and with (2) uniform sampling. All the algorithms terminate based on F2's stopping rule. We simulate rewards based on Bernoulli distributions. |
| Researcher Affiliation | Academia | Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan Department of Computer Science and Engineering Indian Institute of Technology Bombay, Mumbai 400076, India {arghya, shivaram}@cse.iitb.ac.in |
| Pseudocode | Yes | Algorithm 1: F2 and Algorithm 2: P2 |
| Open Source Code | No | The paper does not contain an explicit statement that the source code for the described methodology is publicly available, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper mentions "three 10-armed bandit instances, which are described in Table 1," but these are problem setups with specified mean rewards, not publicly accessible datasets with concrete access information (link, DOI, repository, or citation to an external resource). |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. It describes bandit instances and simulation parameters but not data partitioning for validation. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) used to replicate the experiment. |
| Experiment Setup | Yes | We set m = 4, ϵ = 0 and δ = 0.001. For all algorithms we use KL-divergence based confidence bounds (...) which, in our experiments, give at least a 10-fold improvement over the Hoeffding-based bounds used in Section 4 for the purposes of illustration. We simulate rewards based on Bernoulli distributions. The y axis of each graph shows the sample complexity (averaged over 100 independent runs, with error bars corresponding to one standard error of the mean). A sketch of such KL-based confidence bounds appears after this table. |
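
The experiment-setup row refers to KL-divergence based confidence bounds for Bernoulli rewards. Below is a minimal Python sketch of how such bounds are commonly computed by numerically inverting the Bernoulli KL divergence (as in KL-UCB-style analyses). The function names (`bernoulli_kl`, `kl_upper_bound`, `kl_lower_bound`) and the threshold `beta` are illustrative assumptions; they do not reproduce the paper's exact exploration rate or F2's stopping rule.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(p_hat, n, beta, tol=1e-9):
    """Largest q >= p_hat with n * KL(p_hat, q) <= beta, found by binary search."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * bernoulli_kl(p_hat, mid) <= beta:
            lo = mid
        else:
            hi = mid
    return lo

def kl_lower_bound(p_hat, n, beta, tol=1e-9):
    """Smallest q <= p_hat with n * KL(p_hat, q) <= beta, found by binary search."""
    lo, hi = 0.0, p_hat
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n * bernoulli_kl(p_hat, mid) <= beta:
            hi = mid
        else:
            lo = mid
    return hi

# Illustrative usage: 40 successes out of 100 pulls of one arm.
# beta is a placeholder confidence threshold, not the paper's exact choice.
beta = math.log(1.0 / 0.001)
print(kl_lower_bound(0.4, 100, beta), kl_upper_bound(0.4, 100, beta))
```

The binary searches exploit the fact that KL(p̂, q) is monotone in q on either side of p̂, so each bound is found to the stated tolerance in logarithmically many iterations.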