Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Normal Bandits of Unknown Means and Variances

Authors: Wesley Cowan, Junya Honda, Michael N. Katehakis

JMLR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Remark 3. Numerical Regret Comparison: Figure 1 shows the results of a small simulation study done on a set of six populations with means and variances given in Table 1. It provides plots of the regrets when implementing policies π_CHK (the index policy of Eq. (13)), π_ACF (the index policy of Eq. (3)), and π_G, a greedy policy that always activates the bandit with the current highest average. Each policy was implemented over a horizon of 100,000 activations, each replicated 10,000 times to produce a good estimate of the average regret R_π(n) over the times indicated.
Researcher Affiliation | Academia | Wesley Cowan (EMAIL), Department of Mathematics, Rutgers University ...; Junya Honda (EMAIL), Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo ...; Michael N. Katehakis (EMAIL), Department of Management Science and Information Systems, Rutgers University
Pseudocode | Yes | Policy π_ACF (UCB1-NORMAL). At each n = 1, 2, ...: i) Sample from any bandit i for which T_i^{π_ACF}(n) < 8 ln n. ii) If T_i^{π_ACF}(n) ≥ 8 ln n for all i = 1, ..., N, sample from bandit π_ACF(n+1), with π_ACF(n+1) = arg max_i { X̄_i(T_i^π(n)) + 4 √( S_i(T_i^π(n)) · ln n / T_i^π(n) ) }
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor does it provide a link to a code repository.
Open Datasets | No | Remark 3. Numerical Regret Comparison: Figure 1 shows the results of a small simulation study done on a set of six populations with means and variances given in Table 1. It provides plots of the regrets when implementing policies π_CHK... Each policy was implemented over a horizon of 100,000 activations, each replicated 10,000 times to produce a good estimate of the average regret R_π(n) over the times indicated.
Dataset Splits | No | The paper uses simulated data generated from specified normal distributions, not a pre-existing dataset that would require explicit training/test/validation splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the simulations, only describing the simulation methodology itself.
Software Dependencies | No | The paper does not mention any specific software or library names with version numbers that would be needed to replicate the experiments.
Experiment Setup | Yes | Each policy was implemented over a horizon of 100,000 activations, each replicated 10,000 times to produce a good estimate of the average regret R_π(n) over the times indicated. The simulation study was done on a set of six populations with means and variances given in Table 1.
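For concreteness, the quoted π_ACF (UCB1-NORMAL) rule and the Monte Carlo regret protocol can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the function and variable names are our own, the instance is a toy two-arm problem rather than the six populations of Table 1, and the horizon/replication counts are far smaller than the 100,000 activations × 10,000 replications used in the paper.

```python
import math
import random

def ucb1_normal_choice(counts, means, m2, n):
    """Pick an arm per the pi_ACF (UCB1-NORMAL) rule quoted above.

    counts[i]: times arm i was sampled; means[i]: running sample mean;
    m2[i]: running sum of squared deviations (Welford), so the sample
    variance of arm i is m2[i] / (counts[i] - 1).
    """
    # i) sample from any bandit played fewer than 8 ln n times
    threshold = 8.0 * math.log(n)
    for i in range(len(counts)):
        if counts[i] < threshold:
            return i
    # ii) otherwise maximize  mean_i + 4 * sqrt(S_i * ln n / T_i)
    def index(i):
        var = m2[i] / (counts[i] - 1)
        return means[i] + 4.0 * math.sqrt(var * math.log(n) / counts[i])
    return max(range(len(counts)), key=index)

def average_regret(mu, sigma, horizon, reps, seed=0):
    """Monte Carlo estimate of the average regret R_pi(horizon)
    for pi_ACF on normal arms with means mu and std devs sigma."""
    rng = random.Random(seed)
    best = max(mu)
    total = 0.0
    k = len(mu)
    for _ in range(reps):
        counts = [0] * k
        means = [0.0] * k
        m2 = [0.0] * k

        def pull(i):
            # sample arm i, update Welford statistics, accumulate regret
            nonlocal total
            x = rng.gauss(mu[i], sigma[i])
            counts[i] += 1
            d = x - means[i]
            means[i] += d / counts[i]
            m2[i] += d * (x - means[i])
            total += best - mu[i]

        # seed each arm with two samples so the sample variance exists
        for i in range(k):
            pull(i)
            pull(i)
        for n in range(2 * k + 1, horizon + 1):
            pull(ucb1_normal_choice(counts, means, m2, n))
    return total / reps
```

With a toy instance such as `average_regret([0.0, 1.0], [1.0, 1.0], horizon=2000, reps=10)`, the forced-sampling rule i) alone contributes roughly 8 ln(2000) ≈ 61 pulls of the suboptimal arm, so the estimated regret should sit well below the ~2000 a policy stuck on the wrong arm would incur.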