Multi-armed Bandit Requiring Monotone Arm Sequences

Authors: Ningyuan Chen

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical studies to compare the regret of Algorithm 1 with a few benchmarks. The experiments are conducted on a Windows 10 desktop with Intel i9-10900K CPU. The average regret over the 100 instances is shown in Figure 3.
Researcher Affiliation | Academia | Ningyuan Chen, Rotman School of Management, University of Toronto, 105 St George St, Toronto, ON, Canada. ningyuan.chen@utoronto.ca
Pseudocode | Yes | Algorithm 1: Increasing arm sequence
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | For each instance, we generate x1 ∼ U(0, 1) and x2 ∼ U(0.5, 1), and let f(x) be the linear interpolation of the three points (0, 0), (x1, x2), and (1, 0). The reward Zt ∼ N(f(Xt), 0.1) for all instances and T. (The authors generated their own data for the numerical experiments, and no access information is provided for this generated data.)
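The instance-generation procedure quoted above can be sketched in a few lines. This is a minimal reconstruction, not the authors' code: the function names `make_instance` and `reward`, the RNG seed, and the reading of 0.1 as the noise standard deviation (matching σ = 0.1 in the experiment setup row) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for illustration only

def make_instance():
    """Sample one reward function as described in the paper: a
    piecewise-linear "tent" through (0, 0), (x1, x2), and (1, 0)."""
    x1 = rng.uniform(0.0, 1.0)   # peak location, x1 ~ U(0, 1)
    x2 = rng.uniform(0.5, 1.0)   # peak height,   x2 ~ U(0.5, 1)

    def f(x):
        # Linear interpolation through the three anchor points.
        return np.interp(x, [0.0, x1, 1.0], [0.0, x2, 0.0])

    return f

def reward(f, x, sigma=0.1):
    """Noisy reward Z_t ~ N(f(X_t), sigma); here sigma is treated as
    the standard deviation, which is my assumption."""
    return f(x) + sigma * rng.normal()

f = make_instance()
print(reward(f, 0.5))
```

Averaging the regret of a policy over 100 such instances, as the paper does, would then just loop `make_instance` 100 times.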
Dataset Splits | No | The paper does not specify any training/validation/test dataset splits. It describes generating instances and running simulations for a given horizon T.
Hardware Specification | Yes | The experiments are conducted on a Windows 10 desktop with Intel i9-10900K CPU.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | By choosing K = T^{1/4} and m = T^{1/2}, the regret of Algorithm 1 satisfies... and: As stated in Theorem 2, we use K = T^{1/4}, m = T^{1/2}, and σ = 0.1.
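The tuning rule quoted in this row (K = T^{1/4}, m = T^{1/2}) can be expressed directly. The helper name `tuning` and the choice to round the non-integer powers to the nearest integer are assumptions; the paper states only the asymptotic scaling.

```python
def tuning(T):
    """Parameter choices from Theorem 2 of the paper:
    K = T^{1/4} arms and m = T^{1/2} pulls per arm.
    Rounding to the nearest integer is an assumption here."""
    K = round(T ** 0.25)
    m = round(T ** 0.5)
    return K, m

print(tuning(10_000))  # → (10, 100)
```

For a horizon of T = 10,000 this gives K = 10 grid arms, each sampled m = 100 times, so the exploration budget K · m matches the horizon T.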