On Multi-Armed Bandit with Impatient Arms

Authors: Yuming Shao, Zhixuan Fang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct experiments to validate our theoretical results. We conduct numerical experiments to validate our theoretical results. 5. Numerical Experiments: We examine the theoretical results in 4 simulations.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) Shanghai Qi Zhi Institute, Shanghai, China.
Pseudocode | Yes | Algorithm 1: FC-SE; Algorithm 2: The Shortest Length AUS Cycle Construction; Algorithm 3: A Dynamic Programming Solution to the Optimization Problem in Algorithm 2; Algorithm 4: FC-Entry
Open Source Code | No | No explicit statement providing access to the source code for the methodology described in this paper was found. The paper does not include a link to a repository or an affirmative statement about code release.
Open Datasets | No | No concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset was provided. The experiments use synthetic data with sampled parameters and standard Gaussian noise.
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning was found. The experiments are based on simulations with i.i.d. random variables rather than pre-split datasets.
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running experiments were mentioned.
Software Dependencies | No | No specific ancillary software details, such as library or solver names with version numbers, were provided for replicating the experiment.
Experiment Setup | Yes | We consider there are K = 5 arms. Let µ_1 = 0.7. Other reward means are sampled uniformly from [0, 0.6]... The entries of m are sampled uniformly from [1 + (K − 1) · 16ϵ^{-2} ln T, T]. We run UCB with δ = 1/T^2... We consider there are K = 5 arms and set m = (3, 5, 12, 155, 1000). We run Algorithm 2 and construct a feasible cycle 1, 2, 3, 1, 2, 4, 1, 2, 5 with n = 9 for all the arms. The entries of µ are sampled uniformly from [0, 1]... We consider there are K = 6 arms and set m = (2, 4, 4, 6800, 6800, 15000). We let N = 3 and use the feasible cycle 1, 2, 1, 3 with n = 4... We consider there are K = 7 arms and set m = (3, 5, 1000, 6667, 12, 10000, 26). Arms 5, 6, 7 are newly entering arms and m = 12. We construct a feasible cycle 1, 2, +, 1, 2, −, 1, 2, 3 with n = 9 for patience vector (m_1, m_2, m_+, m_−, m_3) = (3, 5, 12, 12, 1000), where + and − are two virtual arms with patience m.
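The cycles quoted in the experiment setup can be sanity-checked with a short script. The sketch below is not the paper's Algorithm 2. It assumes a cycle is "feasible" when, with the cycle repeated forever, the number of rounds between consecutive pulls of each arm never exceeds that arm's patience. That criterion is an assumption inferred from the listed values, and the names `max_cyclic_gaps` and `is_feasible` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def max_cyclic_gaps(cycle: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Largest number of rounds between consecutive pulls of each arm
    when `cycle` is repeated indefinitely."""
    n = len(cycle)
    positions: Dict[Hashable, List[int]] = {}
    for t, arm in enumerate(cycle):
        positions.setdefault(arm, []).append(t)

    gaps: Dict[Hashable, int] = {}
    for arm, pos in positions.items():
        # Pair each pull with the next one; the last pull wraps around
        # to the arm's first pull in the following repetition.
        nxt = pos[1:] + [pos[0] + n]
        gaps[arm] = max(b - a for a, b in zip(pos, nxt))
    return gaps


def is_feasible(cycle: Sequence[Hashable], patience: Dict[Hashable, int]) -> bool:
    """Assumed criterion: every arm listed in `patience` appears in the
    cycle and its largest cyclic gap is at most its patience."""
    gaps = max_cyclic_gaps(cycle)
    return all(arm in gaps and gaps[arm] <= m for arm, m in patience.items())


# Second simulation: K = 5, m = (3, 5, 12, 155, 1000).
cycle_a = [1, 2, 3, 1, 2, 4, 1, 2, 5]
patience_a = {1: 3, 2: 5, 3: 12, 4: 155, 5: 1000}

# Third simulation: only the three arms that appear in the cycle are checked.
cycle_b = [1, 2, 1, 3]
patience_b = {1: 2, 2: 4, 3: 4}

# Fourth simulation: "+" and "-" stand for the two virtual arms.
cycle_c = [1, 2, "+", 1, 2, "-", 1, 2, 3]
patience_c = {1: 3, 2: 5, "+": 12, "-": 12, 3: 1000}

results = [
    is_feasible(cycle_a, patience_a),
    is_feasible(cycle_b, patience_b),
    is_feasible(cycle_c, patience_c),
]
```

Under this assumed criterion all three quoted cycles pass, and arm 1 is the binding constraint in each case because its gap equals its patience exactly.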