Rotting Infinitely Many-Armed Bandits

Authors: Jung-Hun Kim, Milan Vojnovic, Se-Young Yun

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present results of numerical experiments for randomly generated problem instances of rotting infinitely many-armed bandits. These results validate the insights derived from our theoretical results.
Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea; London School of Economics, London, UK. Correspondence to: Se-Young Yun <yunseyoung@kaist.ac.kr>, Milan Vojnović <m.vojnovic@lse.ac.uk>.
Pseudocode | Yes | Algorithm 1 UCB-Threshold Policy (UCB-TP); Algorithm 2 Adaptive UCB-Threshold Policy (AUCB-TP)
Open Source Code | Yes | Our code is available at https://github.com/junghunkim7786/rotting_infinite_armed_bandits
Open Datasets | No | The paper states 'We generate initial mean rewards of arms by sampling from uniform distribution on [0, 1].' This indicates the use of synthetic data generation rather than a specific, publicly available dataset with concrete access information.
Dataset Splits | No | The paper uses synthetic data and does not mention explicit training, validation, or test dataset splits. The experiments are conducted over a 'time horizon T' rather than on pre-partitioned static datasets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that would be needed for replication.
Experiment Setup | Yes | We generate initial mean rewards of arms by sampling from uniform distribution on [0, 1]. In each time step, stochastic reward from pulling an arm has a Gaussian noise with mean zero and variance 1. We repeat each experiment 10 times and compute confidence intervals for confidence probability 0.95.
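The experiment setup quoted above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the placeholder policy (pull a freshly sampled arm at every step), the horizon of 1000, the seed, and the use of a Student-t interval are all assumptions, and the rotting dynamics and the UCB-TP and AUCB-TP policies are deliberately left out. Only the uniform initial means, the unit-variance Gaussian noise, the 10 repetitions, and the 0.95 confidence level come from the quoted text.

```python
import math
import random
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 runs).
# Assumption: the paper does not say which interval construction it uses.
T_CRIT_95_DF9 = 2.262


def run_once(rng, horizon):
    """Total reward of a placeholder policy that pulls a fresh arm every step."""
    total = 0.0
    for _ in range(horizon):
        mean = rng.random()            # initial mean reward ~ Uniform[0, 1)
        total += rng.gauss(mean, 1.0)  # Gaussian noise with mean 0, variance 1
    return total


def confidence_interval_95(samples):
    """Return (mean, half-width) of a 95% interval for exactly 10 samples."""
    if len(samples) != 10:
        raise ValueError("critical value is hard-coded for 10 samples")
    center = statistics.mean(samples)
    half_width = T_CRIT_95_DF9 * statistics.stdev(samples) / math.sqrt(len(samples))
    return center, half_width


def run_experiment(horizon=1000, repeats=10, seed=0):
    """Repeat the simulation and summarize total reward with a 95% interval."""
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    totals = [run_once(rng, horizon) for _ in range(repeats)]
    return confidence_interval_95(totals)


if __name__ == "__main__":
    center, half_width = run_experiment()
    print(f"total reward: {center:.1f} +/- {half_width:.1f}")
```

To evaluate an actual policy, `run_once` would be swapped for a function that applies the policy over the horizon and returns the quantity of interest, such as cumulative regret; the interval computation stays the same.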