Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

AAAI 2020, pp. 10085-10092 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically compare UCB-M and its variations with the algorithm UCBCONSTANTSPACE (Liau et al. 2018). We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B_K^ℓ, B_K^0.3, and B_K^0.6.
Researcher Affiliation | Academia | Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India. {arghya, shivaram}@cse.iitb.ac.in
Pseudocode | Yes | Algorithm 1: UCB-M (for finite bandit instances); Algorithm 2: QUCB-M. A minimal illustrative sketch of the bounded arm-memory idea appears after the table.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology.
Open Datasets | Yes | We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B_K^ℓ, B_K^0.3, and B_K^0.6. We use the same four Bernoulli instances used by Roy Chaudhuri and Kalyanakrishnan (2018); instances I1 and I2 have μ = 1, and the probability distributions on μ induced by P_A are given by β(0.5, 2) and β(1, 1), respectively. A sketch of these instance generators appears after the table.
Dataset Splits | No | The paper discusses experimental runs and measuring regret over a horizon, but it does not specify explicit training, validation, or test dataset splits in the conventional sense for model training and evaluation.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like UCB1, Thompson Sampling, and MOSS but does not specify any software dependencies with version numbers used for implementation or experimentation.
Experiment Setup | Yes | For the first phase, for each of the sub-phases, UCB-M chooses a horizon of b1 = M(M + 2) pulls (Algorithm 1, line 4). Set α = 0.205, B = (M^2(M + 2))^(1/(1 − α)), and K0 = … (Algorithm 2, line 1). We present the regret incurred by UCB-M for η = 0.2, alongside the other algorithms (Section 4.2). A sketch computing these constants appears after the table.
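
The following Python sketch illustrates the bounded arm-memory idea behind Algorithm 1 (UCB-M): scan the arms in windows small enough to fit in memory, run a standard UCB1 routine inside each window, and carry only the empirical winner forward. The window schedule, confidence bound, and per-phase budget here are simplifications for illustration, not the paper's exact specification.

    import math
    import random

    def pull(mu):
        # Bernoulli reward with mean mu.
        return 1.0 if random.random() < mu else 0.0

    def ucb1(means, budget):
        # Plain UCB1 over a small set of arms for a fixed budget.
        # Returns (total reward, index of the empirically best arm).
        n = len(means)
        counts, sums, total = [0] * n, [0.0] * n, 0.0
        for t in range(budget):
            if t < n:
                i = t  # pull every arm once before using the bonus
            else:
                i = max(range(n), key=lambda j: sums[j] / counts[j]
                        + math.sqrt(2.0 * math.log(t + 1) / counts[j]))
            r = pull(means[i])
            counts[i] += 1
            sums[i] += r
            total += r
        return total, max(range(n), key=lambda j: sums[j] / max(counts[j], 1))

    def bounded_memory_ucb(means, M, phase_budget):
        # Keep statistics for at most M arms at a time: the incumbent plus
        # up to M - 1 new arms per window; retain only the window's winner.
        assert M >= 2  # need room for the incumbent and at least one new arm
        incumbent, reward, i = 0, 0.0, 1
        while i < len(means):
            window = [incumbent] + list(range(i, min(i + M - 1, len(means))))
            i += M - 1
            r, b = ucb1([means[j] for j in window], phase_budget)
            reward += r
            incumbent = window[b]
        return incumbent, reward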
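The Beta-sampled instances I1 and I2 can be reproduced directly from the quote above; the mean profile of the α-parameterised instances of Jamieson and Nowak (2014) is not restated in the quote, so the generator below treats μ_i = 1 − (i/K)^α as an assumption to be checked against the original paper.

    import random

    def beta_instance(K, a, b, seed=None):
        # Arm means drawn i.i.d. from Beta(a, b): Beta(0.5, 2) for I1 and
        # Beta(1, 1) for I2, per the quote above. K is a free parameter here.
        rng = random.Random(seed)
        return [rng.betavariate(a, b) for _ in range(K)]

    def alpha_instance(K, alpha):
        # ASSUMED mean profile for the B_K^alpha instances: mu_1 = 1 and
        # mu_i = 1 - (i/K)**alpha for i >= 2. Verify against Jamieson and
        # Nowak (2014) before relying on this.
        return [1.0] + [1.0 - (i / K) ** alpha for i in range(2, K + 1)]

    i1 = beta_instance(K=100, a=0.5, b=2.0, seed=0)
    i2 = beta_instance(K=100, a=1.0, b=1.0, seed=0)
    b_03 = alpha_instance(K=100, alpha=0.3)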
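Finally, the phase constants from the experiment-setup row can be computed as below. b1 = M(M + 2) is stated directly in the quote; the expression for B is a reconstruction of garbled PDF text as B = (M^2(M + 2))^(1/(1 − α)), and the value of K0 did not survive extraction, so both should be checked against the paper.

    def qucb_m_parameters(M, alpha=0.205):
        # b1 is quoted verbatim; B is a RECONSTRUCTION of the garbled
        # expression (M^2 (M + 2))^(1 / (1 - alpha)). K0 is omitted because
        # its definition is not recoverable from the quote.
        b1 = M * (M + 2)
        B = (M ** 2 * (M + 2)) ** (1.0 / (1.0 - alpha))
        return b1, B

    print(qucb_m_parameters(M=5))  # (35, ~663)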