Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

AAAI 2020, pp. 10085-10092 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically compare UCB-M and its variations with the algorithm UCBCONSTANTSPACE (Liau et al. 2018). We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B_K^ℓ, B_K^0.3, and B_K^0.6.
Researcher Affiliation | Academia | Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India. {arghya, shivaram}@cse.iitb.ac.in
Pseudocode | Yes | Algorithm 1: UCB-M (for finite bandit instances); Algorithm 2: QUCB-M. A minimal illustrative sketch of the bounded arm-memory idea appears after the table.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology.
Open Datasets | Yes | We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B_K^ℓ, B_K^0.3, and B_K^0.6. We use the same four Bernoulli instances used by Roy Chaudhuri and Kalyanakrishnan (2018); instances I1 and I2 have μ = 1, and the probability distributions on μ induced by P_A are given by β(0.5, 2) and β(1, 1), respectively. A sketch of these instance generators appears after the table.
Dataset Splits | No | The paper discusses experimental runs and measuring regret over a horizon, but it does not specify explicit training, validation, or test dataset splits in the conventional sense for model training and evaluation.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions algorithms like UCB1, Thompson Sampling, and MOSS but does not specify any software dependencies with version numbers used for implementation or experimentation.
Experiment Setup | Yes | For the first phase, for each of the sub-phases, UCB-M chooses a horizon of b1 = M(M + 2) pulls (Algorithm 1, line 4). Set α = 0.205, B = (M^2(M + 2))^(1/(1 − α)), and K0 = … (Algorithm 2, line 1). We present the regret incurred by UCB-M for η = 0.2, alongside the other algorithms (Section 4.2). A sketch computing these constants appears after the table.
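
The following Python sketch illustrates the bounded arm-memory idea behind Algorithm 1 (UCB-M): scan the arms in windows small enough to fit in memory, run a standard UCB1 routine inside each window, and carry only the empirical winner forward. The window schedule, confidence bound, and per-phase budget here are simplifications for illustration, not the paper's exact specification.

    import math
    import random

    def pull(mu):
        # Bernoulli reward with mean mu.
        return 1.0 if random.random() < mu else 0.0

    def ucb1(means, budget):
        # Plain UCB1 over a small set of arms for a fixed budget.
        # Returns (total reward, index of the empirically best arm).
        n = len(means)
        counts, sums, total = [0] * n, [0.0] * n, 0.0
        for t in range(budget):
            if t < n:
                i = t  # pull every arm once before using the bonus
            else:
                i = max(range(n), key=lambda j: sums[j] / counts[j]
                        + math.sqrt(2.0 * math.log(t + 1) / counts[j]))
            r = pull(means[i])
            counts[i] += 1
            sums[i] += r
            total += r
        return total, max(range(n), key=lambda j: sums[j] / max(counts[j], 1))

    def bounded_memory_ucb(means, M, phase_budget):
        # Keep statistics for at most M arms at a time: the incumbent plus
        # up to M - 1 new arms per window; retain only the window's winner.
        assert M >= 2  # need room for the incumbent and at least one new arm
        incumbent, reward, i = 0, 0.0, 1
        while i < len(means):
            window = [incumbent] + list(range(i, min(i + M - 1, len(means))))
            i += M - 1
            r, b = ucb1([means[j] for j in window], phase_budget)
            reward += r
            incumbent = window[b]
        return incumbent, reward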
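The Beta-sampled instances I1 and I2 can be reproduced directly from the quote above; the mean profile of the α-parameterised instances of Jamieson and Nowak (2014) is not restated in the quote, so the generator below treats μ_i = 1 − (i/K)^α as an assumption to be checked against the original paper.

    import random

    def beta_instance(K, a, b, seed=None):
        # Arm means drawn i.i.d. from Beta(a, b): Beta(0.5, 2) for I1 and
        # Beta(1, 1) for I2, per the quote above. K is a free parameter here.
        rng = random.Random(seed)
        return [rng.betavariate(a, b) for _ in range(K)]

    def alpha_instance(K, alpha):
        # ASSUMED mean profile for the B_K^alpha instances: mu_1 = 1 and
        # mu_i = 1 - (i/K)**alpha for i >= 2. Verify against Jamieson and
        # Nowak (2014) before relying on this.
        return [1.0] + [1.0 - (i / K) ** alpha for i in range(2, K + 1)]

    i1 = beta_instance(K=100, a=0.5, b=2.0, seed=0)
    i2 = beta_instance(K=100, a=1.0, b=1.0, seed=0)
    b_03 = alpha_instance(K=100, alpha=0.3)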
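Finally, the phase constants from the experiment-setup row can be computed as below. b1 = M(M + 2) is stated directly in the quote; the expression for B is a reconstruction of garbled PDF text as B = (M^2(M + 2))^(1/(1 − α)), and the value of K0 did not survive extraction, so both should be checked against the paper.

    def qucb_m_parameters(M, alpha=0.205):
        # b1 is quoted verbatim; B is a RECONSTRUCTION of the garbled
        # expression (M^2 (M + 2))^(1 / (1 - alpha)). K0 is omitted because
        # its definition is not recoverable from the quote.
        b1 = M * (M + 2)
        B = (M ** 2 * (M + 2)) ** (1.0 / (1.0 - alpha))
        return b1, B

    print(qucb_m_parameters(M=5))  # (35, ~663)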