Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory
Authors: Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan (pp. 10085-10092)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically compare UCB-M and its variations with the algorithm UCBCONSTANTSPACE (Liau et al. 2018). We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B^K_L, B^K_0.3, and B^K_0.6. |
| Researcher Affiliation | Academia | Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India. {arghya, shivaram}@cse.iitb.ac.in |
| Pseudocode | Yes | Algorithm 1: UCB-M: For finite bandit instances. Algorithm 2: QUCB-M |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | We run our set of experiments on three different kinds of K-armed Bernoulli instances earlier used by Jamieson and Nowak (2014): B^K_L, B^K_0.3, and B^K_0.6. We use the same four Bernoulli instances used by Roy Chaudhuri and Kalyanakrishnan (2018): instances I1 and I2 have μ = 1, and the probability distributions on μ induced by P_A are given by β(0.5, 2) and β(1, 1), respectively. |
| Dataset Splits | No | The paper discusses experimental runs and measuring regret over a horizon, but it does not specify explicit training, validation, or test dataset splits in the conventional sense for model training and evaluation. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms like UCB1, Thompson Sampling, and MOSS but does not specify any software dependencies with version numbers used for implementation or experimentation. |
| Experiment Setup | Yes | For the first phase, for each of the sub-phases, UCB-M chooses a horizon of b1 = M(M + 2) pulls. (Algorithm 1, line 4) Set α = 0.205, B = M^2(M + 2)^{1/(1−α)}, and K0 = . (Algorithm 2, line 1) We present the regret incurred by UCB-M for η = 0.2, alongside the other algorithms. (Section 4.2) |
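The experiments above measure cumulative regret of index-based algorithms on K-armed Bernoulli instances. As a point of reference for that setup, the sketch below runs the classical UCB1 index policy (Auer et al. 2002) on a small Bernoulli instance and reports pseudo-regret. This is *not* the paper's bounded-memory UCB-M, which additionally limits how many arms' statistics are stored at once; the arm means here are illustrative, not the paper's B^K instances.

```python
import math
import random

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on a K-armed Bernoulli bandit; return cumulative pseudo-regret.

    means:   success probability of each arm (list of floats in [0, 1])
    horizon: total number of pulls
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of pulls of each arm
    totals = [0.0] * k      # summed reward of each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # initialise: pull every arm once
        else:
            # UCB1 index: empirical mean plus confidence bonus
            arm = max(range(k),
                      key=lambda i: totals[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]   # expected (pseudo-)regret increment
    return regret

# Illustrative 3-armed Bernoulli instance (not from the paper).
print(ucb1_regret([0.9, 0.6, 0.3], horizon=10_000))
```

Since pseudo-regret adds the expected gap of the chosen arm at each step, it grows roughly logarithmically in the horizon for UCB1, which is the quantity the paper's plots compare across algorithms.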