Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets
Authors: Zongqi Wan, Zhijie Zhang, Tongyang Li, Jialin Zhang, Xiaoming Sun
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to demonstrate the performance of our two quantum variants of bandit algorithms. For simplicity, we use Bernoulli rewards in both bandit settings. |
| Researcher Affiliation | Academia | Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Center for Applied Mathematics of Fujian Province, School of Mathematics and Statistics, Fuzhou University; Center on Frontiers of Computing Studies, Peking University; School of Computer Science, Peking University |
| Pseudocode | Yes | Algorithm 1: QUCB1(δ). Parameters: fail probability δ. 1: for i = 1, ..., n do; 2: r_i ← 1 and N_i ← ⌈C1 log(1/δ)/r_i⌉; 3: play arm i for N_i consecutive rounds; 4: run QMC1(O_i, r_i, δ) to get an estimate µ̂(i) of µ(i); 5: end for; 6: for each stage s = 1, 2, ... (terminate when T queries to the oracles O_i have been used) do; 7: let i_s ← argmax_i µ̂(i) + r_i (if the argmax has multiple choices, pick an arbitrary one); 8: update r_{i_s} ← r_{i_s}/2 and N_{i_s} ← ⌈C1 log(1/δ)/r_{i_s}⌉; 9: play i_s for the next N_{i_s} rounds and update µ̂(i_s) by running QMC1(O_{i_s}, r_{i_s}, δ); 10: end for. (A classical Python sketch of this loop follows the table.) |
| Open Source Code | No | The paper mentions using |
| Open Datasets | No | The paper describes setting up instances for experiments (e.g., |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits. It discusses |
| Hardware Specification | Yes | Our experiments are executed on a computer equipped with a Xeon E5-2620 CPU and 64GB memory. |
| Software Dependencies | No | The paper mentions |
| Experiment Setup | Yes | For the QMAB setting, we run UCB and QUCB on a 2-arm bandit for T = 10^6 rounds... we set the reward gap of our experimental instance to be relatively small. Overall, we set the mean reward of the optimal arm to 0.5 and the reward gap of the sub-optimal arm to 0.01. |
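
The pseudocode row above references the quantum mean estimator QMC1, which has no classical implementation. The following is a minimal sketch of the QUCB1 control flow only, assuming Bernoulli rewards: the sample average of the observed rewards stands in for QMC1, so the structure of the algorithm is reproduced but not its quantum query savings. The constant `C1`, the batch size `N_i = ceil(C1 * log(1/delta) / r_i)` (chosen to mimic QMC1's O((1/r) log(1/δ)) query complexity), and the function names `qucb1` and `play` are illustrative assumptions, not the authors' code.

```python
import math
import random

# A minimal classical sketch of the QUCB1 loop in Algorithm 1 (not the
# authors' code). ASSUMPTION: the quantum subroutine QMC1, an amplitude-
# estimation-based mean estimator, cannot run on classical hardware, so the
# sample average of the N_i observed rewards stands in for it here; the
# constant C1 and the batch size N_i = ceil(C1 * log(1/delta) / r_i) are
# placeholders that mimic QMC1's O((1/r) log(1/delta)) query complexity.

C1 = 2.0  # placeholder constant, not taken from the paper


def qucb1(means, T, delta, rng=random.Random(0)):
    """Run a QUCB1-style loop on a Bernoulli bandit with the given arm means."""
    n = len(means)
    pulls_left = T
    regret = 0.0
    best = max(means)

    def play(i, rounds):
        """Pull arm i for `rounds` plays (or until the budget T runs out)."""
        nonlocal pulls_left, regret
        rounds = min(rounds, pulls_left)
        rewards = [1.0 if rng.random() < means[i] else 0.0 for _ in range(rounds)]
        pulls_left -= rounds
        regret += (best - means[i]) * rounds
        return rewards

    # Initialization (lines 1-5): r_i <- 1, play each arm for N_i rounds,
    # and estimate its mean (classical stand-in for QMC1(O_i, r_i, delta)).
    r = [1.0] * n
    mu_hat = [0.0] * n
    for i in range(n):
        N_i = math.ceil(C1 * math.log(1.0 / delta) / r[i])
        rewards = play(i, N_i)
        if not rewards:
            return regret
        mu_hat[i] = sum(rewards) / len(rewards)

    # Stages (lines 6-10): pick the arm with the largest optimistic index
    # mu_hat(i) + r_i, halve its radius, and re-estimate its mean.
    while pulls_left > 0:
        i_s = max(range(n), key=lambda i: mu_hat[i] + r[i])
        r[i_s] /= 2.0
        N_is = math.ceil(C1 * math.log(1.0 / delta) / r[i_s])
        rewards = play(i_s, N_is)
        if rewards:
            mu_hat[i_s] = sum(rewards) / len(rewards)
    return regret
```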
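
Under the same assumptions, the experiment-setup row can be exercised with the sketch above on the 2-arm instance quoted there (optimal mean 0.5, gap 0.01, T = 10^6 rounds). The fail probability `delta` below is an assumed value; the paper's choice is not quoted in the table.

```python
# Hypothetical driver for the 2-arm Bernoulli instance described above.
if __name__ == "__main__":
    total_regret = qucb1(means=[0.5, 0.49], T=10**6, delta=1e-4)
    print(f"simulated pseudo-regret over 10^6 rounds: {total_regret:.1f}")
```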