Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Authors: Siwei Wang, Longbo Huang, John C. S. Lui

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct experiments based on real-world dataset, to compare the Restless-UCB policy with state-of-the-art benchmarks. Our results show that Restless-UCB outperforms existing algorithms in regret, and significantly reduces the running time."
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University, wangsw2020@mail.tsinghua.edu.cn; (2) Institute for Interdisciplinary Information Sciences, Tsinghua University, longbohuang@mail.tsinghua.edu.cn; (3) Department of Computer Science and Engineering, The Chinese University of Hong Kong, cslui@cse.cuhk.edu.hk
Pseudocode | Yes | "Algorithm 1 Restless-UCB Policy"
Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "Here we use the wireless communication dataset in [37]."
Dataset Splits | No | The paper mentions using "constructed instances" and a "wireless communication dataset in [37]", but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations).
Hardware Specification | Yes | "The results in Table 1 take average over 50 runs of a single-threaded program (running on an Intel E5-2660 v3 workstation with 256GB RAM)."
Software Dependencies | No | The paper mentions "a single-threaded program" but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, or specific solver versions).
Experiment Setup | Yes | "In all these experiments, we use the offline policy proposed by [28] as the offline oracle of restless bandit problems. We consider two problem instances, each instance contains two arms and each arm evolves according to a two-state Markov chain. In both instances, r(i, 2) = 0 for any arm i, and all Markov chains start at state 2. In problem instance one, r(1, 1) = 1, r(2, 1) = 0.8, P1(1, 1) = 0.7, P1(2, 2) = 0.8, P2(1, 1) = 0.5 and P2(2, 2) = 0.6. We use the transition probability matrices in Tables IV and VI in [37]. We also use the average direct signal mean given in Tables III and V in [37] as the expected reward." (An illustrative reconstruction of problem instance one is sketched below the table.)
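
For concreteness, here is a minimal, hypothetical sketch that reconstructs "problem instance one" from the numbers quoted in the Experiment Setup row and simulates it against an arbitrary policy. It is not code from the paper: only the transition probabilities, the rewards, and the start state (state 2 for every arm) come from the quoted text; the `simulate` helper, the state-indexing convention, and the round-robin baseline are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# States are indexed 0 and 1, standing for the paper's states 1 and 2.
# P[i] is arm i's transition matrix: P[i][s, s'] = Pr(next state s' | current state s).
P = {
    1: np.array([[0.7, 0.3],    # P1(1, 1) = 0.7
                 [0.2, 0.8]]),  # P1(2, 2) = 0.8
    2: np.array([[0.5, 0.5],    # P2(1, 1) = 0.5
                 [0.4, 0.6]]),  # P2(2, 2) = 0.6
}

# r[i][s] is the expected reward of pulling arm i in state s; r(i, 2) = 0 for both arms.
r = {
    1: np.array([1.0, 0.0]),    # r(1, 1) = 1
    2: np.array([0.8, 0.0]),    # r(2, 1) = 0.8
}


def simulate(policy, T=10_000):
    """Run a single-play restless bandit for T rounds and return the total reward.

    `policy(t)` returns the arm (1 or 2) to pull at round t. In a restless bandit
    every chain keeps evolving whether or not its arm is pulled.
    """
    state = {i: 1 for i in P}  # index 1 == the paper's state 2, the given start state
    total = 0.0
    for t in range(T):
        arm = policy(t)
        total += r[arm][state[arm]]
        for i in P:  # all chains transition, pulled or not (the "restless" property)
            state[i] = rng.choice(2, p=P[i][state[i]])
    return total


# Illustrative baseline only (round-robin), not the Restless-UCB policy itself.
print(simulate(lambda t: 1 + (t % 2)))
```

Problem instance two and the wireless communication experiment are not reconstructed here, since their parameters come from Tables III-VI of [37] and are not quoted in the report.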