Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
Authors: Siwei Wang, Longbo Huang, John C. S. Lui
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments based on a real-world dataset to compare the Restless-UCB policy with state-of-the-art benchmarks. Our results show that Restless-UCB outperforms existing algorithms in regret and significantly reduces the running time. |
| Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, Tsinghua University (wangsw2020@mail.tsinghua.edu.cn); 2 Institute for Interdisciplinary Information Sciences, Tsinghua University (longbohuang@mail.tsinghua.edu.cn); 3 Department of Computer Science and Engineering, The Chinese University of Hong Kong (cslui@cse.cuhk.edu.hk) |
| Pseudocode | Yes | Algorithm 1 Restless-UCB Policy |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Here we use the wireless communication dataset in [37]. |
| Dataset Splits | No | The paper mentions using 'constructed instances' and a 'wireless communication dataset in [37]', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | Yes | The results in Table 1 are averaged over 50 runs of a single-threaded program (running on an Intel E5-2660 v3 workstation with 256GB RAM). |
| Software Dependencies | No | The paper mentions 'a single-threaded program' but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, or specific solver versions). |
| Experiment Setup | Yes | In all these experiments, we use the offline policy proposed by [28] as the offline oracle for restless bandit problems. We consider two problem instances; each instance contains two arms, and each arm evolves according to a two-state Markov chain. In both instances, r(i, 2) = 0 for every arm i, and all Markov chains start at state 2. In problem instance one, r(1, 1) = 1, r(2, 1) = 0.8, P1(1, 1) = 0.7, P1(2, 2) = 0.8, P2(1, 1) = 0.5, and P2(2, 2) = 0.6. For the real-world experiment, we use the transition probability matrices in Tables IV and VI of [37], and the average direct signal means given in Tables III and V of [37] as the expected rewards. |
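
As a companion to the experiment-setup excerpt above, the following is a minimal Python sketch of how "problem instance one" (two arms, each a two-state Markov chain, r(i, 2) = 0, all chains starting in state 2) could be instantiated and simulated from the quoted parameters. The uniform-random pulling policy, the horizon, and the random seed are illustrative assumptions; this is not the paper's Restless-UCB policy or the offline oracle of [28].

```python
import numpy as np

# Illustrative reconstruction of "problem instance one" from the paper's experiment
# setup. States are indexed 1 and 2 as in the paper; r(i, 2) = 0 for both arms.
# The simulation loop and policy below are assumptions for illustration only.

# Transition matrices P_i[s, s'] (rows/columns ordered as state 1, state 2),
# built from P1(1,1)=0.7, P1(2,2)=0.8, P2(1,1)=0.5, P2(2,2)=0.6.
P = {
    1: np.array([[0.7, 0.3],
                 [0.2, 0.8]]),
    2: np.array([[0.5, 0.5],
                 [0.4, 0.6]]),
}

# Expected rewards r(i, s): r(1,1)=1, r(2,1)=0.8, r(i,2)=0.
r = {
    1: {1: 1.0, 2: 0.0},
    2: {1: 0.8, 2: 0.0},
}

rng = np.random.default_rng(0)
state = {1: 2, 2: 2}          # all Markov chains start at state 2
horizon = 10_000
total_reward = 0.0

for t in range(horizon):
    arm = int(rng.integers(1, 3))   # placeholder policy: pull an arm uniformly at random
    total_reward += r[arm][state[arm]]
    # Restless dynamics: every arm's state evolves each round,
    # whether or not that arm was pulled.
    for i in (1, 2):
        s = state[i]
        state[i] = 1 if rng.random() < P[i][s - 1, 0] else 2

print(f"Average reward of the uniform-random baseline: {total_reward / horizon:.3f}")
```

The sketch only pins down the environment dynamics implied by the quoted parameters; swapping the placeholder policy for Restless-UCB and the offline oracle would be needed to reproduce the regret comparison reported in the paper.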