Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
Authors: Siwei Wang, Longbo Huang, John C. S. Lui
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments based on real-world dataset, to compare the Restless-UCB policy with state-of-the-art benchmarks. Our results show that Restless-UCB outperforms existing algorithms in regret, and significantly reduces the running time. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Technology, Tsinghua University EMAIL 2Institute for Interdisciplinary Information Sciences, Tsinghua University EMAIL 3Department of Computer Science and Engineering, The Chinese University of Hong Kong EMAIL |
| Pseudocode | Yes | Algorithm 1 Restless-UCB Policy |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Here we use the wireless communication dataset in [37]. |
| Dataset Splits | No | The paper mentions using 'constructed instances' and a 'wireless communication dataset in [37]', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | Yes | The results in Table 1 take average over 50 runs of a single-threaded program (running on an Intel E5-2660 v3 workstation with 256GB RAM). |
| Software Dependencies | No | The paper mentions 'a single-threaded program' but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, or specific solver versions). |
| Experiment Setup | Yes | In all these experiments, we use the offline policy proposed by [28] as the offline oracle of restless bandit problems. We consider two problem instances, each instance contains two arms and each arm evolves according to a two-state Markov chain. In both instances, r(i, 2) = 0 for any arm i, and all Markov chains start at state 2. In problem instance one, r(1, 1) = 1, r(2, 1) = 0.8, P1(1, 1) = 0.7, P1(2, 2) = 0.8, P2(1, 1) = 0.5 and P2(2, 2) = 0.6. We use the transition probability matrices in Tables IV and VI in [37]. We also use the average direct signal mean given in Tables III and V in [37] as the expected reward. |