Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness
Authors: Guojun Xiong, Shufan Wang, Jian Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present some of our experimental results. We also demonstrate the utility of our GM-R2MAB and UC-R2MAB by evaluating them under two real-world applications of restless bandits. 5.1 Experiments on Constructed Instance; 5.2 Experiments on Real-World Datasets |
| Researcher Affiliation | Academia | Guojun Xiong, Shufan Wang, Jian Li SUNY-Binghamton University {gxiong1,swang214,lij}@binghamton.edu |
| Pseudocode | Yes | Algorithm 1 GM-R2MAB; Algorithm 2 UC-R2MAB |
| Open Source Code | No | The ethics statement mentions that code was included for reproducibility ('Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]'), but this does not constitute an explicit public release statement or a direct link to an open-source repository in the main body of the paper. |
| Open Datasets | Yes | We demonstrate the utility of GM-R2MAB and UC-R2MAB by evaluating them under two recently studied applications of restless bandits: wireless scheduling with two actions, and tuberculosis care with multiple actions... We adopt the settings in [40]... leveraged a public dataset of the TB care in India [40] |
| Dataset Splits | No | The paper uses real-world datasets but does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) for reproducibility of data partitioning. |
| Hardware Specification | Yes | We use the Monte Carlo simulation with 1,000 independent trials of a single-threaded program on AMD Ryzen 5800x desktop with 64GB RAM. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python, PyTorch, etc.) that would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | For simplicity, we choose 200 arms and a time horizon of T = 60,000 slots. Each episode consists of 2,500 slots. |
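
The setup figures quoted above imply a fixed number of episodes per run. The following is a minimal sketch of that arithmetic, not the authors' code: the class name `ExperimentSetup`, its field names, and the `episode_bounds` helper are hypothetical, and pairing the 1,000 Monte Carlo trials with this particular setup is an assumption based on the Hardware Specification row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the figures quoted in the table."""

    num_arms: int = 200            # "we choose 200 arms"
    horizon: int = 60_000          # "time horizon of T = 60,000 slots"
    episode_length: int = 2_500    # "Each episode consists of 2,500 slots"
    num_trials: int = 1_000        # assumed to apply to this setup

    @property
    def num_episodes(self) -> int:
        # The horizon must split evenly into episodes for this sketch.
        if self.horizon % self.episode_length != 0:
            raise ValueError("horizon is not a multiple of episode_length")
        return self.horizon // self.episode_length

    def episode_bounds(self) -> list[tuple[int, int]]:
        # Half-open slot ranges [start, end) for each episode.
        length = self.episode_length
        return [(k * length, (k + 1) * length) for k in range(self.num_episodes)]


setup = ExperimentSetup()
print(setup.num_episodes)          # 24
print(setup.episode_bounds()[-1])  # (57500, 60000)
```

Under these figures, each trial spans 24 episodes of 2,500 slots, which is the granularity at which an episodic learning algorithm would update its estimates.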