Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness

Authors: Guojun Xiong, Shufan Wang, Jian Li

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we present some of our experimental results. We also demonstrate the utility of our GM-R2MAB and UC-R2MAB by evaluating them under two real-world applications of restless bandits." (Section 5.1, Experiments on Constructed Instance; Section 5.2, Experiments on Real-World Datasets) |
| Researcher Affiliation | Academia | Guojun Xiong, Shufan Wang, Jian Li, SUNY-Binghamton University, {gxiong1,swang214,lij}@binghamton.edu |
| Pseudocode | Yes | Algorithm 1 (GM-R2MAB); Algorithm 2 (UC-R2MAB) |
| Open Source Code | No | The reproducibility checklist states that code was included ("Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"), but the main body of the paper contains no explicit public-release statement and no direct link to an open-source repository. |
| Open Datasets | Yes | "We demonstrate the utility of GM-R2MAB and UC-R2MAB by evaluating them under two recently studied applications of restless bandits: wireless scheduling with two actions, and tuberculosis care with multiple actions... We adopt the settings in [40]... leveraged a public dataset of the TB care in India [40]" |
| Dataset Splits | No | The paper evaluates on real-world datasets but does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | "We use the Monte Carlo simulation with 1,000 independent trials of a single-threaded program on AMD Ryzen 5800x desktop with 64GB RAM." |
| Software Dependencies | No | The paper does not list specific software packages with version numbers (e.g., Python, PyTorch) that would be needed to replicate the experimental environment. |
| Experiment Setup | Yes | "For simplicity, we choose 200 arms and a time horizon of T = 60,000 slots. Each episode consists of 2,500 slots." |
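The reported setup (200 arms, a horizon of T = 60,000 slots split into 2,500-slot episodes, averaged over 1,000 Monte Carlo trials) can be expressed as a minimal simulation skeleton. This is a hypothetical sketch, not the authors' implementation: the placeholder policy and uniform rewards stand in for the paper's index-based algorithms, which are not publicly released.

```python
import numpy as np

# Constants taken from the paper's reported setup; everything else is a
# hypothetical placeholder.
N_ARMS = 200          # "200 arms"
HORIZON = 60_000      # "T = 60,000 slots"
EPISODE_LEN = 2_500   # "each episode consists of 2,500 slots"
N_TRIALS = 1_000      # "1,000 independent trials" (reduced below for speed)

def run_trial(rng):
    """One independent trial: iterate over fixed-length episodes."""
    total_reward = 0.0
    for start in range(0, HORIZON, EPISODE_LEN):
        for t in range(start, start + EPISODE_LEN):
            # Placeholder step: a real run would pick actions for the
            # N_ARMS arms via the learned index policy and sample rewards
            # from the underlying MDPs.
            total_reward += rng.random()
    return total_reward / HORIZON  # per-slot average reward

rng = np.random.default_rng(0)
avg = float(np.mean([run_trial(rng) for _ in range(10)]))  # 10 of the 1,000 trials, for illustration
print(round(avg, 2))  # ~0.5 for the uniform placeholder reward
```

The horizon divides evenly into 24 episodes, matching the episodic structure the paper describes; swapping the placeholder step for an actual arm/action model is the only change needed to turn the skeleton into a real experiment loop.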