Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness

Authors: Guojun Xiong, Shufan Wang, Jian Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present some of our experimental results. We also demonstrate the utility of our GM-R2MAB and UC-R2MAB by evaluating them under two real-world applications of restless bandits." Section headings: 5.1 Experiments on Constructed Instance; 5.2 Experiments on Real-World Datasets.
Researcher Affiliation | Academia | Guojun Xiong, Shufan Wang, Jian Li; SUNY-Binghamton University
Pseudocode | Yes | Algorithm 1: GM-R2MAB; Algorithm 2: UC-R2MAB
Open Source Code | No | The NeurIPS paper checklist answers [Yes] to "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?", indicating that code was supplied for reproducibility. However, this is not an explicit public-release statement, and the main body of the paper gives no link to an open-source repository.
Open Datasets | Yes | "We demonstrate the utility of GM-R2MAB and UC-R2MAB by evaluating them under two recently studied applications of restless bandits: wireless scheduling with two actions, and tuberculosis care with multiple actions ... We adopt the settings in [40] ... leveraged a public dataset of the TB care in India [40]"
Dataset Splits | No | The paper uses real-world datasets but does not specify training, validation, or test splits (e.g., percentages or sample counts), so the data partitioning cannot be reproduced from the text.
Hardware Specification | Yes | "We use the Monte Carlo simulation with 1,000 independent trials of a single-threaded program on AMD Ryzen 5800X desktop with 64GB RAM."
Software Dependencies | No | The paper does not name the software it uses (e.g., Python, PyTorch) or give version numbers, both of which would be needed to replicate the experiment environment.
Experiment Setup | Yes | "For simplicity, we choose 200 arms and a time horizon of T = 60,000 slots. Each episode consists of 2,500 slots."
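The quoted setup fixes an episodic structure: a horizon of T = 60,000 slots with 2,500 slots per episode gives 60,000 / 2,500 = 24 episodes. The hardware row adds that results come from 1,000 independent Monte Carlo trials run in a single thread. Below is a minimal sketch of how such a configuration and averaging harness might be organized. It is not the authors' code.

* `ExperimentConfig`, `run_trial`, and `monte_carlo` are hypothetical names.
* The per-trial body is a random placeholder and does not implement GM-R2MAB, UC-R2MAB, or the restless-bandit dynamics.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Values quoted in the Experiment Setup and Hardware Specification rows.
    num_arms: int = 200
    horizon: int = 60_000        # T, total number of time slots
    episode_length: int = 2_500  # slots per episode
    num_trials: int = 1_000      # independent Monte Carlo trials

    @property
    def num_episodes(self) -> int:
        # This sketch assumes the horizon divides evenly into episodes.
        if self.horizon % self.episode_length != 0:
            raise ValueError("horizon is not a multiple of episode_length")
        return self.horizon // self.episode_length


def run_trial(cfg: ExperimentConfig, seed: int) -> float:
    """Hypothetical placeholder for one independent trial.

    A real trial would run a learning policy over cfg.num_arms arms for
    cfg.num_episodes episodes. Here each episode's average reward is just
    a seeded uniform draw, so the harness itself is runnable.
    """
    rng = random.Random(seed)
    episode_rewards = [rng.random() for _ in range(cfg.num_episodes)]
    return sum(episode_rewards) / len(episode_rewards)


def monte_carlo(cfg: ExperimentConfig) -> tuple[float, float]:
    # Trials run sequentially in one thread. A distinct seed per trial
    # keeps them independent and makes the whole run reproducible.
    results = [run_trial(cfg, seed) for seed in range(cfg.num_trials)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    cfg = ExperimentConfig()
    print(cfg.num_episodes)  # 24
    mean, std = monte_carlo(cfg)
    print(f"mean={mean:.4f} std={std:.4f}")
```

Running the script prints the episode count (24), then the mean and standard deviation across the 1,000 seeded trials. Seeding each trial by its index is one simple way to make a Monte Carlo study repeatable. The paper's excerpts do not say how its trials were seeded.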