Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Global Rewards in Restless Multi-Armed Bandits

Authors: Naveen Raman, Zheyuan Ryan Shi, Fei Fang

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue.
Researcher Affiliation | Academia | Naveen Raman (Carnegie Mellon University), Zheyuan Ryan Shi (University of Pittsburgh), Fei Fang (Carnegie Mellon University)
Pseudocode | Yes | We additionally provide the pseudo-code for our iterative algorithms (Algorithm 2) and our MCTS Shapley-Whittle and MCTS Linear-Whittle algorithms (Algorithm 1).
Open Source Code | Yes | All code is available here: https://github.com/naveenr414/food-rescue-rmab
Open Datasets | No | We evaluate our policies across both synthetic and real-world datasets. [...] We leverage data from a partnering multi-city food rescue organization and construct an RMAB-G instance using their data (details in Appendix D).
Dataset Splits | No | The paper mentions training a DQN for a fixed number of epochs and evaluating it, but it does not specify a distinct validation split for the datasets used in the main RMAB-G policy evaluation.
Hardware Specification | Yes | We run all experiments on a TITAN Xp with 12 GB of GPU RAM, running Ubuntu 20.04 with 64 GB of RAM.
Software Dependencies | No | We develop DQN policies using PyTorch [35].
Experiment Setup | Yes | For all experiments γ = 0.9. We train the DQN for 100 epochs and evaluate other choices in Appendix F. We use a learning rate of 5 × 10^-4, a batch size of 16, and an Adam optimizer. We use an MLP with two 128-dimensional hidden layers and ReLU activation for all layers.
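
The Pseudocode row refers to MCTS Linear-Whittle and Shapley-Whittle algorithms (Algorithm 1 in the paper). As a rough illustration of the index-policy family these build on, here is a generic top-k index policy; it is a minimal sketch, not the paper's method, and `top_k_index_policy`, the random index values, and `budget` are hypothetical placeholders.

```python
# Hedged illustration: a generic top-k index policy of the kind that
# Whittle-style RMAB algorithms build on. The index computation here is
# a placeholder; the paper's Linear-/Shapley-Whittle indices differ.
import numpy as np

def top_k_index_policy(indices: np.ndarray, budget: int) -> np.ndarray:
    """Activate the `budget` arms with the largest index values."""
    action = np.zeros(len(indices), dtype=int)
    action[np.argsort(indices)[-budget:]] = 1
    return action

rng = np.random.default_rng(0)
whittle_like_indices = rng.random(10)  # placeholder per-arm indices
print(top_k_index_policy(whittle_like_indices, budget=3))
```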
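The Experiment Setup row lists concrete DQN hyperparameters. Below is a minimal PyTorch sketch of that configuration, assuming a plain MLP with two 128-dimensional ReLU hidden layers; the `DQN` class name and the input/output dimensions `N_ARMS` and `N_ACTIONS` are hypothetical, since the state and action dimensions are not stated here.

```python
# Minimal sketch of the DQN setup quoted above (not the authors' code).
import torch
import torch.nn as nn

N_ARMS, N_ACTIONS = 10, 2  # hypothetical problem dimensions

class DQN(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 128):
        super().__init__()
        # Two 128-dimensional hidden layers with ReLU, per the quoted setup.
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = DQN(N_ARMS, N_ACTIONS)
# Adam optimizer with learning rate 5e-4, as quoted; gamma = 0.9,
# batch size 16, and 100 training epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
GAMMA, BATCH_SIZE, EPOCHS = 0.9, 16, 100
```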