Global Rewards in Restless Multi-Armed Bandits

Authors: Naveen Raman, Zheyuan Shi, Fei Fang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue.
Researcher Affiliation | Academia | Naveen Raman (Carnegie Mellon University, naveenr@cmu.edu), Zheyuan Ryan Shi (University of Pittsburgh, ryanshi@pitt.edu), Fei Fang (Carnegie Mellon University, feifang@cmu.edu)
Pseudocode | Yes | We additionally provide the pseudo-code for our iterative algorithms (Algorithm 2) and our MCTS Shapley-Whittle and MCTS Linear-Whittle algorithms (Algorithm 1).
Open Source Code | Yes | All code is available here https://github.com/naveenr414/food-rescue-rmab
Open Datasets | No | We evaluate our policies across both synthetic and real-world datasets. [...] We leverage data from a partnering multi-city food rescue organization and construct an RMAB-G instance using their data (details in Appendix D).
Dataset Splits | No | The paper mentions training a DQN for a certain number of epochs and evaluating it, but does not specify a distinct validation split for the datasets used in the main RMAB-G policy evaluation.
Hardware Specification | Yes | We run all experiments on a TITAN Xp with 12 GB of GPU RAM running on Ubuntu 20.04 with 64 GB of RAM.
Software Dependencies | No | We develop DQN policies using PyTorch [35].
Experiment Setup | Yes | For all experiments γ = 0.9. We train the DQN for 100 epochs and evaluate with other choices in Appendix F. We use a learning rate of 5 * 10^-4, a batch size of 16, and an Adam optimizer. We use an MLP with a 128-dimensional hidden layer, and use 2 such hidden layers. We use ReLU activation for all layers.
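To make the reported experiment setup concrete, below is a minimal PyTorch sketch of a Q-network and optimizer matching the stated hyperparameters (two 128-dimensional ReLU hidden layers, Adam with learning rate 5 * 10^-4, batch size 16, discount factor γ = 0.9, 100 training epochs). The class name QNetwork and the state/action dimensions are hypothetical placeholders, not the authors' implementation; their actual DQN code is in the linked repository.

import torch
import torch.nn as nn

# Minimal sketch of a Q-network with the reported architecture:
# an MLP with two 128-dimensional hidden layers and ReLU activations.
# `state_dim` and `num_actions` are hypothetical placeholders.
class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Optimizer and hyperparameters as reported in the experiment setup.
q_net = QNetwork(state_dim=10, num_actions=4)  # example dimensions only
optimizer = torch.optim.Adam(q_net.parameters(), lr=5e-4)
batch_size = 16   # reported batch size
gamma = 0.9       # reported discount factor
num_epochs = 100  # reported number of training epochs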