Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Global Rewards in Restless Multi-Armed Bandits
Authors: Naveen Raman, Zheyuan Shi, Fei Fang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue. |
| Researcher Affiliation | Academia | Naveen Raman Carnegie Mellon University EMAIL Zheyuan Ryan Shi University of Pittsburgh EMAIL Fei Fang Carnegie Mellon University EMAIL |
| Pseudocode | Yes | We additionally provide the pseudo-code for our iterative algorithms (Algorithm 2) and our MCTS Shapley-Whittle and MCTS Linear-Whittle algorithms (Algorithm 1). |
| Open Source Code | Yes | All code is available here https://github.com/naveenr414/food-rescue-rmab |
| Open Datasets | No | We evaluate our policies across both synthetic and real-world datasets. [...] We leverage data from a partnering multi-city food rescue organization and construct an RMAB-G instance using their data (details in Appendix D). |
| Dataset Splits | No | The paper mentions training a DQN for a certain number of epochs and evaluating it, but does not specify a distinct validation split for the datasets used in the main RMAB-G policy evaluation. |
| Hardware Specification | Yes | We run all experiments on a TITAN Xp with 12 GB of GPU RAM running on Ubuntu 20.04 with 64 GB of RAM. |
| Software Dependencies | No | We develop DQN policies using PyTorch [35]. |
| Experiment Setup | Yes | For all experiments γ = 0.9. We train the DQN for 100 epochs and evaluate with other choices in Appendix F. We let the learning rate be 5 * 10^-4, a batch size of 16, and an Adam optimizer. We use an MLP with a 128-dimension as our hidden layer, and use 2 such hidden layers. We use ReLU activation for all layers. |
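
As a rough sketch, the hyperparameters quoted in the Experiment Setup row could be gathered in one place as shown here. The names `DQNSetup`, `mlp_layer_shapes`, and `mlp_parameter_count` are hypothetical, and the input and output dimensions are placeholders, since the table does not state them; the authors' repository may organize this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNSetup:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    gamma: float = 0.9            # discount factor used for all experiments
    epochs: int = 100             # DQN training epochs
    learning_rate: float = 5e-4   # learning rate paired with the Adam optimizer
    batch_size: int = 16
    hidden_dim: int = 128         # width of each hidden layer
    num_hidden_layers: int = 2
    activation: str = "relu"
    optimizer: str = "adam"


def mlp_layer_shapes(setup, input_dim, output_dim):
    """Return (in_features, out_features) for each linear layer of the MLP."""
    dims = [input_dim] + [setup.hidden_dim] * setup.num_hidden_layers + [output_dim]
    return list(zip(dims[:-1], dims[1:]))


def mlp_parameter_count(setup, input_dim, output_dim):
    """Count weights plus biases across all linear layers."""
    return sum(i * o + o for i, o in mlp_layer_shapes(setup, input_dim, output_dim))


setup = DQNSetup()
# input_dim=10 and output_dim=2 are placeholders, not values from the paper.
shapes = mlp_layer_shapes(setup, input_dim=10, output_dim=2)
params = mlp_parameter_count(setup, input_dim=10, output_dim=2)
print(shapes)
print(params)
```

Keeping the values in a frozen dataclass makes it harder to change a setting by accident between runs, which matters when the reported configuration is the only record of the setup.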