Global Rewards in Restless Multi-Armed Bandits
Authors: Naveen Raman, Zheyuan Shi, Fei Fang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue. |
| Researcher Affiliation | Academia | Naveen Raman Carnegie Mellon University naveenr@cmu.edu Zheyuan Ryan Shi University of Pittsburgh ryanshi@pitt.edu Fei Fang Carnegie Mellon University feifang@cmu.edu |
| Pseudocode | Yes | We additionally provide the pseudo-code for our iterative algorithms (Algorithm 2) and our MCTS Shapley-Whittle and MCTS Linear-Whittle algorithms (Algorithm 1). |
| Open Source Code | Yes | All code is available here https://github.com/naveenr414/food-rescue-rmab |
| Open Datasets | No | We evaluate our policies across both synthetic and real-world datasets. [...] We leverage data from a partnering multi-city food rescue organization and construct an RMAB-G instance using their data (details in Appendix D). |
| Dataset Splits | No | The paper mentions training a DQN for a certain number of epochs and evaluating it, but does not specify a distinct validation split for the datasets used in the main RMAB-G policy evaluation. |
| Hardware Specification | Yes | We run all experiments on a TITAN Xp with 12 GB of GPU RAM running on Ubuntu 20.04 with 64 GB of RAM. |
| Software Dependencies | No | We develop DQN policies using PyTorch [35]. |
| Experiment Setup | Yes | For all experiments γ = 0.9. We train the DQN for 100 epochs and evaluate with other choices in Appendix F. We use a learning rate of 5 * 10^-4, a batch size of 16, and an Adam optimizer. We use an MLP with two 128-dimensional hidden layers and ReLU activation for all layers. |
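The reported DQN hyperparameters can be expressed directly as a PyTorch configuration. The sketch below is a minimal, hedged reconstruction based only on the quoted setup (two 128-dimensional ReLU hidden layers, Adam with learning rate 5e-4, batch size 16, γ = 0.9, 100 epochs); the state and action dimensions are placeholders, since the paper's RMAB-G instance sizes are not given in this table.

```python
import torch
import torch.nn as nn

# Placeholder dimensions for illustration only; the actual RMAB-G
# state/action sizes come from the paper's problem instances.
STATE_DIM = 20
NUM_ACTIONS = 20

# Q-network matching the reported architecture:
# two 128-dimensional hidden layers with ReLU activations.
q_network = nn.Sequential(
    nn.Linear(STATE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_ACTIONS),
)

# Adam optimizer with the reported learning rate of 5e-4.
optimizer = torch.optim.Adam(q_network.parameters(), lr=5e-4)

# Other reported training settings.
GAMMA = 0.9        # discount factor
BATCH_SIZE = 16
NUM_EPOCHS = 100
```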