Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization
Authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically prove the benefits of multi-arm generalization and empirically demonstrate the advantages of our approach on several challenging, real-world-inspired problems. We provide experimental evaluations of our model in three separate domains: a synthetic setting, an epidemic modeling setting, and a maternal healthcare intervention setting. In Appendix B, we provide ablation studies over (1) a wider range of opt-in rates, (2) different feature mappings, (3) the DDLPO topline with and without features, and (4) more problem settings. |
| Researcher Affiliation | Collaboration | Yunfan Zhao¹, Nikhil Behari¹, Edward Hughes², Edwin Zhang¹, Dheeraj Nagaraj², Karl Tuyls², Aparna Taneja², Milind Tambe¹,² (¹Harvard University, ²Google) |
| Pseudocode | Yes | Algorithm 1: PreFeRMAB (Training); Algorithm 2: State Shaping Subroutine; Algorithm 3: PreFeRMAB (Inference) |
| Open Source Code | Yes | Code is available at https://github.com/yzhao3685/PreFeRMAB |
| Open Datasets | Yes | Following [Killian et al., 2022], we consider a synthetic dataset with binary states and binary actions. Inspired by the vast literature on agent-based epidemic modeling, we adapt the SIS model given in [Yaesoubi and Cohen, 2011], following a similar experiment setup as described in [Killian et al., 2022]. Similar to the set up in [Biswas et al., 2021; Killian et al., 2022], we model the real world maternal health problem as a discrete state RMAB. |
| Dataset Splits | No | The paper discusses training and testing but does not provide explicit details about a validation dataset split (e.g., percentages, sample counts, or specific methodology). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the PPO algorithm and the Ray RLlib library, but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In Appendix A, we provide additional details, including hyperparameters and a State Shaping illustration. All experiments use the PPO algorithm [Schulman et al., 2017] implemented with the Ray RLlib library. We set the discount factor to β = 0.99. Unless specified otherwise, we set the number of arms N = 21 and budget B = 7 for Synthetic experiments; N = 20, B = 16 for SIS experiments; and N = 25, B = 7 for ARMMAN experiments. The batch size is 512, with a learning rate of 0.0001. A hedged configuration sketch based on these reported settings appears below the table. |
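
The reported training setup (PPO via Ray RLlib, β = 0.99, batch size 512, learning rate 0.0001, N = 21 arms and budget B = 7 in the synthetic setting) could be approximated along the lines of the sketch below. This is an illustration under assumptions, not the authors' released code: `ToyRMABEnv` is a hypothetical stand-in simulator (the paper's synthetic, SIS, and ARMMAN environments, the budget constraint, and PreFeRMAB's feature and state-shaping machinery are not reproduced here), and the Ray RLlib 2.x `PPOConfig` API is assumed because the paper does not report library versions.

```python
# Minimal, self-contained sketch of the reported PPO hyperparameters
# (gamma = 0.99, batch size 512, learning rate 1e-4) in Ray RLlib.
# ToyRMABEnv is a hypothetical stand-in, NOT the environment from the paper.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from ray.rllib.algorithms.ppo import PPOConfig


class ToyRMABEnv(gym.Env):
    """Toy N-arm binary-state environment standing in for an RMAB simulator."""

    def __init__(self, config=None):
        config = dict(config or {})
        self.num_arms = config.get("num_arms", 21)  # N = 21 in the Synthetic setting
        self.budget = config.get("budget", 7)       # B = 7 in the Synthetic setting
        self.horizon = config.get("horizon", 100)   # episode length (not from the paper)
        self.observation_space = spaces.Box(0.0, 1.0, shape=(self.num_arms,), dtype=np.float32)
        self.action_space = spaces.MultiDiscrete([2] * self.num_arms)
        self.rng = np.random.default_rng(0)
        self.state = np.zeros(self.num_arms, dtype=np.float32)
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        self.state = self.rng.integers(0, 2, size=self.num_arms).astype(np.float32)
        return self.state, {}

    def step(self, action):
        # Acting on an arm raises its chance of being in the rewarding state 1.
        # (Illustrative dynamics only; the budget constraint is not enforced here.)
        acted = np.asarray(action) > 0
        p_good = np.where(acted, 0.8, 0.4)
        self.state = (self.rng.random(self.num_arms) < p_good).astype(np.float32)
        reward = float(self.state.sum())
        self.t += 1
        return self.state, reward, False, self.t >= self.horizon, {}


config = (
    PPOConfig()
    .environment(env=ToyRMABEnv, env_config={"num_arms": 21, "budget": 7})
    .training(
        gamma=0.99,            # discount factor beta = 0.99 reported in the paper
        lr=1e-4,               # learning rate 0.0001 reported in the paper
        train_batch_size=512,  # batch size 512 reported in the paper
    )
)

algo = config.build()
for _ in range(10):  # number of training iterations is not reported; 10 is arbitrary
    result = algo.train()
```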