Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Authors: Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically prove the benefits of multi-arm generalization and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems. We provide experimental evaluations of our model in three separate domains, including a synthetic setting, an epidemic modeling setting, as well as a maternal healthcare intervention setting. In Appendix B, we provide ablation studies over (1) a wider range of opt-in rates (2) different feature mappings (3) DDLPO topline with and without features (4) more problem settings.
Researcher Affiliation | Collaboration | Yunfan Zhao¹, Nikhil Behari¹, Edward Hughes², Edwin Zhang¹, Dheeraj Nagaraj², Karl Tuyls², Aparna Taneja², and Milind Tambe¹,² (¹Harvard University, ²Google)
Pseudocode | Yes | Algorithm 1 PreFeRMAB (Training), Algorithm 2 State Shaping Subroutine, Algorithm 3 PreFeRMAB (Inference)
Open Source Code | Yes | Code is available at https://github.com/yzhao3685/PreFeRMAB
Open Datasets | Yes | Following [Killian et al., 2022], we consider a synthetic dataset with binary states and binary actions. Inspired by the vast literature on agent-based epidemic modeling, we adapt the SIS model given in [Yaesoubi and Cohen, 2011], following a similar experiment setup as described in [Killian et al., 2022]. Similar to the set up in [Biswas et al., 2021; Killian et al., 2022], we model the real world maternal health problem as a discrete state RMAB. (An illustrative environment sketch appears after the table.)
Dataset Splits | No | The paper discusses training and testing but does not provide explicit details about a validation dataset split (e.g., percentages, sample counts, or specific methodology).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the PPO algorithm and the Ray RLlib library, but it does not specify any version numbers for these or other software dependencies.
Experiment Setup | Yes | In Appendix A, we provide additional details, including hyperparameters and a State Shaping illustration. All experiments use the PPO algorithm [Schulman et al., 2017] implemented with the Ray RLlib library. We set the discount factor to β = 0.99. Unless specified, we set the number of arms N = 21, budget B = 7 for Synthetic experiments; N = 20, B = 16 for SIS experiments; N = 25, B = 7 for ARMMAN experiments. The batch size is 512, with a learning rate of 0.0001. (An illustrative configuration sketch appears after the table.)
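
To make the synthetic setting quoted in the Open Datasets row concrete, below is a minimal sketch of a binary-state, binary-action restless arm in the style of [Killian et al., 2022]. This is not the authors' code: the transition sampling, the function names, and the trivial act-on-the-first-B policy are illustrative placeholders only; the paper instead learns its policy with PPO.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one binary-state, binary-action
# RMAB arm in the style of the synthetic setting from Killian et al., 2022.
# All transition probabilities below are made-up placeholders.

def sample_arm_transitions(rng):
    """Draw P[s, a] = probability of moving to the 'good' state 1,
    given current state s and action a (act = 1, passive = 0)."""
    p = rng.uniform(0.0, 1.0, size=(2, 2))
    # Acting should help: enforce P(good | s, act) >= P(good | s, passive).
    p[:, 1] = np.maximum(p[:, 1], p[:, 0])
    return p

def step_arm(state, action, p, rng):
    """Advance a single arm by one timestep and return (next_state, reward)."""
    next_state = int(rng.random() < p[state, action])
    reward = next_state  # reward 1 whenever the arm is in the good state
    return next_state, reward

rng = np.random.default_rng(0)
arms = [sample_arm_transitions(rng) for _ in range(21)]   # N = 21 arms
states = [1] * len(arms)
budget = 7                                                # B = 7 actions per step
# Toy policy for illustration only: act on the first B arms.
actions = [1 if i < budget else 0 for i in range(len(arms))]
results = [step_arm(s, a, p, rng) for s, a, p in zip(states, actions, arms)]
```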
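
The Experiment Setup row reports PPO trained through Ray RLlib with discount factor 0.99, batch size 512, and learning rate 0.0001, but no library versions. The sketch below shows how those hyperparameters would map onto a Ray RLlib 2.x-style PPOConfig under that assumption; the stand-in environment and the number of training iterations are placeholders, not taken from the paper.

```python
# Illustrative sketch only (the paper reports no library versions):
# wiring the stated hyperparameters into a Ray RLlib 2.x-style PPOConfig.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Stand-in environment so the sketch runs; the paper's RMAB environment
    # would be registered and referenced here instead.
    .environment(env="CartPole-v1")
    .training(
        gamma=0.99,            # discount factor beta = 0.99
        lr=1e-4,               # learning rate 0.0001
        train_batch_size=512,  # batch size 512
    )
)

algo = config.build()
for _ in range(10):            # number of training iterations is illustrative
    algo.train()
```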