Collapsing Bandits and Their Application to Public Health Intervention

Authors: Aditya Mate, Jackson Killian, Haifeng Xu, Andrew Perrault, Milind Tambe

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our algorithm on several data distributions, including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques, while achieving similar performance.
Researcher Affiliation | Academia | Aditya Mate, Harvard University, Cambridge, MA 02138, aditya_mate@g.harvard.edu; Jackson A. Killian, Harvard University, Cambridge, MA 02138, jkillian@g.harvard.edu; Haifeng Xu, University of Virginia, Charlottesville, VA 22903, hx4ad@virginia.edu; Andrew Perrault, Harvard University, Cambridge, MA 02138, aperrault@g.harvard.edu; Milind Tambe, Harvard University, Cambridge, MA 02138, milind_tambe@harvard.edu
Pseudocode | Yes | Algorithm 1: Sequential index computation algorithm
Open Source Code | Yes | The code is available at: https://github.com/AdityaMate/collapsing_bandits
Open Datasets | Yes | We first test on tuberculosis medication adherence monitoring data, which contains daily adherence information recorded for each real patient in the system, as obtained from Killian et al. [17].
Dataset Splits | No | The paper does not explicitly state specific training, validation, or test dataset splits (e.g., percentages or sample counts). It mentions using real-world data and synthetic distributions for evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU or GPU models, or memory specifications).
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., libraries, frameworks, or programming languages).
Experiment Setup | Yes | Reward is measured as the undiscounted sum of patients (arms) in the adherent state over all rounds, where each trial lasts T = 180 days (matching the length of first-line TB treatment) with N patients and a budget of k calls per day. All experiments in this section set all δ to 0.05. ... We set the resource level, k = 10%N, in our simulation for Fig. 5a.
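
The experiment setup above fixes only the horizon (T = 180 days), the budget (k = 10% of N calls per day), and the reward definition (undiscounted count of adherent arms per round). The following is a minimal simulation sketch under those settings; the two-state transition probabilities and the myopic arm-selection rule are illustrative assumptions standing in for the paper's Whittle-index policy (Algorithm 1), and the function name simulate is hypothetical.

    # Sketch of the evaluation loop described above. Only T, the budget
    # k = 10% of N, and the reward definition come from the text; the
    # transition probabilities and the myopic selection rule are assumed.
    import numpy as np

    def simulate(N=100, T=180, seed=0):
        rng = np.random.default_rng(seed)
        k = max(1, int(0.10 * N))  # budget: k = 10% of N calls per day

        # Hypothetical two-state adherence dynamics (0 = non-adherent, 1 = adherent):
        # p_passive[s] = P(adherent tomorrow | state s, no call)
        # p_active[s]  = P(adherent tomorrow | state s, call)
        p_passive = np.array([0.2, 0.8])
        p_active = np.array([0.6, 0.95])

        state = rng.integers(0, 2, size=N)  # initial adherence state of each arm
        total_reward = 0
        for _ in range(T):
            # Illustrative myopic rule: call the k arms with the largest immediate gain.
            gain = p_active[state] - p_passive[state]
            called = np.argsort(-gain)[:k]
            p_next = p_passive[state].copy()
            p_next[called] = p_active[state[called]]
            state = (rng.random(N) < p_next).astype(int)
            total_reward += int(state.sum())  # undiscounted count of adherent arms this round
        return total_reward

    print(simulate())

With these placeholder parameters the myopic rule simply prioritizes the arms whose adherence probability a call would change most; the paper's contribution is replacing such a rule with a fast Whittle-index computation over belief states.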