Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Authors: Kai Wang, Lily Xu, Aparna Taneja, Milind Tambe

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
Researcher Affiliation | Collaboration | Kai Wang* (1), Lily Xu* (1), Aparna Taneja (2), Milind Tambe (1,2); (1) Harvard University, (2) Google Research; {kaiwang, lily_xu}@g.harvard.edu, {aparnataneja, milindtambe}@google.com
Pseudocode | Yes | Algorithm 1: UCWhittle (a hedged sketch of this style of update loop appears below the table).
Open Source Code | Yes | Code available at https://github.com/lily-x/online-rmab
Open Datasets | Yes | We use real, anonymized data of the engagement behavior of 7,656 mothers from a previous RMAB field study (Mate et al. 2022b).
Dataset Splits | No | The paper describes episodic interaction with an RMAB instance and averaging results over random seeds, but does not specify traditional train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models or the machine specifications used for the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | The per-episode reward is the cumulative discounted reward with discount rate γ = 0.9. We then compute regret by subtracting the reward earned by each algorithm from the reward of the optimal policy. Results are averaged over 30 random seeds and smoothed using exponential smoothing with a weight of 0.9. (A sketch of these computations follows the table.)
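
The Pseudocode and Open Source Code rows point to Algorithm 1 (UCWhittle) and the released repository. As a rough illustration of the "optimism plus Whittle index policy" idea summarized in the Research Type row, the following minimal Python sketch assumes a simplified 2-state, binary-action RMAB with reward equal to the current state, a Hoeffding-style confidence radius, and a binary-search subsidy solver. It is not the paper's UCWhittle algorithm (which optimizes over the confidence region jointly with the index computation) and all names, ranges, and formulas are illustrative assumptions.

```python
"""Hedged sketch of an optimistic (UCB-style) Whittle index policy.

Not the paper's UCWhittle or its released code; a simplified illustration only.
"""
import numpy as np


def whittle_index(P, state, gamma=0.9, lo=-10.0, hi=10.0, iters=30):
    """Binary-search the passive subsidy at which passive and active actions tie.

    P[s, a] = probability of moving to the rewarding state 1 from state s under
    action a (0 = passive, 1 = active); the per-step reward is the current state.
    The search range [lo, hi] is an assumed bound for gamma = 0.9.
    """
    for _ in range(iters):
        lam = (lo + hi) / 2.0
        V = np.zeros(2)
        for _ in range(200):  # value iteration for the subsidized single-arm MDP
            Q = np.empty((2, 2))
            for s in range(2):
                for a in range(2):
                    ev = P[s, a] * V[1] + (1 - P[s, a]) * V[0]
                    Q[s, a] = s + lam * (a == 0) + gamma * ev
            V = Q.max(axis=1)
        if Q[state, 0] >= Q[state, 1]:
            hi = lam  # passive already preferred, so the index is at or below lam
        else:
            lo = lam
    return (lo + hi) / 2.0


def optimistic_whittle_policy(counts_next1, counts_total, states, budget, t):
    """Act on the `budget` arms with the largest Whittle indices computed from
    optimistic (upper-confidence) transition probabilities.

    counts_next1[i] and counts_total[i] are 2x2 per-arm visit counts; inflating
    the probability of reaching state 1 is "optimistic" only because reward
    increases with the state in this simplified model.
    """
    n_arms = counts_total.shape[0]
    indices = np.zeros(n_arms)
    for i in range(n_arms):
        p_hat = counts_next1[i] / np.maximum(counts_total[i], 1)
        radius = np.sqrt(2 * np.log(max(t, 2)) / np.maximum(counts_total[i], 1))
        p_ucb = np.clip(p_hat + radius, 0.0, 1.0)  # optimism: inflate P(good state)
        indices[i] = whittle_index(p_ucb, states[i])
    return np.argsort(indices)[-budget:]  # arm ids of the top-`budget` indices
```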
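
The Experiment Setup row specifies a per-episode discounted reward with γ = 0.9, regret relative to the optimal policy, averaging over 30 random seeds, and exponential smoothing with weight 0.9. Below is a small, hedged sketch of how those quantities might be computed; the smoothing convention (s_t = 0.9·s_{t-1} + 0.1·x_t) and the choice to average across seeds before smoothing are assumptions, not details taken from the paper.

```python
import numpy as np


def discounted_episode_reward(rewards, gamma=0.9):
    """Cumulative discounted reward for one episode: sum_t gamma^t * r_t."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(rewards * gamma ** np.arange(len(rewards))))


def per_episode_regret(algo_rewards, optimal_rewards):
    """Regret per episode: the optimal policy's reward minus the algorithm's."""
    return np.asarray(optimal_rewards, dtype=float) - np.asarray(algo_rewards, dtype=float)


def exponential_smoothing(series, weight=0.9):
    """Assumed convention: s_t = weight * s_{t-1} + (1 - weight) * x_t."""
    smoothed, prev = [], None
    for x in series:
        prev = x if prev is None else weight * prev + (1 - weight) * x
        smoothed.append(prev)
    return np.array(smoothed)


# Usage sketch: average seed-level regret curves, then smooth with weight 0.9.
# regret_per_seed has shape (n_seeds, n_episodes); 30 seeds as in the table.
regret_per_seed = np.random.rand(30, 100)   # placeholder data, not real results
mean_regret = regret_per_seed.mean(axis=0)
plot_curve = exponential_smoothing(mean_regret, weight=0.9)
```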